talk-data.com

Topic: Modern Data Stack (65 tagged activities)

Activity trend: 2020-Q1 to 2026-Q1, peaking at 28 per quarter

Activities, newest first

Sponsored by: Promethium | Delivering Self-Service Data for AI Scale on Databricks

AI initiatives often stall when data teams can’t keep up with business demand for ad hoc, self-service data. Whether it’s AI agents, BI tools, or business users, everyone needs data immediately, but the pipeline-centric modern data stack is not built for this scale of agility. Promethium enables data teams to generate instant, contextual data products, called Data Answers, based on rapid, exploratory questions posed by the business. Data Answers equip data teams to collaborate with the business at AI scale. We will demo Promethium’s new agent capability for building Data Answers on Databricks for self-service data. The Promethium agent leverages and extends Genie with context from other enterprise data and applications to ensure accuracy and relevance.

Sponsored by: RowZero | Spreadsheets in the modern data stack: security, governance, AI, and self-serve analytics

Despite the proliferation of cloud data warehousing, BI tools, and AI, spreadsheets are still the most ubiquitous data tool. Business teams in finance, operations, sales, and marketing often need to analyze data in the cloud data warehouse but don't know SQL and don't want to learn BI tools. AI tools offer a new paradigm but still haven't broadly replaced the spreadsheet. As new AI tools and legacy BI tools give business teams access to data inside Databricks, security and governance are put at risk. In this session, Row Zero CEO Breck Fresen will share examples and strategies that data teams at Fortune 500 companies are using to support secure spreadsheet analysis, and discuss the future of spreadsheets in the world of AI. Breck is a former Principal Engineer on AWS S3 and was part of the team that wrote the S3 file system. He is an expert in storage, data infrastructure, cloud computing, and spreadsheets.

Unlocking Data Intelligence: A Beginner’s Guide to Unity Catalog

Getting started with data and AI governance in the modern data stack? Unity Catalog is your gateway to secure, discoverable and well-governed data and AI assets. In this session, we’ll break down what Unity Catalog is, why it matters and how it simplifies access control, lineage, discovery, auditing, business semantics and secure, open collaboration — all from a single place. We’ll explore how it enables open interoperability across formats, tools and platforms, helping you avoid lock-in and build on open standards. Most importantly, you’ll learn how Unity Catalog lays the foundation for data intelligence — by unifying governance across data and AI, enabling AI tuned to your business. It helps build a deep understanding of your data and delivers contextual, domain-specific insights that boost productivity for both technical and business users across any workload.

Sonal Goyal: Open Source Entity Resolution - Needs and Challenges

🌟 Session Overview 🌟

Session Name: Open Source Entity Resolution - Needs and Challenges
Speaker: Sonal Goyal
Session Description: Real-world data contains multiple records belonging to the same customer. These records can live in a single system or across multiple systems, and their fields vary, which makes them hard to combine, especially as data volumes grow. This hurts customer analytics: establishing lifetime value, running loyalty programs, or evaluating marketing channels is impossible when the base data is not linked. No AI segmentation algorithm can produce the right results when multiple copies of the same customer are lurking in the data. No warehouse can live up to its promise if its dimension tables contain duplicates.

With a modern data stack and DataOps, we have established patterns for the E and L in ELT when building data warehouses, data lakes, and delta lakes. However, the T, getting data ready for analytics, still takes a lot of effort. Modern tools like dbt are actively and successfully addressing this. What is also needed is a quick and scalable way to resolve entities and build a single source of truth for core business entities after extraction and before or after loading.
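To make "resolving entities" concrete, here is a minimal Python sketch of rule-based entity resolution: normalize fields, use blocking (only compare records that share a key, here a zip code) so not every pair is compared, score candidate pairs with string similarity, and merge matches into clusters with union-find. The records, field names, and the 0.8 threshold are hypothetical, and this is not how Zingg works internally (Zingg takes a machine-learning approach to matching); it only illustrates the general shape of the problem.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records; field names are illustrative only.
RECORDS = [
    {"id": 1, "name": "Jon Smith", "email": "jon.smith@example.com", "zip": "10001"},
    {"id": 2, "name": "John Smith", "email": "JON.SMITH@example.com", "zip": "10001"},
    {"id": 3, "name": "Jane Doe", "email": "jane@example.org", "zip": "94105"},
    {"id": 4, "name": "Jane  Doe", "email": "j.doe@example.org", "zip": "94105"},
    {"id": 5, "name": "Bob Lee", "email": "bob@example.net", "zip": "60601"},
]


def normalize(s: str) -> str:
    # Lowercase and collapse whitespace so trivial variations don't block a match.
    return " ".join(s.lower().split())


def similarity(a: dict, b: dict) -> float:
    # Average string similarity over name and email (0.0 to 1.0).
    name = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    email = SequenceMatcher(None, normalize(a["email"]), normalize(b["email"])).ratio()
    return (name + email) / 2


def resolve(records, threshold=0.8):
    # Union-find parent pointers: each record starts in its own cluster.
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Blocking: group records by zip so only records within a block are compared.
    blocks = {}
    for r in records:
        blocks.setdefault(r["zip"], []).append(r)

    # Pairwise matching inside each block; merge clusters when similar enough.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    # Collect record ids by cluster root.
    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    print(resolve(RECORDS))
```

With the default threshold, records 1 and 2 and records 3 and 4 end up in the same clusters, while record 5 stays on its own. Blocking is the usual trade-off: it keeps the number of comparisons manageable as volumes grow, but two duplicates whose blocking key differs (for example, a mistyped zip code) are never compared.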

This session covers the problem of entity resolution, its practical applications, and the challenges of building an entity resolution system. It also covers Zingg, an open source framework for building entity resolution systems (https://github.com/zinggAI/zingg/).

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

Think Inside the Box: Constraints Drive Data Warehousing Innovation

As a Head of Data or a one-person data team, keeping the lights on for the business while running all things data-related as efficiently as possible is no small feat. This talk will focus on tactics and strategies to manage within and around constraints, including monetary costs, time and resources, and data volumes.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-...
Small Data Manifesto: https://motherduck.com/blog/small-dat...
Why Small Data?: https://benn.substack.com/p/is-excel-...
Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us LinkedIn: / motherduck
X/Twitter : / motherduck
Blog: https://motherduck.com/blog/


Learn how your data team can drive innovation and maximize ROI by embracing constraints, drawing inspiration from SpaceX's revolutionary cost-effective approach. This video challenges the "abundance mindset" prevalent in the modern data stack, where easily scalable cloud data warehouses and a surplus of tools often lead to unmanageable data models and underutilized dashboards. We explore a focused data strategy for extracting maximum value from small data, shifting the paradigm from "more data" to more impact.

To maximize value, data teams must move beyond being order-takers and practice strategic stakeholder management. Discover how to use frameworks like the stakeholder engagement matrix to prioritize high-impact business leaders and align your work with core business goals. This involves speaking the language of business growth models, not technical jargon about data pipelines or orchestration, ensuring your data engineering efforts resonate with key decision-makers and directly contribute to revenue-generating activities.

Embracing constraints is key to innovation and effective data project management. We introduce the Iron Triangle—a fundamental engineering concept balancing scope, cost, and time—as a powerful tool for planning data projects and having transparent conversations with the business. By treating constraints not as limitations but as opportunities, data engineers and analysts can deliver higher-quality data products without succumbing to scope creep or uncontrolled costs.

A critical component of this strategy is understanding the Total Cost of Ownership (TCO), which goes far beyond initial compute costs to include ongoing maintenance, downtime, and the risk of vendor pricing changes. Learn how modern, efficient tools like DuckDB and MotherDuck are designed for cost containment from the ground up, enabling teams to build scalable, cost-effective data platforms. By making the true cost of data requests visible, you can foster accountability and make smarter architectural choices. Ultimately, this guide provides a blueprint for resisting data stack bloat and turning cost and constraints into your greatest assets for innovation.
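The talk does not give a formula, but the TCO components it names (compute, ongoing maintenance, downtime, and vendor pricing changes) can be summed over a planning horizon. The following is a minimal Python sketch under that assumption; the `CostInputs` fields and all numbers are hypothetical planning inputs, not figures from the talk or from any vendor.

```python
from dataclasses import dataclass, replace


@dataclass
class CostInputs:
    # All fields are hypothetical planning inputs, not vendor figures.
    annual_compute: float          # yearly compute/warehouse bill
    maintenance_hours: float       # engineer hours per year spent keeping pipelines running
    hourly_rate: float             # loaded cost of one engineer hour
    downtime_hours: float          # expected hours of outage per year
    downtime_cost_per_hour: float  # business cost of one hour of downtime
    annual_price_increase: float   # assumed vendor price change per year (0.10 = +10%)


def total_cost_of_ownership(c: CostInputs, years: int) -> float:
    """Sum compute, maintenance, and downtime costs over a planning horizon."""
    total = 0.0
    for year in range(years):
        # Compute cost compounds with the assumed vendor price change.
        compute = c.annual_compute * (1 + c.annual_price_increase) ** year
        maintenance = c.maintenance_hours * c.hourly_rate
        downtime = c.downtime_hours * c.downtime_cost_per_hour
        total += compute + maintenance + downtime
    return total


EXAMPLE = CostInputs(
    annual_compute=10_000,
    maintenance_hours=100,
    hourly_rate=100,
    downtime_hours=10,
    downtime_cost_per_hour=500,
    annual_price_increase=0.10,
)

if __name__ == "__main__":
    print(round(total_cost_of_ownership(EXAMPLE, years=3)))  # → 78100
    # Same stack with no price increase, to isolate the pricing risk.
    print(round(total_cost_of_ownership(replace(EXAMPLE, annual_price_increase=0.0), years=3)))  # → 75000
```

In this made-up example, compute accounts for well under half of the three-year total, which is the point the talk makes about looking beyond the initial compute bill; the gap between the two results is the cost of the assumed price increases alone.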

Is BI Too Big for Small Data?

This is a talk about how we thought we had Big Data, and we built everything planning for Big Data, but then it turns out we didn't have Big Data, and while that's nice and fun and seems more chill, it's actually ruining everything, and I am here asking you to please help us figure out what we are supposed to do now.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-...
Small Data Manifesto: https://motherduck.com/blog/small-dat...
Is Excel Immortal?: https://benn.substack.com/p/is-excel-immortal
Small Data SF: https://www.smalldatasf.com/

Mode co-founder Benn Stancil challenges the data industry's obsession with "big data," arguing that most companies are actually working with "small data," and our tools are failing us. This talk deconstructs the common sales narrative for BI tools, exposing why the promise of finding game-changing insights through data exploration often falls flat. If you've ever built dashboards nobody uses or wondered why your analytics platform doesn't deliver on its promises, this is a must-watch reality check on the modern data stack.

We explore the standard BI demo, where an analyst uncovers a critical insight by drilling into event data. This story sells tools like Tableau and Power BI, but it rarely reflects reality, leading to a "revolving door of BI" as companies swap tools every few years. Discover why the narrative of the intrepid analyst finding a needle in the haystack only works in movies and how this disconnect creates a cycle of failed data initiatives and unused "trashboards."

The presentation traces our belief that "data is the new oil" back to the early 2010s, with examples from Target's predictive analytics and Facebook's growth hacking. However, these successes were built on truly massive datasets. For most businesses, analyzing small data results in noisy charts that offer vague "directional vibes" rather than clear, actionable insights. We contrast the promise of big data analytics with the practical challenges of small data interpretation.
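One standard way to see why small data yields "directional vibes" is that the uncertainty of a measured rate shrinks only with the square root of the sample size. The following is a minimal Python sketch, assuming a simple conversion-rate metric and the normal (Wald) approximation to a 95% confidence interval; both the metric and the visitor counts are hypothetical illustrations, not examples from the talk.

```python
import math


def conversion_ci(conversions: int, visitors: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a rate (normal/Wald approximation)."""
    p = conversions / visitors
    # Standard error of a proportion: sqrt(p * (1 - p) / n).
    half_width = z * math.sqrt(p * (1 - p) / visitors)
    # Clamp to the valid [0, 1] range for a rate.
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Same observed 10% rate, very different sample sizes.
    for conversions, visitors in [(5, 50), (5_000, 50_000)]:
        low, high = conversion_ci(conversions, visitors)
        print(f"n={visitors:>6}: {conversions / visitors:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Both samples show a 10% rate, but the 50-visitor interval spans roughly 1.7% to 18.3%, while the 50,000-visitor interval is roughly 9.7% to 10.3%. A chart built on the small sample can move a lot from week to week without anything real changing. The Wald interval is a simplification that behaves poorly for very small counts; it is used here only because it makes the square-root scaling easy to read.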

Finally, learn actionable strategies for extracting real value from the data you actually have. We argue that BI tools should shift focus from data exploration to data interpretation, helping users understand what their charts actually mean. Learn why "doing things that don't scale," like manually analyzing individual customer journeys, can be more effective than complex models for small datasets. This talk offers a new perspective for data scientists, analysts, and developers looking for better data analysis techniques beyond the big data hype.

Coalesce 2024: How Riot Games is building player-first gaming experiences with Databricks and dbt

Riot Games, creator of hit titles like League of Legends and Valorant, is building an ultimate gaming experience by using data and AI to deliver optimal player journeys. In this session, you'll learn how Riot's data platform team paired with analytics engineering, machine learning, and insights teams to integrate the Databricks Data Intelligence Platform and dbt Cloud to significantly mature its data capabilities. The outcome: a scalable, collaborative analytics environment that serves millions of players worldwide.

You’ll hear how Riot Games:
- Centralized petabytes of game telemetry on Databricks for fast processing and analytics
- Modernized its data platform by integrating dbt Cloud, unlocking governance for modular, version-controlled data transformations and testing for a diverse set of user personas
- Uses generative AI to automate the enforcement of good documentation and quality code, and plans to use Databricks AI to further speed up its ability to unlock the value of data
- Deployed machine learning models for personalized recommendations and player behavior analysis

You'll come away with practical insights on architecting a modern data stack that can handle massive scale while empowering teams across the organization. Whether you're in gaming or any data-intensive industry, you'll learn valuable lessons from Riot's journey to build a world-class data platform.

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale: https://www.getdbt.com/blog/coalesce-2024-product-announcements

Coalesce 2024: Has the modern data stack failed the business?

The modern data stack has improved the lives of data teams everywhere. But has it helped the rest of the business? In this talk, we’ll discuss the business teams’ perspective. Are they actually getting value from the modern data stack? How does it help them do their jobs better? And why do data teams keep questioning whether we’re “adding value” with our powerful new tools? Attendees will gain perspective on their data ‘customers’ and learn ideas for delivering tangible business value.

Speaker: Paul Blankley, CTO, Zenlytic

Enterprise MDS deployment at scale: dbt & DevOps - Coalesce 2023

Behind any good DataOps practice within a Modern Data Stack (MDS) architecture is a solid DevOps design! This is particularly pressing when building an MDS solution at scale, as the reliability, quality, and availability of data require a very high degree of process automation while remaining fast, agile, and resilient to change when addressing business needs.

While DevOps in data engineering is nothing new, applying it to a broad-spectrum solution that includes a data warehouse, BI, and more has often seemed either out of reach because of overall complexity and cost, or has simply been overlooked because of perceived scaling issues, often attributed to the challenges of automating CI/CD processes. However, this has been changing fast: tools such as dbt have super cool, cutting-edge features around pre-commits, Slim CI, and more that make a very high degree of CI/CD autonomy achievable with relative ease.
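In dbt, Slim CI is typically driven by state comparison: a command such as `dbt build --select state:modified+ --state <path-to-production-artifacts>` builds only the models that changed relative to production plus everything downstream of them. The graph-selection idea behind that `+` suffix can be sketched in a few lines of Python. This is a minimal illustration, not dbt's implementation; the model names and the `CHILDREN` DAG are hypothetical.

```python
from collections import deque

# Hypothetical model DAG: each model maps to the models that select from it (its children).
CHILDREN = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["dim_customers", "fct_orders"],
    "fct_orders": ["rpt_revenue"],
    "dim_customers": [],
    "rpt_revenue": [],
}


def modified_plus_downstream(modified, children):
    """Return the modified models plus everything downstream of them (breadth-first)."""
    selected = set(modified)
    queue = deque(modified)
    while queue:
        model = queue.popleft()
        for child in children.get(model, []):
            if child not in selected:  # visit each model once, even with diamond dependencies
                selected.add(child)
                queue.append(child)
    return selected


if __name__ == "__main__":
    print(sorted(modified_plus_downstream(["stg_orders"], CHILDREN)))
```

Changing only `stg_orders` selects `stg_orders`, `fct_orders`, and `rpt_revenue`, and leaves `stg_customers` and `dim_customers` untouched. In a project with thousands of models, skipping unaffected branches like this is what keeps CI runs fast.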

In this session, Datatonic covers the challenges of building and deploying enterprise-grade MDS solutions for analytics at scale and how they have used dbt to address them, especially by achieving near-complete autonomy in their CI/CD processes!

Speaker: Ash Sultan, Lead Data Architect, Datatonic

Register for Coalesce at https://coalesce.getdbt.com

The Hitchhikers Guide to building a modern data stack - Coalesce 2023

There are a lot of decisions to be made throughout the initial deployment of, migration to, and scaling of a data stack. The decisions can be big or small, hard to reverse or trivial. Looking back at Zip’s modernization journey, there are some things the team nailed and others they’d definitely do differently if they were to do it again. In this talk, the Zip team walks through the principles they used to make decisions when building out their stack, the challenges they faced, the solutions they landed on, and what they learned.

Speaker: Moss Pauly, Senior Manager Data Products, Zip

Panel discussion: Fixing the data eng lifecycle - Coalesce 2023

As Joe Reis recently opined, if you want to know what’s next in data engineering, just look at software engineering. The MDS-in-a-box pattern has been a game changer for applying software engineering principles to local data development, improving the ability to share data and to collaborate on modeling work and data analysis the same way we build and share open source tooling.

This panel brings together experts in data engineering, data analytics and software engineering to explore the current state of the pattern, pieces that remain missing today and how emerging tools and data engineering testing capabilities can refine the transition from local development to production workflows.

Speakers: Matt Housley, CTO, Halfpipe Systems; Mehdi Ouazza, Developer Advocate, MotherDuck; Sung Won Chung, Solutions Engineer, Datafold; Louise de Leyritz, Host, The Data Couch podcast

Embracing a modern data stack in the water industry - Coalesce 2023

Learn about Watercare's journey in implementing a modern data stack with a focus on self-serve analytics in the water industry. The session covers why Watercare decided to implement a modern data stack, the problem of data conformity, and the tools the team used to accelerate its data modeling process. Diego also discusses the benefits of using dbt, Snowflake, and Azure DevOps for data modeling, and draws a parallel between analytics and his connection with jazz music.

Speaker: Diego Morales, Civil Industrial Engineer, Watercare

My (almost) musical career and RMIT’s journey adopting dbt - Coalesce 2023

In this presentation, Sarah and Darren discuss RMIT University's journey to implementing the modern data stack with dbt. They bring tales of their musical successes and misadventures, lessons learned with both music and data engineering, and how these seemingly disparate worlds overlap.

Speakers: Darren Ware, Senior Data Engineer, RMIT University; Sarah Taylor, Lead Data Engineer, RMIT University

Revolutionizing an archaic industry via the modern data stack - Coalesce 2023

Florence is in the business of healthcare staffing. It is an incredibly outdated industry, but with the help of the modern data stack, Florence has been able to make people's lives easier. This session gives an overview of the challenges the team at Florence faced and the tactics they used to overcome them.

Speakers: Monica Youn, Chief Analytics Officer, Florence; Daniel Ferguson, Data Engineer, Florence

Game on: Building massive multiplayer online data products - Coalesce 2023

With dbt and the modern data stack, onboarding and surfacing data has never been more manageable for analytics engineers. Still, 85% of data products never make it into production. Why do data practitioners struggle to create data products that engage the people who actually use them?

Join Jake and Nate as they discuss how integration and collaboration can drive user engagement, ultimately leading to increased adoption. They’ll present guiding principles and three real-world examples of data products designed to center and empower business users to:
- Replace the need for seeds using materialized Sigma input tables
- Create and manage HubSpot contacts and segments using Sigma’s HubSpot template
- Sync insights from Sigma directly to downstream systems using the Hightouch integration

The lessons learned from building a data platform at Sigma will provide everyone a framework for fostering collaboration between data teams and business users so they can raid insights, move up the leaderboard, and level up their gameplay no matter what data stack they use.

Speakers: Nate Meinzer, Director of Partner Engineering, Sigma Computing; Jake Hannan, Senior Manager, Data Platform, Sigma Computing

Warehouse-first data strategy at ClickUp - Coalesce 2023

In its short tenure (2.5 years) at ClickUp, the data team has built and scaled a fully modern data stack and implemented a warehouse-first data strategy. ClickUp's data platform comprises thousands of dbt models and upstream/downstream integrations with nearly every software tool used at ClickUp. ClickUp uses dbt Cloud and Snowflake to power dozens of downstream systems supporting audience creation, marketing optimization, predictive customer lifecycle ML, a PLG/PLS motion, and much more. This session covers the foundational principles ClickUp follows and how warehouse-first thinking has unlocked tremendous value for the company.

Speaker: Marc Stone, Head of Data, ClickUp

10x-ing developer experience with Databricks, Delta, and dbt Cloud - Coalesce 2023

In this session, gain strategic guidance on how to deploy dbt Cloud seamlessly to a team of 5-85 people. You'll learn best practices across development and automation that will ensure stability and high standards as you scale the number of developers using dbt Cloud and the number of models built up to the low thousands.

This session is a great fit for folks with beginner to intermediate experience with dbt. In basketball terms, this talk covers mid-range shooting skills, but does not go into detail about 3-pointers, let alone half-court shots. That said, it is not for people who are brand new to dbt and aren't familiar with the basic architecture of dbt and the modern data stack.

Speaker: Chris Davis, Senior Staff Engineer, Udemy, Inc.

Not just Xs and Os: How sports teams are adopting the modern data stack - Coalesce 2023

This panel discussion led by Data Clymer brings together data leaders from some of the top professional sports organizations in the U.S. to explore how sports and similar mid-size businesses are leveraging data and analytics engineering best practices to fuel revenue growth, improve business efficiencies, and drive fan engagement.

Speakers: Jesse McCabe, Vice President Marketing, Data Clymer; Keelan Smithers, Data Product Manager, Analytics Engineering, NBA; Paimon Jaberi, Managing Director of Strategy and Analytics, Seattle Seahawks; Jared Chavez, Senior Data Engineer, Pacers Sports & Entertainment

Overhauling tech debt: A modern data stack migration journey - Coalesce 2023

When a startup is in its early stages, the data infrastructure is typically built with the best intentions, but doesn’t necessarily scale, and often predates many modern data tools available today. Years later, you might find yourself juggling complexity and tech debt when providing insights. This talk shares the journey Unbounce embarked on to modernize their data stack, including approaches to the following hurdles: demonstrating value and ROI to leadership for buy-in; deciding which tools to adopt; coordinating with data engineers and data analysts to deliver the cross-team project; and ensuring there were no interruptions to stakeholders throughout the migration.

Speaker: Morgan Cabot, Analytics Engineering Technical Lead, Unbounce
