talk-data.com

Topic: dbt (data build tool)

Tags: data_transformation, analytics_engineering, sql

758 tagged activities

Activity Trend: peak of 134 activities/quarter (2020-Q1 to 2026-Q1)

Activities

758 activities · Newest first

Scaling Trust in BI: How Bolt Manages Thousands of Metrics Across Databricks, dbt, and Looker

Managing metrics across teams can feel like everyone's speaking a different language, which often erodes trust in the numbers. Based on a real-world use case, we'll show you how to establish a governed source of truth for metrics that works at scale and builds a solid foundation for AI integration. You'll explore how Bolt.eu's data team governs consistent metrics for different data users and leverages Euno's automations to navigate the overlap between Looker and dbt. We'll cover best practices for deciding where your metrics belong and how to optimize engineering and maintenance workflows across Databricks, dbt, and Looker. For curious analytics engineers, we'll dive into thinking in dimensions & measures vs. tables & columns and determining when pre-aggregations make sense. The goal is to help you contribute to a self-serve experience with consistent metric definitions, so business teams and AI agents can access the right data at the right time without endless back-and-forth.

SQL-First ETL: Building Easy, Efficient Data Pipelines With Lakeflow Declarative Pipelines

This session explores how SQL-based ETL can accelerate development, simplify maintenance, and make data transformation more accessible to both engineers and analysts. We'll walk through how Databricks Lakeflow Declarative Pipelines and Databricks SQL warehouse support building production-grade pipelines using familiar SQL constructs. Topics include:

- Using streaming tables for real-time ingestion and processing
- Leveraging materialized views to deliver fast, pre-computed datasets
- Integrating with tools like dbt to manage batch and streaming workflows at scale

By the end of the session, you'll understand how SQL-first approaches can streamline ETL development and support both operational and analytical use cases.

SQL-Based ETL: Options for SQL-Only Databricks Development

Using SQL for data transformation is a powerful way for an analytics team to create their own data pipelines. However, relying on SQL often comes with tradeoffs such as limited functionality, hard-to-maintain stored procedures, or skipping best practices like version control and data tests. Databricks supports building high-performing SQL ETL workloads. Attend this session to hear how Databricks supports SQL for data transformation jobs as a core part of your Data Intelligence Platform. In this session we will cover four options for using Databricks with SQL syntax to create Delta tables:

- Lakeflow Declarative Pipelines: a declarative ETL option to simplify batch and streaming pipelines
- dbt: an open-source framework to apply engineering best practices to SQL-based data transformations
- SQLMesh: an open-core product to easily build high-quality, high-performance data pipelines
- SQL notebook jobs: a combination of Databricks Workflows and parameterized SQL notebooks

Sponsored by: dbt Labs | Empowering the Enterprise for the Next Era of AI and BI

The next era of data transformation has arrived. AI is enhancing developer workflows, enabling downstream teams to collaborate effectively through governed self-service. Additionally, SQL comprehension is producing detailed metadata that boosts developer efficiency while ensuring data quality and cost optimization. Experience this firsthand with dbt’s data control plane, a centralized platform that provides organizations with repeatable, scalable, and governed methods to succeed with Databricks in the modern age.

Accelerating Analytics: Integrating BI and Partner Tools to Databricks SQL

This session is repeated. Did you know that you can integrate with your favorite BI tools directly from Databricks SQL? You don’t even need to stand up an additional warehouse. This session shows the integrations with Microsoft Power Platform, Power BI, Tableau and dbt so you can have a seamless integration experience. Directly connect your Databricks workspace with Fabric and Power BI workspaces or Tableau to publish and sync data models, with defined primary and foreign keys, between the two platforms.

In this decades-spanning episode, Tristan Handy sits down with Lonne Jaffe, Managing Director at Insight Partners and former CEO of Syncsort (now Precisely), to trace the history of the data ecosystem—from its mainframe origins to its AI-infused future. Lonne reflects on the evolution of ETL, the unexpected staying power of legacy tech, and why AI may finally erode the switching costs that have long protected incumbents. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

In this episode, Tristan talks to Zach Lloyd, founder of Warp—a terminal built for the modern era, including for AI agents. They explore the history of terminals, differences between terminals and shells, and what the future might look like. In a world driven by generative AI, the terminal could once again be the control center of computer usage. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Last week I posted about the gulf in the data industry between people in favor of AI and those opposed to it. The data industry has an atrocious track record of success over the decades. Shall we keep repeating the same mistakes with AI?

I think AI denialism is holding us back, and AI has the potential to help us correct a lot of the sins of the past, namely by finally delivering value quickly. Whether we get there is another question...

Thanks to dbt, GoodData, and Ellie.ai for sponsoring this podcast.

dbt: https://www.getdbt.com GoodData: https://www.gooddata.com Ellie.ai: https://www.ellie.ai

Ryan Russon is an ML Engineer. He stopped by my house for a practical and grounded chat about ML and AI. Enjoy!


Join dbt Labs May 28 for the dbt Launch Showcase to hear from executives and product leaders about the latest features landing in dbt. See firsthand how features will empower data practitioners and organizations in the age of AI. Thanks to dbt Labs for sponsoring this episode.

Some people speculate that AI will make software and data engineers obsolete. If the only thing engineers do is write code, sure.

But we do a lot more than that, and I believe we'll actually need more engineers, not fewer.

In this episode, I discuss how I think AI will change the craft of software and data engineering. Spoiler - I think it will make it way more fun and productive.

Thanks to dbt and GoodData for sponsoring this episode. Please support them, as they're awesome.

dbt Launch Showcase Join dbt Labs May 28 for the dbt Launch Showcase to hear from executives and product leaders about the latest features landing in dbt. See firsthand how features will empower data practitioners and organizations in the age of AI.


GoodData Webinar: Analytics and data engineering used to live in separate worlds: different teams, different tools, different goals. But the lines are blurring fast. As modern data products demand speed, scale, and seamless integration, the best teams are embracing engineering principles and best practices. In this no-BS conversation, Ryan Dolley, Matt Housley, and Joe Reis dive into how engineering principles are transforming the way analytics is built, delivered, and scaled.
📆 May 27, 2025 · 🕘 9:00 AM PDT, 12:00 PM EDT, 6:00 PM CEST · 🔗 Register here!

In this episode, Tristan Handy and Lukas Schulte, co-founder of SDF Labs and now part of dbt Labs, dive deep into the world of compilers—what they are, how they work, and what they mean for the data ecosystem. SDF, recently acquired by dbt Labs, builds a world-class SQL compiler aimed at abstracting away the complexity of warehouse-specific SQL. Join Tristan and members of the SDF team at the dbt Launch showcase to learn more about the brand new dbt engine. Register at https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase For full show notes and to read 8+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

An Evolving DAG for the LLM world - Julia Schottenstein of LangChain at Small Data SF

Directed Acyclic Graphs (DAGs) are the foundation of most orchestration frameworks. But what happens when you allow an LLM to act as the router? Acyclic graphs now become cyclic, which means you have to design for the challenges resulting from all this extra power. We'll cover the ins and outs of agentic applications and how to best use them in your work as a data practitioner or developer building today.
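The shift the abstract describes, from a fixed execution order to a route chosen at runtime by a model, can be sketched in a few lines of plain Python. This is a hypothetical illustration, not any framework's actual API: the `llm_router` stub stands in for a real LLM call that decides the next node.

```python
# In a classic DAG, step order is fixed by topology. When an LLM acts
# as the router, the model picks the next node at each step, so the
# graph may revisit nodes, i.e., it effectively becomes cyclic.

def retrieve(state):
    state["docs"] = ["doc about " + state["question"]]
    return state

def answer(state):
    state["answer"] = f"Based on {len(state['docs'])} doc(s): ..."
    return state

def llm_router(state):
    # Stub for an LLM deciding the next node from the current state.
    if "docs" not in state:
        return "retrieve"
    if "answer" not in state:
        return "answer"
    return "done"

NODES = {"retrieve": retrieve, "answer": answer}

def run(state, max_steps=10):
    # A step budget guards against the infinite loops cycles allow.
    for _ in range(max_steps):
        nxt = llm_router(state)
        if nxt == "done":
            break
        state = NODES[nxt](state)
    return state

result = run({"question": "What is a DAG?"})
print(result["answer"])
```

The "extra power" the talk mentions is exactly this: the router can send control backwards, which is why an explicit step limit (or another termination guarantee) becomes part of the design.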



Discover LangChain, the open-source framework for building powerful agentic systems. Learn how to augment LLMs with your private data, moving beyond their training cutoffs. We'll break down how LangChain uses "chains," which are essentially Directed Acyclic Graphs (DAGs) similar to data pipelines you might recognize from dbt. This structure is perfect for common patterns like Retrieval Augmented Generation (RAG), where you orchestrate steps to fetch context from a vector database and feed it to an LLM to generate an informed response, much like preparing data for analysis.
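The chain-as-DAG pattern described above can be sketched without any framework. In this minimal sketch, `retrieve` and `generate` are hypothetical stand-ins for a vector-store similarity search and an LLM call; the point is the linear, pipeline-like flow of data between steps.

```python
# Minimal RAG-style chain: retrieve context, build a prompt, generate.
# Each step feeds the next, forming a small linear DAG, much like a
# sequence of dependent models in a data pipeline.

DOCS = {
    "dbt": "dbt compiles SQL models into a dependency graph and runs them in order.",
    "rag": "RAG fetches relevant context and injects it into the LLM prompt.",
}

def retrieve(query: str) -> str:
    # Stand-in for a vector-database similarity search.
    return next((text for key, text in DOCS.items() if key in query.lower()),
                "no context found")

def build_prompt(query: str, context: str) -> str:
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

def generate(prompt: str) -> str:
    # Stand-in for an LLM call; here it just echoes the grounded context.
    return "Grounded answer using -> " + prompt.splitlines()[0]

def rag_chain(query: str) -> str:
    context = retrieve(query)              # step 1: fetch context
    prompt = build_prompt(query, context)  # step 2: augment the prompt
    return generate(prompt)                # step 3: generate a response

print(rag_chain("How does dbt order its models?"))
```

Because every edge points forward, the chain is guaranteed to terminate, which is the dependability advantage static chains hold over the agentic graphs discussed next.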

Dive into the world of AI agents, where the LLM itself determines the application's control flow. Unlike a predefined DAG, this allows for dynamic, cyclic graphs where an agent can iterate and improve its response based on previous attempts. We'll explore the core challenges in building reliable agents: effective planning and reflection, managing shared memory across multiple agents in a cognitive architecture, and ensuring reliability against task ambiguity. Understand the critical trade-offs between the dependability of static chains and the flexibility of dynamic LLM agents.
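The iterate-and-improve behavior described above, which a predefined DAG cannot express, boils down to a loop with an exit condition. A minimal sketch, with stub functions in place of the real LLM calls that would do the drafting and the reflection:

```python
# Cyclic agent loop: draft an answer, reflect on it, and retry until
# the reflection step accepts the result or a step budget runs out.

def draft(question: str, attempt: int) -> str:
    # Stub generator: each attempt produces a new draft.
    return f"answer v{attempt} to: {question}"

def reflect(answer: str) -> bool:
    # Stub critic: accepts only the third draft. A real agent would
    # ask an LLM to judge the answer against the original task.
    return "v3" in answer

def agent(question: str, max_attempts: int = 5) -> str:
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer = draft(question, attempt)
        if reflect(answer):  # the cycle's exit condition
            break
    return answer

print(agent("Summarize this dataset"))
```

The reliability challenges named above live in this loop: a weak critic never exits (hence the attempt budget), and shared memory is whatever state the loop threads between draft and reflection.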

Introducing LangGraph, a framework designed to solve the agent reliability problem by balancing agent control with agency. Through a live demo in LangGraph Studio, see how to build complex AI applications using a cyclic graph. We'll demonstrate how a router agent can delegate tasks, execute a research plan with multiple steps, and use cycles to iterate on a problem. You'll also see how human-in-the-loop intervention can steer the agent for improved performance, a critical feature for building robust and observable agentic systems.

Explore some of the most exciting AI agents in production today. See how Roblox uses an AI assistant to generate virtual worlds from a prompt, how TripAdvisor’s agent acts as a personal travel concierge to create custom itineraries, and how Replit’s coding agent automates code generation and pull requests. These real-world examples showcase the practical power of moving from simple DAGs to dynamic, cyclic graphs for solving complex, agentic problems.

In the first episode of our new season on developer experience, the cofounder and CTO of SDF Labs, now a part of dbt Labs, discusses databases, compilers, and dev tools. Wolfram spent close to two decades in Microsoft Research and several years at Meta building their data platform. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Summary

In this episode of the Data Engineering Podcast Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3. Pete shares his journey into data engineering, discusses Astronomer's contributions to the Airflow project, and highlights the critical role of Airflow in powering operational data products. He covers the evolution of Airflow, its position in the data ecosystem, and the challenges faced by data engineers, including infrastructure management and observability. The conversation also touches on the upcoming Airflow 3 release, which introduces data awareness, architectural improvements, and multi-language support, and Astronomer's observability suite, Astro Observe, which provides insights and proactive recommendations for Airflow users.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Pete DeJoy about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3.

Interview

- Introduction
- Can you describe what Astronomer is and the story behind it?
- How would you characterize the relationship between Airflow and Astronomer?
- Astronomer just released your State of Airflow 2025 Report yesterday, and it is the largest data engineering survey ever with over 5,000 respondents. Can you talk a bit about the top-level findings in the report?
- What about the overall growth of the Airflow project over time?
- How have the focus and features of Astronomer changed since it was last featured on the show in 2017?
- Astro Observe went GA in early February; what does the addition of pipeline observability mean for your customers?
- What are other capabilities similar in scope to observability that Astronomer is looking at adding to the platform?
- Why is Airflow so critical in providing an elevated observability (or cataloging, or similar) experience in a DataOps platform?
- What are the notable evolutions in the Airflow project and ecosystem in that time?
- What are the core improvements that are planned for Airflow 3.0?
- What are the most interesting, innovative, or unexpected ways that you have seen Astro used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Airflow and Astro?
- What do you have planned for the future of Astro/Astronomer/Airflow?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- Astronomer
- Airflow
- Maxime Beauchemin
- MongoDB
- Databricks
- Confluent
- Spark
- Kafka
- Dagster (Podcast Episode)
- Prefect
- Airflow 3
- The Rise of the Data Engineer blog post
- dbt
- Jupyter Notebook
- Zapier
- cosmos library for dbt in Airflow
- Ruff
- Airflow Custom Operator
- Snowflake

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

In this podcast episode, we talked with Adrian Brudaru about the past, present, and future of data engineering.

About the speaker: Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted. As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, DLT Hub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like dbt in the ecosystem


Daniel Avancini is the chief data officer and co-founder of Indicium—a fast-growing data consultancy started in Brazil.  There are a lot of data consultancies around the world, and a lot of them do great work. What has been so fascinating about Indicium's journey is their HR model. Rather than primarily hiring experienced professionals, they decided to go hard on training. They built a talent pipeline with courses and an internal onboarding process that takes new employees from zero to 60 over a few months. The result has been phenomenal and Indicium delivers great client outcomes, but most importantly, they're building skills for hundreds of brand new data professionals. Data is a hard field to break into because fundamentally you can't do the real thing unless you have access to data. So any company that is investing in building scalable hiring and training processes for analytical talent is one to be excited about. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.