talk-data.com

Topic

dbt

dbt (data build tool)

data_transformation analytics_engineering sql

758 tagged

Activity Trend: 134 peak/qtr (2020-Q1 to 2026-Q1)

Activities

758 activities · Newest first

If Data Vault is a new term for you, it's a data modeling design pattern. We're joined by Brandon Taylor, a senior data architect at Guild, and Michael Olschimke, CEO of Scalefree, the consulting firm whose co-founder Dan Linstedt is credited as the designer of the Data Vault architecture. In this conversation with Tristan and Julia, Michael and Brandon explore the Data Vault approach among data warehouse design methodologies. They discuss Data Vault's adoption in Europe, its alignment with data mesh architecture, and the ongoing debate over Data Vault vs. Kimball methods.

For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
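For readers meeting the pattern for the first time, the core Data Vault idea can be sketched in plain SQL: business keys live in hub tables, relationships in link tables, and descriptive attributes in satellites. The following is a minimal illustrative sketch only (table and column names are hypothetical, not taken from the episode):

```sql
-- Hypothetical hub: one row per unique business key, plus load metadata
-- for auditability. Hubs never store descriptive attributes.
CREATE TABLE hub_customer (
    customer_hk    CHAR(32)     NOT NULL,  -- hash of the business key
    customer_id    VARCHAR(50)  NOT NULL,  -- the business key itself
    load_date      TIMESTAMP    NOT NULL,  -- when the key was first seen
    record_source  VARCHAR(50)  NOT NULL,  -- originating source system
    PRIMARY KEY (customer_hk)
);

-- Descriptive attributes live in a separate satellite, so changes over
-- time are appended as history without ever touching the hub.
CREATE TABLE sat_customer_details (
    customer_hk    CHAR(32)     NOT NULL REFERENCES hub_customer (customer_hk),
    load_date      TIMESTAMP    NOT NULL,
    name           VARCHAR(100),
    email          VARCHAR(100),
    record_source  VARCHAR(50)  NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
```

Splitting keys from attributes this way is what gives the pattern its claimed agility: new sources can be added as new satellites without restructuring existing tables.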

Summary

Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI-powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product, the ways that it enhances the ability of humans to get their work done, and the cases where the humans have to adapt to the tool.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Eran Yahav about building an AI-powered developer assistant at Tabnine

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Tabnine is and the story behind it?
What are the individual and organizational motivations for using AI to generate code?

What are the real-world limitations of generative AI for creating software? (e.g. size/complexity of the outputs, naming conventions, etc.)
What are the elements of skepticism/overs

Poor data engineering is like building a house on a shaky foundation: it leads to unreliable information, wasted time and money, and even legal problems, making everything less dependable and more troublesome in our digital world. In the retail industry specifically, data engineering is particularly important for managing and analyzing large volumes of sales, inventory, and customer data, enabling better demand forecasting, inventory optimization, and personalized customer experiences. It helps retailers make informed decisions, streamline operations, and remain competitive in a rapidly evolving market. Insights and frameworks learned from data engineering practices can be applied to a multitude of people and problems, and in turn, learning from someone who has been at the forefront of data engineering is invaluable.

Mohammad Sabah is SVP of Engineering and Data at Thrive Market, and was appointed to this role in 2018. He joined the company from The Honest Company, where he served as VP of Engineering & Chief Data Scientist. Sabah joined The Honest Company following its acquisition of Insnap, which he co-founded in 2015. Over the course of his career, Sabah has held various data science and engineering roles at companies including Facebook, Workday, Netflix, and Yahoo!

In the episode, Richie and Mo explore the importance of using AI to identify patterns and proactively address common errors, the use of tools like dbt and SODA for data pipeline abstraction and stakeholder involvement in data quality, data governance and data quality as foundations for strong data engineering, validation layers at each step of the data pipeline to ensure data quality, collaboration between data analysts and data engineers for holistic problem-solving and reusability of patterns, ownership mentality in data engineering, and much more.

Links from the show: PagerDuty · Domo · OpsGene · Career Track: Data Engineer

Summary

Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. In this episode Lukas Fittl shares some hard-won wisdom about the causes and solutions of many performance bottlenecks, and the work that he is doing to shine some light on PostgreSQL to make it easier to understand how to keep it running smoothly.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Your host is Tobias Macey and today I'm interviewing Lukas Fittl about optimizing your database performance and tips for tuning Postgres

Interview

Introduction
How did you get involved in the area of data management?
What are the different ways that database performance problems impact the business?
What are the most common contributors to performance issues?
What are the useful signals that indicate performance challenges in the database?

For a given symptom, what are the steps that you recommend for determining the proximate cause?

What are the potential negative impacts to be aware of when tu
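As a concrete starting point for the kind of diagnosis this interview covers, many Postgres users begin with the pg_stat_statements extension. This query is a common general-purpose first step, not a recommendation from the episode itself:

```sql
-- Requires the pg_stat_statements extension to be installed and enabled.
-- Column names match PostgreSQL 13+; older versions use total_time/mean_time.
SELECT query,
       calls,                 -- number of executions of this query shape
       total_exec_time,       -- cumulative execution time in milliseconds
       mean_exec_time,        -- average time per call in milliseconds
       rows                   -- total rows returned or affected
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;                     -- the ten most expensive query shapes overall
```

Sorting by cumulative time surfaces the queries whose optimization pays off most, which is usually a better signal than chasing the single slowest individual execution.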

dbt Labs on dbt (London) - Coalesce 2023

In the dbt Labs on dbt series, you get a behind-the-scenes look at how dbt Labs uses data. Learn how dbt Labs thinks about the role of data, how data developers collaborate with business leaders, and the technical decisions we’ve made in our own dbt project.

In this session, Mark Matteucci, Enterprise Systems Architect at dbt Labs, and Mary Wleklinski, Senior Director of Revenue Marketing at dbt Labs, discuss the rising importance of data in a challenging economic environment. You'll learn about a variety of dbt use cases across finance, G&A, and marketing, and how we use data to solve tough business challenges. Learn about dbt Labs' journey and leave with ideas that you can implement in your dbt project today.

Register for Coalesce at https://coalesce.getdbt.com

Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion.  MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition.  In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of Databricks, and the importance of responsible AI practices. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

dbt Labs on dbt (Sydney) - Coalesce 2023

In the dbt Labs on dbt series, get a behind-the-scenes look at how dbt Labs uses data. You’ll learn how dbt Labs thinks about the role of data, how data developers collaborate with business leaders, and the technical decisions we’ve made in our own dbt project.

In this session, Danny Lambert, Director of Marketing Operations at dbt Labs, shares how he partnered with marketing leaders and the data team to deliver clean, simple reporting that helps everyone from campaign managers to the executive team make better decisions about marketing spend and optimization. Jakki Jakaj, Associate Manager of Integrated Marketing at dbt Labs, shares how her team uses this data every day to optimize marketing campaigns.

Register for Coalesce at https://coalesce.getdbt.com/

No Compromises: Analytics Engineering on the Lakehouse (London) - Coalesce 2023

The Lakehouse architecture has emerged as the ideal data architecture for the AI age. It unites the structured world of analytics with the rapidly evolving world of AI. But what really makes something a "Lakehouse," and why should you care? In this session, Databricks discusses the key components of a lakehouse, what you should look for when adopting this paradigm, and how Databricks and dbt together enable analytics engineers, analysts, and data scientists to collaborate on a single unified platform.

Speaker: Thor List, Senior Field Engineering Manager, Databricks

Register for Coalesce at https://coalesce.getdbt.com

dbt Labs product spotlight & keynote (London) - Coalesce 2023

Seven years ago, in the early days of dbt, dbt Labs was building capabilities that made data developers more productive. dbt gained traction. It took off. Today, dbt is the standard for data transformation. It has never been easier to build and ship data products. This has created a new challenge – complexity. Join Heidi Brandenburg, Vice President of Engineering at dbt Labs, and Sai Maddali, Staff Product Manager at dbt Labs, in London to hear all about the latest releases in dbt and how they are helping organizations navigate complexity.

Speakers: Heidi Brandenburg, Vice President of Engineering, dbt Labs; Sai Maddali, Staff Product Manager, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

Summary

Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro. That's three free boards at dataengineeringpodcast.com/miro.

Your host is Tobias Macey and today I'm interviewing Tanya Bragin about her views on the database products market

Interview

Introduction
How did you get involved in the area of data management?
What are the aspects of the database market that keep you interested as a VP of product?

How have your experiences at Elastic informed your current work at Clickhouse?

What are the main product categories for databases today?

What are the industry trends that have the most impact on the development and growth of different product categories? Which categories do you see growing the fastest?

When a team is selecting a database technology for a given task, what are the types of questions that they should be asking?

Transactional engines like Postgres, SQL Server, Oracle, etc. were long used

Scaling analytics engineering: Building an ecosystem within an enterprise - Coalesce 2023

How a team of eight Analytics Engineers rolled out dbt across Virgin Media O2 to become a data-driven company.

Speakers: Oliver Burt, Lead Analytics Engineer, Virgin Media O2; Jason Jones, Analytics Engineering Manager, Virgin Media O2

Register for Coalesce at https://coalesce.getdbt.com

Enterprise MDS deployment at scale: dbt & DevOps - Coalesce 2023

Behind any good DataOps within a Modern Data Stack (MDS) architecture is a solid DevOps design! This is particularly pressing when building an MDS solution at scale, as the reliability, quality, and availability of data require a very high degree of process automation while remaining fast, agile, and resilient to change when addressing business needs.

While DevOps in data engineering is nothing new, a broad-spectrum solution that spans the data warehouse, BI, and more has often seemed out of reach due to overall complexity and cost, or has simply been overlooked due to perceived scaling issues, often attributed to the challenges of automation in CI/CD processes. This has been changing fast, with tools such as dbt offering features that allow a very high degree of autonomy in the CI/CD processes with relative ease, including flexible and cutting-edge capabilities around pre-commits, Slim CI, and more.

In this session, Datatonic covers the challenges of building and deploying enterprise-grade MDS solutions for analytics at scale, and how they have used dbt to address them, especially by bringing near-complete autonomy to the CI/CD processes.
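To make the Slim CI idea concrete: dbt can compare the current code against a previous production manifest and build only what changed. This is a generic sketch of the pattern, not Datatonic's actual pipeline; the artifacts path is hypothetical:

```sh
# Hypothetical Slim CI step for a pull-request build.
# Assumes the production run's artifacts (manifest.json) were previously
# downloaded into ./prod-artifacts by the CI system.
dbt build \
  --select state:modified+ \   # only models changed in this PR, plus downstream
  --defer \                    # resolve unchanged upstream refs against prod
  --state ./prod-artifacts     # the manifest to diff the project state against
```

Deferral is what makes this cheap: unchanged upstream models are read from the production schema instead of being rebuilt in the CI environment.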

Speaker: Ash Sultan, Lead Data Architect, Datatonic

Register for Coalesce at https://coalesce.getdbt.com

Demystifying Data Vault with dbt - Coalesce 2023

In this session, Alex Higgs unveils the potential of Data Vault 2.0, an often overlooked but powerful data warehousing method. Discover how it offers scalability, agility, and flexibility to your data solutions.

Key highlights:

- Explore the origins and essence of Data Vault 2.0
- Learn how Data Vault 2.0 streamlines big data solutions for scalability
- See how it integrates with dbt via AutomateDV for faster time to value
- Understand how AutomateDV simplifies Data Vault 2.0 data warehouses, freeing data teams from intricate SQL
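To illustrate how AutomateDV "frees data teams from intricate SQL": a hub model in dbt typically reduces to a metadata block plus one macro call. This is a hedged sketch based on the package's documented pattern; the staging model and column names here are hypothetical:

```sql
{{ config(materialized='incremental') }}

{%- set source_model = "stg_customers" -%}   {# hypothetical staging model #}
{%- set src_pk = "CUSTOMER_HK" -%}           {# hashed primary key column #}
{%- set src_nk = "CUSTOMER_ID" -%}           {# natural/business key column #}
{%- set src_ldts = "LOAD_DATETIME" -%}       {# load timestamp column #}
{%- set src_source = "RECORD_SOURCE" -%}     {# source system column #}

-- AutomateDV generates the full hub-loading SQL from this metadata.
{{ automate_dv.hub(src_pk=src_pk, src_nk=src_nk, src_ldts=src_ldts,
                   src_source=src_source, source_model=source_model) }}
```

The same metadata-driven style applies to links and satellites, which is why the talk frames the package as a path to faster time to value.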

Speaker: Alex Higgs, Senior Consultant Data Engineer, Datavault

Register for Coalesce at https://coalesce.getdbt.com

Beyond 10+ dbt projects: Leveraging automation and avoiding chaos - Coalesce 2023

Hear Benoit Perigaud explain how automation in dbt boosts productivity for large-scale operations. You'll learn how automation is significant throughout the dbt project life cycle, from helping you create projects, to executing rules and building observability on top of all transformations. This session is especially beneficial for data practitioners who manage many dbt projects (into the hundreds!), but it's also great information for anyone who may do so in the future.

Speaker: Benoit Perigaud, Staff Analytics Engineer, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

Central application for all your dbt packages - Coalesce 2023

dbt packages are libraries for dbt. Packages can produce information about best practices for your dbt project (e.g. dbt project evaluator) and cloud warehouse cost overviews. Unfortunately, all these KPIs are stored in your data warehouse, and it can be painful and expensive to create data visualization dashboards for them. This application automatically builds dashboards from the dbt packages you are using; you just need to configure your dbt Cloud API key, and that's it! In this session, you'll learn how.
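For context, packages like the project evaluator mentioned above are declared in a packages.yml file at the root of a dbt project. A minimal example (the version range here is illustrative, not a specific recommendation):

```yaml
# packages.yml — running `dbt deps` downloads and installs the listed packages.
packages:
  - package: dbt-labs/dbt_project_evaluator   # best-practice checks for the project
    version: [">=0.8.0", "<1.0.0"]            # illustrative version range
```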

Speaker: Adrien Boutreau, Head of Analytics Engineers, Infinite Lambda

Register for Coalesce at https://coalesce.getdbt.com

Your data warehouse is a success but your repository a mess: get your code on a diet - Coalesce 2023

Over the past four years, the data team at EQT has leveraged dbt and Snowflake to create a myriad of data products across the company. With a rapidly growing organization and increased demands for timely and accurate data, their immense monolithic dbt repository has become challenging to maintain. Learn about the best practices they are adopting to keep the platform in shape and scale with the business.

Speaker: Erik Lehto, Senior Analytics Engineer, EQT

Register for Coalesce at https://coalesce.getdbt.com

Could you defend your data in court? - Coalesce 2023

In analytics, generating a number is the easy part. Proving you got it right is much harder. In this talk, Christine Dixon walks through why you should strive to make your data pipelines reproducible and transparent, to ensure that you can always defend your data. She considers the ways in which dbt makes it easy, but also digs into some difficult technical scenarios and compromises you might need to make.
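One of the ways dbt "makes it easy," as referenced here, is declarative schema tests: assertions version-controlled next to the models they defend, so every run can re-prove the numbers. A generic sketch (model and column names are hypothetical):

```yaml
# schema.yml — `dbt test` verifies every assertion declared below.
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique       # no duplicate orders
          - not_null     # every row has an identifier
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']  # closed set of states
```

Because the tests live in the repository alongside the transformation logic, the provenance of a reported number can be audited from the same commit that produced it.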

Speaker: Christine Dixon, Lead Analyst, Mantel Group

Register for Coalesce at https://coalesce.getdbt.com

dbt for rapid deployment of a data product - Coalesce 2023

The team at nib Health maintains internal projects that contain standardized packages for running a dbt project, such as pipeline management, data testing, and data modeling macros. In this talk, they share how they used the yaml documentation files in dbt to create standardized tagging for data security (PII), projects, and product domains, with the tags pushed into Snowflake, Immuta, and Select Star.
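A hedged illustration of this yaml-based tagging approach (the tag names, model, and column below are hypothetical, not nib's actual configuration):

```yaml
# schema.yml — tags and meta declared alongside model documentation can be
# read by downstream tools for access control and cataloguing.
models:
  - name: member_claims
    config:
      tags: ['pii', 'domain_health', 'project_claims']  # security, domain, project
    columns:
      - name: member_email
        meta:
          contains_pii: true   # flag consumed by governance tooling
```

Keeping these tags in the same files as the model documentation means a single source of truth feeds the warehouse, the access-control layer, and the data catalog.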

Speaker: Pip Sidaway, Data Product Manager, nib

Register for Coalesce at https://coalesce.getdbt.com

Embracing a modern data stack in the water industry - Coalesce 2023

Learn about Watercare's journey in implementing a modern data stack with a focus on self-service analytics in the water industry. The session covers the reasons behind Watercare's decision to implement a modern data stack, the problem of data conformity, and the tools they used to accelerate their data modeling process. Diego also discusses the benefits of using dbt, Snowflake, and Azure DevOps in data modeling, and draws a parallel between analytics and his connection with jazz music.

Speaker: Diego Morales, Civil Industrial Engineer, Watercare

Register for Coalesce at https://coalesce.getdbt.com