talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

5 tagged

Activity Trend: 157 peak/qtr, 2020-Q1 to 2026-Q1

Activities

Filtering by: dbt Coalesce 2022

Beyond pretty graphs: How end-to-end lineage drives better actions

Everyone is talking about data lineage these days, and for good reason. Data lineage helps ensure better data quality across your modern data stack. But not everyone speaks the same lineage language. Data engineers use lineage for impact and root cause analysis. Analysts and analytics engineers use lineage to trace jobs and transformations in their warehouses. And consumers use lineage to understand why data never reached their expected destination. This results in a narrow, siloed view of lineage in which only one group benefits. It’s time to stop using siloed lineage views for pretty graphs and start using end-to-end lineage to drive focused actions. In the talk, you will learn:

• How data quality is tailored to the specific needs of data engineers, analysts, and consumers

• How data lineage should drive actions

• A real-world example of end-to-end data lineage with Airflow, dbt, Spark, and Redshift

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Building a Data Platform from Scratch with dbt, Snowflake and Looker

When Prateek Chawla, founding engineer, joined Monte Carlo in 2019, he was responsible for spinning up our data platform from scratch. He was more of a backend/cloud engineer but, as at any startup, had to wear many hats, so he got the opportunity to play the role of data engineer too. In this talk, we’ll walk through how we spun up Monte Carlo’s data stack with Snowflake, Looker, and dbt, touching on how and why we implemented dbt (and later, dbt Cloud), key use cases, and handy tricks for integrating dbt with other popular tools, like Airflow and Spark. We’ll discuss what worked, what didn’t work, and other lessons learned along the way, as well as share how our data stack evolved over time to scale to meet the demands of our growing startup. We’ll also touch on a critical component of the dbt value proposition, data quality testing, and discuss some of our favorite tests and what we’ve done to automate and integrate them with other elements of our stack.

Field-level lineage with dbt, ANTLR, and Snowflake

Lineage is a critical component of any root cause, impact analysis, and overall analytics health assessment workflow. But it hasn’t always been easy to create, particularly at the field level. In this session, Mei Tao, Helena Munoz, and Xuanzi Han (Monte Carlo) tackle this challenge head-on by leveraging some of the most popular tools in the modern data stack, including dbt, Airflow, Snowflake, and ANother Tool for Language Recognition (ANTLR). Learn how they designed the data model, query parser, and larger database design for field-level lineage, highlighting learnings, wrong turns, and best practices developed along the way.

Petabyte-scale lakehouses with dbt and Apache Hudi

While the data lakehouse architecture offers many inherent benefits, it’s still relatively new to the dbt community, which creates hurdles to adoption.

In this talk, you’ll meet Apache Hudi, a platform used by organizations to build planet-scale data platforms according to all of the key design elements required by the lakehouse architecture. You’ll also learn how we’ve personally used Hudi, along with dbt, Spark, Airflow, and many more open-source tools, to build a truly reliable big data streaming lakehouse that cut the latency of our petabyte-scale data pipelines from hours to minutes.

Check the slides here: https://docs.google.com/presentation/d/18dv4TZzRnZQ-IK7xLkYJuind4Bcztkl19zV7b4HTaTU/edit?usp=sharing

Democratizing data at Zillow with dbt, Airflow, Spark, and Kubernetes

Building data pipelines is difficult, and adding a data governance and observability framework doesn’t make it any easier. But that was the task ahead for Deepak Konidena during his early days at Zillow. In this session, he’ll share how ZSQL, the platform they built on top of dbt, Airflow, Spark, and Kubernetes, eliminated the need for internal data teams to build their own DAGs, models, schemas, and lineage from scratch, while also providing an easy way to enforce data quality, monitor changes, and alert on disruptions.

Check the slides here: https://docs.google.com/presentation/d/18HEil3_nXD8nYBhcg4m-Kpy8I8Na6MXI/edit?usp=sharing&ouid=110293204340061069659&rtpof=true&sd=true
