talk-data.com talk-data.com

Event

Airflow Summit 2023

2023-07-01 Airflow Summit Visit website ↗

Activities tracked

6

Airflow Summit 2023 program

Filtering by: dbt ×

Sessions & talks

Showing 1–6 of 6 · Newest first

Search within this event →

Airflow at StyleSeat: Our journey, challenges & results

2023-07-01
session

We will share the case study of Airflow at StyleSeat, where within a year our data grew from 2 million data points per day to 200 million. Our original solution for orchestrating this data was not enough, so we migrated to an Airflow based solution. Previous implementation Our tasks were orchestrated with hourly triggers on AWS Cloudwatch rules in their own log groups. Each task was a lambda individually defined as a task and executed python code from a docker image. As complexity increased, there were frequent downtimes and manual executions for failed tasks and their downstream dependencies.With every downtime, our business stakeholders started losing trust in Data and recovery times were longer with each downtime. We needed a modern orchestration platform which would enable our team to define and instrument complex pipelines as code, provide visibility into executions and define retry criteria on failures. Airflow was identified as a crucial & critical piece in modernizing our orchestration which would help us further onboard DBT. We wanted a managed solution and a partner who could help guide us to a successful migration.

A Single Pane of Glass on Airflow using Astro Python SDK, Snowflake, dbt, and Cosmos

2023-07-01
session

ETL data pipelines are the bread and butter of data teams that must design, develop, and author DAGs to accommodate the various business requirements. dbt is becoming one of the most used tools to perform SQL transformations on the Data Warehouse, allowing teams to harness the power of queries at scale. Airflow users are constantly finding new ways to integrate dbt with the Airflow ecosystem and build a single pane of glass where Data Engineers can manage and administer their pipelines. Astronomer Cosmos, an open-source product, has been introduced to integrate Airflow with dbt Core seamlessly. Now you can easily see your dbt pipelines fully integrated on Airflow. You will learn the following: How to integrate dbt Core with Airflow How to use Cosmos How to build data pipelines at scale

Data Product DAGs

2023-07-01
session

This talk will cover in high overview the architecture of a data product DAG, the benefits in a data mesh world and how to implement it easily. Airflow is the de-facto orchestrator we use at Astrafy for all our data engineering projects. Over the years we have developed deep expertise in orchestrating data jobs and recently we have adopted the “data mesh” paradigm of having one Airlfow DAG per data product. Our standard data product DAGs contain the following stages: Data contract: check integrity of data before transforming the data Data transformation: applies dbt transformation via a kubernetes pod operator Data distribution: mainly informing downstream applications that new data is available to be consumed For use cases where different data products need to be finished before triggering another data product, we have a mechanism with an engine in between that keeps track of finished dags and triggers DAGs based on a mapping table containing data products dependencies.

Empowering Collaborative Data Workflows with Airflow and Cloud Services

2023-07-01
session

Productive cross-team collaboration between data engineers and analysts is the goal of all data teams, however, fulfilling on that mission can be challenging given the diverse set of skills that each group brings. In this talk we present an example of how one team tackled this topic by creating a flexible, dynamic and extensible framework using Airflow and cloud services that allowed engineers and analysts to jointly create data-centric micro-services to serve up projections and other robust analysis for use in the organization. The framework, which utilized dynamic DAG generation configured using yaml files, Kubernetes jobs and dbt transformations, abstracted away many of the details associated with workflow orchestration, allowing analysts to focus on their Python or R code and data processing logic while enabling data engineers to monitor the pipelines and ensure their scalability.

Manifest destiny: Orchestrating dbt using Airflow

2023-07-01
session

Airflow is a popular choice for organizations looking to integrate open-source dbt within their existing data infrastructure. This talk will explore two primary methods of running dbt in Airflow: job-level and model-level. We’ll discuss the tradeoffs associated with each approach, highlighting the simplicity and efficiency of job-level orchestration, contrasted with the enhanced observability and control provided by model-level orchestration. We’ll also explain how the balance has shifted in recent years, with improvements to dbt core making model-level more efficient and innovative Airflow extensions like Cosmos making it easier to implement. Finally, we’ll provide benchmarks to help you determine which paradigm is the best fit for your needs.

Using Dynamic Task Mapping to Orchestrate dbt

2023-07-01
session

Airflow, traditionally used by Data Engineers, is now popular among Analytics Engineers who aim to provide analysts with high-quality tooling while adhering to software engineering best practices. dbt, an open-source project that uses SQL to create data transformation pipelines, is one such tool. One approach to orchestrating dbt using Airflow is using dynamic task mapping to automatically create a task for each sub-directory inside dbt’s staging, intermediate, and marts directories. This enables analysts to write SQL code that is automatically added as a dedicated task in Airflow at runtime. Combining this new Airflow feature with dbt best practices offers several benefits, such as analysts not needing to make Airflow changes and engineers being able to re-run subsets of dbt models should errors occur. In this talk, I would like to share some lessons I have learned while successfully implementing this approach for several clients.