talk-data.com

Speaker

Tahir Fayyaz

4 talks

Product Manager at Databricks / Google Cloud Platform Team specialising in Data & Machine Learning, BigQuery expert

Talks & appearances

4 activities · Newest first

session
with Tahir Fayyaz (Google Cloud Platform Team specialising in Data & Machine Learning, BigQuery expert), Shanelle Roman

As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment. In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance—all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines. Whether you’re a dbt user seeking better performance, a data engineer managing streaming pipelines, or an ML practitioner scaling AI workloads, this session will provide actionable insights on using Airflow and Databricks together to build efficient, cost-effective, and future-proof data platforms.

Building and maintaining data pipelines and workflows can be complicated, difficult to manage, and error-prone. Airbnb created Apache Airflow and open-sourced it in 2015 to solve many of these problems. If you have ever needed to automate, schedule, handle errors, or send out alerts for your data workflows, then Airflow will be the perfect new tool in your data toolbox.