talk-data.com

Topic: Apache Airflow

Tags: workflow_management · data_orchestration · etl

22 activities tagged

Activity Trend: 157 peak/qtr (2020-Q1 to 2026-Q1)

Activities

22 activities · Newest first

AWS re:Invent 2025 - Innovations in AWS analytics: Data processing (ANT305)

Explore the latest advancements in AWS Analytics designed to transform your data processing landscape. This session unveils powerful new capabilities across key services, including Amazon EMR for scalable big data processing, AWS Glue for seamless data integration, Amazon Athena for optimized querying, and Amazon Managed Workflows for Apache Airflow (MWAA) for workflow orchestration. Discover how these innovations can supercharge performance, optimize costs, and streamline your data ecosystem. Whether you're looking to enhance scalability, improve data integration, accelerate queries, or refine workflow management, join us to gain actionable insights that will position your organization at the forefront of data processing innovation.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

From Apache Airflow to Lakeflow Jobs: A Guide for Workflow Modernization

This is an overview of migrating from Apache Airflow to Lakeflow Jobs for modern data orchestration. It covers key differences, best practices, and practical examples of transitioning from traditional Airflow DAGs orchestrating legacy systems to declarative, incremental ETL pipelines with Lakeflow. Attendees will gain actionable tips on how to improve the efficiency, scalability, and maintainability of their workflows.

Sponsored by: Astronomer | Unlocking the Future of Data Orchestration: Introducing Apache Airflow® 3

Airflow 3 is here, bringing a new era of flexibility, scalability, and security to data orchestration. This release makes building, running, and managing data pipelines easier than ever. In this session, we will cover the key benefits of Airflow 3, including:

(1) Ease of Use: Airflow 3 rethinks the user experience, from an intuitive, upgraded UI to DAG Versioning and scheduler-integrated backfills that let teams manage pipelines more effectively than ever before.

(2) Stronger Security: By decoupling task execution from direct database connections, Airflow 3 enforces task isolation and minimal-privilege access. This meets stringent compliance standards while reducing the risk of unauthorized data exposure.

(3) Ultimate Flexibility: Run tasks anywhere, anytime with remote execution and event-driven scheduling. Airflow 3 is designed for global, heterogeneous, modern data environments, with an architecture that spans edge, hybrid-cloud, and GPU-based deployments.

Scaling AI workloads with Ray & Airflow

Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides a layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.

On the other hand, Apache Airflow is an established data orchestration framework, downloaded more than 20 million times each month.

This talk presents the Airflow Ray provider package, which allows users to interact with Ray from an Airflow workflow. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines in those clusters.
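
Ray's core primitive is the remote task: an ordinary Python function submitted to the cluster and awaited as a future. As a rough analogy using only the standard library (Ray itself is not used here, and the function name is invented for illustration), the same fan-out/fan-in shape looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def score(x: int) -> int:
    # Stand-in for an expensive model-scoring task that Ray would
    # schedule on a remote cluster worker.
    return x * x

# Fan out the work, then gather the results: the same submit/await
# shape as Ray's ray.remote / ray.get, but on local threads only.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(score, x) for x in range(5)]
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16]
```

In the Airflow setting, each such submission would typically become a task in a DAG, with the provider handling cluster setup and job submission.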

Ricardo Sueiras: The State of Apache Airflow

🌟 Session Overview 🌟

Session Name: The State of Apache Airflow
Speaker: Ricardo Sueiras
Session Description: Apache Airflow is a popular workflow orchestration tool with a thriving and welcoming community that drives constant innovation and new releases. I will walk you through some of the new features you can expect to find in Apache Airflow, as well as cover use cases and some advanced topics to help you supercharge your use of Apache Airflow. There will be plenty of live demos of these features, and I hope they will encourage you all to try the new advanced features of Airflow yourself.

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about the evolution of big data and RPA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀
🔗 Find Other Years' Videos:
2023 Big Data Conference Europe: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
2022 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
2021 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/ 👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/ 🐦 Twitter: @BigDataConfEU, @europe_rpa 🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/ 🎥 YouTube: http://www.youtube.com/@DATAMINERLT

AWS re:Invent 2024 - Scale with self-service analytics on AWS (ANT334)

Self-service analytics empowers users to independently access, explore, and analyze data to accelerate decision-making. It’s a foundational step for organizations to democratize the use of data for business growth with easy-to-use, simple-to-understand, and quick-to-deliver tools. Interactive querying, SQL analytics, data preparation, data transformation, data workflow orchestration and search analytics are some of the key day-to-day functions that data users need from self-service solutions. Learn how AWS analytics services like Amazon Athena, AWS Glue, Amazon Redshift, Amazon Managed Workflows for Apache Airflow, and Amazon OpenSearch Service enable data-driven decision-making through self-serve analytics.

Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP

Coalesce 2024: Transitioning from dbt Core to dbt Cloud: A user story

Join us as we share our journey of migrating from dbt Core to dbt Cloud. We'll discuss why we made this shift – focusing on security, ownership, and standardization. Starting with separate team-based projects on dbt Core, we moved towards a unified structure, and eventually embraced dbt Cloud. Now, all teams follow a common structure and standardized requirements, ensuring better security and collaboration.

In our session, we'll explore how we improved our data analytics processes by migrating from dbt Core to dbt Cloud. Initially, each team had its own way of working on dbt Core, leading to security risks and inconsistent practices. To address this, we first transitioned to a more unified approach on dbt Core. This year we migrated to dbt Cloud, which allowed us to centralize our data analytics workflows, enhancing security and promoting collaboration.

For scheduling, we manage our own Airflow instance on AWS EKS, and we use DataHub as our data catalog.

Key points:
• Enhanced Security: dbt Cloud provided robust security features, helping us safeguard our data pipelines.
• Ownership and Collaboration: With dbt Cloud, teams took ownership of their projects while collaborating more effectively.
• Standardization: We enforced standardized requirements across all projects using dbt-project-evaluator, ensuring consistency and efficiency.

Speakers:
Alejandro Ivanez, Platform Engineer, DPG Media
Mathias Lavaert, Principal Platform Engineer, DPG Media

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements

Coalesce 2024: From Core to Cloud: Unlocking dbt at Warner Brothers Discovery (CNN)

Since the beginning of 2024, the Warner Brothers Discovery team supporting the CNN data platform has been undergoing an extensive migration project from dbt Core to dbt Cloud. Concurrently, the team is also segmenting their project into multi-project frameworks utilizing dbt Mesh. In this talk, Zachary will review how this transition has simplified data pipelines, improved pipeline performance and data quality, and made data collaboration at scale more seamless.

He'll discuss how dbt Cloud features like the Cloud IDE, automated testing, documentation, and code deployment have enabled the team to standardize on a single developer platform while also managing dependencies effectively. He'll share details on how the automation framework they built using Terraform streamlines dbt project deployments with dbt Cloud to a "push-button" process. By leveraging an infrastructure as code experience, they can orchestrate the creation of environment variables, dbt Cloud jobs, Airflow connections, and AWS secrets with a unified approach that ensures consistency and reliability across projects.

Speakers:
Mamta Gupta, Staff Analytics Engineer, Warner Brothers Discovery
Zachary Lancaster, Manager, Data Engineering, Warner Brothers Discovery

Igor Khrol: Big Data With Open Source Solutions

Join Igor Khrol as he delves into the world of Big Data with Open Source Solutions at Automattic, a company rooted in the power of open source. 📊🌐 Discover their unique approach to maintaining a data ecosystem based on Hadoop, Spark, Trino, Airflow, Superset, and JupyterHub, all hosted on bare metal infrastructure, and gain insights on how it compares to cloud-based alternatives in 2023. 💡🚀 #BigData #opensource

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Ricardo Sueiras: Using Modern Application Principles to Automate Your Apache Airflow Data Pipelines

Discover the power of modern application principles in automating your Apache Airflow Data Pipelines with Ricardo Sueiras. 🚀🐍 Learn how to leverage CI/CD for seamless development, testing, and deployment, and say goodbye to manual cron job management! 💻📈 #ApacheAirflow #automation

Ricardo Sueiras: Getting Started with Apache Airflow - Building Your First Workflow

Embark on your data engineering journey with Ricardo Sueiras and learn the essentials of Apache Airflow in 'Getting Started with Apache Airflow - Building Your First Workflow.' 🚀🐍 Discover the architecture and create your first workflow, perfect for beginners eager to explore this open-source powerhouse! 💻📊 #ApacheAirflow #WorkflowCreation

Cross-Platform Data Lineage with OpenLineage

There are more data tools available than ever before, and it is easier than ever to build a pipeline. These tools and advancements have created an explosion of innovation, and as a result the data within today's organizations is increasingly distributed: it can no longer be contained within a single brain, a single team, or a single platform. Data lineage can help by tracing the relationships between datasets and providing a map of your entire data universe.

OpenLineage provides a standard for lineage collection that spans multiple platforms, including Apache Airflow, Apache Spark™, Flink®, and dbt. This empowers teams to diagnose and address widespread data quality and efficiency issues in real time. In this session, we will show how to trace data lineage across Apache Spark and Apache Airflow. There will be a walk-through of the OpenLineage architecture and a live demo of a running pipeline with real-time data lineage.
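
Stripped to its essence, lineage is a graph of dataset dependencies that you can walk upstream for root-cause analysis. A minimal sketch in plain Python (all dataset names are invented; OpenLineage models this with standardized run, job, and dataset events rather than a hand-written dict):

```python
# Toy lineage graph: dataset -> the upstream datasets it is built from.
# All names are illustrative.
LINEAGE = {
    "reports.revenue": ["warehouse.orders", "warehouse.customers"],
    "warehouse.orders": ["raw.orders"],
    "warehouse.customers": ["raw.customers"],
}

def upstream(dataset: str) -> set:
    """Return every dataset that `dataset` transitively depends on."""
    seen = set()
    stack = list(LINEAGE.get(dataset, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(LINEAGE.get(d, []))
    return seen

print(sorted(upstream("reports.revenue")))
# ['raw.customers', 'raw.orders', 'warehouse.customers', 'warehouse.orders']
```

When a report looks wrong, a walk like this immediately narrows the search to the raw tables that feed it.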

Talk by: Julien Le Dem, Willy Lulciuc

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Beyond pretty graphs: How end-to-end lineage drives better actions

Everyone is talking about data lineage these days, and for good reason. Data lineage helps ensure better data quality across your modern data stack. But not everyone speaks the same lineage language. Data engineers use lineage for impact and root cause analysis. Analysts and analytics engineers use lineage to trace jobs and transformations in their warehouses. And consumers use lineage to understand why data never reached its expected destination. This results in a narrow, siloed view of lineage in which only one group benefits. It's time to stop using siloed lineage views for pretty graphs and start using end-to-end lineage to drive focused actions. In this talk, you will learn:

• How data quality is tailored to the specific needs of data engineers, analysts, and consumers

• How data lineage should drive actions

• A real-world example of end-to-end data lineage with Airflow, dbt, Spark, and Redshift
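
The "drive actions" half is usually impact analysis: walking the lineage graph downstream from a broken dataset to find everything that should be paused or flagged. A minimal sketch in plain Python (table names are invented for illustration):

```python
# Toy downstream edges: dataset -> datasets built directly from it.
# A stand-in for lineage collected across Airflow, dbt, Spark, and
# Redshift; all names are illustrative.
DOWNSTREAM = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": ["dashboard.exec_kpis"],
}

def impacted(failed: str) -> set:
    """Everything transitively downstream of a failed dataset."""
    hit = set()
    stack = [failed]
    while stack:
        for child in DOWNSTREAM.get(stack.pop(), []):
            if child not in hit:
                hit.add(child)
                stack.append(child)
    return hit

print(sorted(impacted("raw.orders")))
# ['dashboard.exec_kpis', 'mart.churn', 'mart.revenue', 'stg.orders']
```

An alerting layer can then notify the owners of exactly these assets instead of paging everyone.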

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Building a Data Platform from Scratch with dbt, Snowflake and Looker

When Prateek Chawla, founding engineer, joined Monte Carlo in 2019, he was responsible for spinning up our data platform from scratch. He was more of a backend/cloud engineer, but, as with any startup, he had to wear many hats, so he got the opportunity to play the role of data engineer too. In this talk, we'll walk through how we spun up Monte Carlo's data stack with Snowflake, Looker, and dbt, touching on how and why we implemented dbt (and later, dbt Cloud), key use cases, and handy tricks for integrating dbt with other popular tools like Airflow and Spark. We'll discuss what worked, what didn't work, and other lessons learned along the way, as well as share how our data stack evolved over time to scale to meet the demands of our growing startup. We'll also touch on a very critical component of the dbt value proposition, data quality testing, and discuss some of our favorite tests and what we've done to automate and integrate them with other elements of our stack.

Field-level lineage with dbt, ANTLR, and Snowflake

Lineage is a critical component of any root cause, impact analysis, and overall analytics health assessment workflow. But it hasn't always been easy to create, particularly at the field level. In this session, Mei Tao, Helena Munoz, and Xuanzi Han (Monte Carlo) tackle this challenge head-on by leveraging some of the most popular tools in the modern data stack, including dbt, Airflow, Snowflake, and ANother Tool for Language Recognition (ANTLR). Learn how they designed the data model, query parser, and larger database design for field-level lineage, highlighting learnings, wrong turns, and best practices developed along the way.
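
At the heart of field-level lineage is a query parser that maps each output column back to its source. A deliberately tiny, regex-based sketch of that mapping for trivial single-table SELECT statements (a real implementation handles the full SQL grammar with a generated parser such as ANTLR's; all names here are invented):

```python
import re

def column_lineage(sql: str) -> dict:
    """Map each selected column to its source table for a trivial
    `SELECT a, b FROM t` statement. A toy stand-in for a grammar-driven
    (e.g. ANTLR-generated) parser."""
    m = re.match(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)", sql, re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported statement")
    columns, table = m.group(1), m.group(2)
    return {col.strip(): table for col in columns.split(",")}

print(column_lineage("SELECT id, amount FROM orders"))
# {'id': 'orders', 'amount': 'orders'}
```

Even this toy version shows why field-level lineage is harder than table-level: every expression, alias, and join multiplies the mapping work, which is what motivates a real grammar.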

Petabyte-scale lakehouses with dbt and Apache Hudi

While the data lakehouse architecture offers many inherent benefits, it’s still relatively new to the dbt community, which creates hurdles to adoption.

In this talk, you’ll meet Apache Hudi, a platform used by organizations to build planet-scale data platforms according to all of the key design elements required by the lakehouse architecture. You’ll also learn how we’ve personally used Hudi, along with dbt, Spark, Airflow, and many more open-source tools, to build a truly reliable big data streaming lakehouse that cut the latency of our petabyte-scale data pipelines from hours to minutes.

Check the slides here: https://docs.google.com/presentation/d/18dv4TZzRnZQ-IK7xLkYJuind4Bcztkl19zV7b4HTaTU/edit?usp=sharing

Democratizing data at Zillow with dbt, Airflow, Spark, and Kubernetes

Building data pipelines is difficult, and adding a data governance and observability framework doesn’t make it any easier. But that was the task ahead of Deepak Konidena during his early days at Zillow. In this session, he’ll share how ZSQL, the platform they built on top of dbt, Airflow, Spark, and Kubernetes, eliminated the need for internal data teams to build their own DAGs, models, schemas, and lineage from scratch, while also providing an easy way to enforce data quality, monitor changes, and alert on disruptions.

Check the slides here: https://docs.google.com/presentation/d/18HEil3_nXD8nYBhcg4m-Kpy8I8Na6MXI/edit?usp=sharing&ouid=110293204340061069659&rtpof=true&sd=true

Open Source Powers the Modern Data Stack

Lakehouses like Databricks’ Delta Lake are becoming the central brain for all data systems. But lakehouses are only one component of the data stack. There are many building blocks required for tackling data needs, including data integration, data transformation, data quality, observability, and orchestration.

In this session, we will present how open source powers companies' approach to building a modern data stack. We will talk about technologies like Airbyte, Airflow, dbt, Preset, and how to connect them in order to build a customized and extensible data platform centered around Databricks.

Constraints, Democratization, and the Modern Data Stack - Building a Data Platform At Red Ventures

The time and attention of skilled engineers are some of the most constrained, valuable resources at Red Digital, a marketing agency embedded within Red Ventures. Acknowledging that constraint, the team at Red Digital has taken a deliberate, product-first approach to modernize and democratize their data platform. With the help of modern tools like Databricks, Fivetran, dbt, Monte Carlo, and Airflow, Red Digital has increased its development velocity and the size of the available talent pool to continue to grow the business.

This talk will walk through some of the key challenges, decisions, and solutions that the Red Digital team has made to build a suite of parallel data stacks capable of supporting its growing business.

Elixir: The Wickedly Awesome Batch and Stream Processing Language You Should Have in Your Toolbox

Elixir is a programming language that compiles to Erlang VM bytecode and is growing in popularity.

In this session I will show how you can apply Elixir towards solving data engineering problems in novel ways.

Examples include:
• How to leverage Erlang's lightweight distributed process coordination to run clusters of workers across Docker containers and perform data ingestion.
• A framework that hooks Elixir functions as steps into Airflow graphs.
• How to consume and process Kafka events directly within Elixir microservices.

For each of the above I'll show real system examples and walk through the key elements step by step. No prior familiarity with Erlang or Elixir will be required.
