talk-data.com


Speaker: Philippe Gagnon
Senior Solutions Architect at Astronomer
5 talks


Talks & appearances


This workshop will provide an overview of implementing operations research problems using Apache Airflow. This is a hands-on session where attendees will gain experience creating DAGs to define and manage workflows for classical operations research problems. The workshop will include several examples of how Airflow can be used to optimize and automate various decision-making processes, including:

- Inventory management: how to use Airflow to optimize inventory levels and reduce stockouts by analyzing demand patterns, lead times, and other factors.
- Production planning: how to use Airflow to create optimized production schedules that minimize downtime, reduce costs, and increase throughput.
- Logistics optimization: how to use Airflow to optimize transportation routes and other factors to improve the efficiency of logistics operations.

Attendees will come away with a solid understanding of using Airflow to automate decision-making processes with optimization solvers.
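As a minimal illustration of the kind of decision logic such a DAG task might wrap, here is a pure-Python inventory calculation (the classic economic order quantity formula). The demand and cost figures are hypothetical; in an actual workshop DAG this function would be the callable behind an Airflow task.

```python
import math

def optimal_order_quantity(annual_demand: float,
                           order_cost: float,
                           holding_cost: float) -> float:
    """Economic order quantity: the order size minimizing total
    ordering + holding cost, sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

# Hypothetical figures: 1200 units/year demand, $50 per order,
# $6 per unit per year holding cost.
qty = optimal_order_quantity(1200, 50, 6)
print(round(qty))  # -> 141
```

In Airflow, a function like this would typically be decorated with `@task` and fed its inputs from upstream tasks that pull demand data from a warehouse.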

Trino is incredibly effective at enabling users to extract insights quickly and effectively from large amounts of data located in dispersed and heterogeneous federated data systems. However, some business data problems are more complex than interactive analytics use cases, and are best broken down into a sequence of interdependent steps, i.e., a workflow. For these use cases, dedicated software is often required to schedule and manage these processes with a principled approach. In this session, we will look at how we can leverage Apache Airflow to orchestrate Trino queries into complex workflows that solve practical batch processing problems, all while avoiding repetitive, redundant data movement.
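The "sequence of interdependent steps" idea can be sketched with Python's standard-library `graphlib`. The step names below are hypothetical; in an Airflow DAG each step would become one task (for example, running its SQL through the common SQL provider's `SQLExecuteQueryOperator` against a Trino connection), with intermediate results staged in Trino rather than copied between systems.

```python
from graphlib import TopologicalSorter

# Hypothetical Trino query steps and their upstream dependencies.
# Airflow resolves exactly this kind of ordering when it schedules
# a DAG's tasks.
steps = {
    "stage_orders": set(),
    "stage_customers": set(),
    "join_enrich": {"stage_orders", "stage_customers"},
    "aggregate_daily": {"join_enrich"},
}

execution_order = list(TopologicalSorter(steps).static_order())
print(execution_order)
```

The staging steps run first, and the aggregation only runs once the enrichment join has completed, mirroring the `upstream >> downstream` dependencies one would declare in the DAG file.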

The scheduler is arguably the most important component of an Airflow cluster. It is also the most complex, and the most misunderstood by practitioners and administrators alike. In this talk, we will follow the path that a task instance takes to progress from creation to execution, and discuss the configuration settings that allow users to tune the scheduler and executor to suit their workload patterns. Finally, we will dive deep into critical sections of the Airflow codebase and explore opportunities for optimization.
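For context, scheduler and executor tuning of the kind this talk covers is driven by options in `airflow.cfg` (or their environment-variable equivalents). The settings below are real Airflow 2.x options, but the values shown are purely illustrative, and defaults vary between versions:

```ini
[core]
# Maximum number of task instances running across the whole cluster.
parallelism = 32

[scheduler]
# How often the scheduler loop runs, in seconds.
scheduler_heartbeat_sec = 5
# How many task instances the scheduler examines per scheduling query.
max_tis_per_query = 512
# Number of processes used to parse DAG files.
parsing_processes = 2
```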

Cluster Policies are an advanced Airflow feature composed of a set of hooks that allow cluster administrators to implement checks and mutations against certain core Airflow constructs (DAGs, Tasks, Task Instances, Pods). In this talk, we will discuss how cluster administrators can leverage these functions in order to better govern the workloads that are running in their environments.

The task logging subsystem is one of the most flexible, yet complex and misunderstood components of Airflow. In this talk, we will take a look at the various task log handlers that are part of the core Airflow distribution, dig deeper into the interfaces they implement, and discuss how those can be used to roll your own logging implementation.
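To illustrate the general pattern Airflow's task log handlers follow (a context-binding call before the task logs, then shipping records to a backing store), here is a minimal sketch built on the standard `logging.Handler`. This is not Airflow's actual `FileTaskHandler` interface; the in-memory dictionary and the string task key stand in for remote storage and a real `TaskInstance`.

```python
import logging

class InMemoryTaskHandler(logging.Handler):
    """Toy handler: binds to one task, collects its formatted records."""

    def __init__(self):
        super().__init__()
        self.store = {}      # task key -> list of formatted log lines
        self._key = None

    def set_context(self, task_key: str):
        # Airflow's task handlers expose a similar set_context() hook,
        # called with the TaskInstance before the task starts logging.
        self._key = task_key
        self.store.setdefault(task_key, [])

    def emit(self, record):
        if self._key is not None:
            self.store[self._key].append(self.format(record))

handler = InMemoryTaskHandler()
handler.set_context("my_dag.extract.2024-01-01")

log = logging.getLogger("demo_task")
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.INFO)
log.info("task started")

print(handler.store["my_dag.extract.2024-01-01"])  # -> ['task started']
```

A production handler would replace the dictionary with writes to files, S3, or Elasticsearch, and would also implement the read side so the webserver can display task logs.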