talk-data.com

Event

Airflow Summit 2023

2023-07-01 Airflow Summit

Activities tracked

5

Airflow Summit 2023 program

Filtering by: Cyber Security

Sessions & talks

Airflow at Salesforce: Building a fully managed workflow orchestration system

2023-07-01
session

In this presentation, we discuss how we built a fully managed workflow orchestration system at Salesforce using Apache Airflow to support dependable data lake infrastructure on the public cloud. We touch on how we used Kubernetes for increased scalability and resilience, as well as the most effective approaches for managing and scaling data pipelines. We also cover how we addressed data security and privacy, multitenancy, and interoperability with other internal systems. We discuss how the system lets users effortlessly build reliable pipelines with built-in failure detection, alerting, and monitoring for deep insights, removing the undifferentiated heavy lifting of running and managing their own orchestration engines. Lastly, we elaborate on how we integrated our in-house CI/CD pipelines to enable effective DAG and dependency management, further enhancing the system’s capabilities.

Better Support for Using Multiple Namespaces with KubernetesExecutor

2023-07-01
session

Airflow’s KubernetesExecutor has supported multi_namespace_mode for a long time. This feature allows Airflow jobs to run in different namespaces on the same Kubernetes cluster for better isolation and easier management. However, it requires a cluster-role for the Airflow scheduler, which can create security problems or be a blocker for some users. PR https://github.com/apache/airflow/pull/28047, which will become available in Airflow 2.6.0, resolves this issue by allowing Airflow users to specify multi_namespace_mode_namespace_list when using multi_namespace_mode, so that no cluster-role is needed and users only need to ensure the scheduler has permissions in the listed namespaces rather than in all namespaces on the Kubernetes cluster. This talk aims to help you better understand KubernetesExecutor and how to set it up in a more secure manner.
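
As a rough illustration of the configuration described in this abstract, the sketch below builds the environment variables that would turn on multi_namespace_mode together with an explicit namespace list. It is not taken from the talk. It assumes that both options live in the `[kubernetes_executor]` config section, that the list is comma-separated, and that Airflow's usual `AIRFLOW__{SECTION}__{KEY}` environment variable convention applies. The helper names and the `team-a`/`team-b` namespaces are hypothetical.

```python
from __future__ import annotations


def airflow_env_var(section: str, key: str) -> str:
    """Build an env var name using the AIRFLOW__{SECTION}__{KEY} convention."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def multi_namespace_env(namespaces: list[str]) -> dict[str, str]:
    """Return env vars enabling multi_namespace_mode for the given namespaces.

    Hypothetical helper: it only builds strings and does not import Airflow.
    """
    cleaned = [ns.strip() for ns in namespaces if ns.strip()]
    if not cleaned:
        raise ValueError("at least one namespace is required")
    # Assumption: the section is named "kubernetes_executor" and the
    # namespace list is a comma-separated string.
    section = "kubernetes_executor"
    return {
        airflow_env_var(section, "multi_namespace_mode"): "True",
        airflow_env_var(section, "multi_namespace_mode_namespace_list"): ",".join(cleaned),
    }


if __name__ == "__main__":
    # "team-a" and "team-b" are placeholder namespace names.
    for name, value in multi_namespace_env(["team-a", "team-b"]).items():
        print(f"{name}={value}")
```

Even with these settings, the scheduler would still need namespace-scoped permissions in each listed namespace, as the abstract notes. The sketch covers only the Airflow side of the setup.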

Enabling Data Mesh by Moving from a Monolithic Airflow to Several Smaller Environments

2023-07-01
session

Kiwi.com started using Airflow in June 2016 as an orchestrator for several people in the company. Demand for the tool grew until the monolithic instance was used by 30+ teams with 500+ active DAGs, resulting in 3.5 million successfully finished tasks per month. At first, we ran a single monolithic Airflow environment, but our needs quickly changed as we wanted to support a data mesh architecture within Kiwi.com. By leveraging Astronomer on GCP, we were able to move from a monolithic Airflow environment to many smaller instances of Airflow. This talk will go into how to handle things like DAG dependencies, observability, and stakeholder management. Furthermore, we’ll talk about security, particularly how GCP’s workload identity helped us achieve a passwordless Airflow experience.

Open Source is Pretty Secure, Actually

2023-07-01
session

We’ve heard a lot in the last few years about insecurity in the open source software ecosystem, whether it be vulnerabilities, supply chain attacks, or malware. Has open source suddenly become fraught with security problems? Or is it maybe, possibly… actually doing great? Let’s delve into the collaborative nature of our open source ecosystems, and explore how transparency, peer review, and community have created a robust security posture. We’ll examine real-world examples, dispel myths, and reveal the inherent strengths of open source in fostering a secure and resilient software ecosystem.

Things to Consider When Building an Airflow Service

2023-07-01
session
Pete DeJoy (Astronomer), Viraj Parekh

Data platform teams often find themselves in a situation where they have to provide Airflow as a service to downstream teams, as more users and use cases in their organization require an orchestrator. In these situations, giving each team its own Airflow environment can unlock velocity and actually be lower overhead to maintain than a monolithic environment. This talk will be about things to keep in mind when building an Airflow service that supports several environments, user personas, and use cases. Namely, we’ll discuss principles to keep in mind when balancing centralized control over the data platform with decentralized teams using Airflow in the way they need. This will include observability, developer productivity, security, and infrastructure. We’ll also talk about day 2 concerns around overhead, infrastructure maintenance, and other tradeoffs to consider.