talk-data.com

Event

Airflow Summit 2024

2024-07-01 Airflow Summit

Activities tracked

7

Airflow Summit 2024 program

Filtering by: GenAI

Sessions & talks

Showing 1–7 of 7 · Newest first


10 years of Airflow: history, insights, and looking forward

2024-07-01
session

Ten years after its creation, Airflow is stronger than ever: in last year’s Airflow survey, 81% of users said Airflow is important or very important to their business, 87% said their Airflow usage has grown over time, and 92% said they would recommend Airflow. In this panel discussion, we’ll celebrate a decade of Airflow and delve into how it became the highly recommended industry standard it is today, covering its history, pivotal moments, and the role of the community. Our panel of seasoned experts will also discuss where Airflow is going next, including future use cases like generative AI and the highly anticipated Airflow 3.0. Don’t miss this insightful exploration of one of the most influential tools in the data landscape.

Airflow, Spark, and LLMs: Turbocharging MLOps at ASAPP

2024-07-01
session

This talk will explore ASAPP’s use of Apache Airflow to streamline and optimize our machine learning operations (MLOps). Key highlights include: integrating with our custom Spark solution to achieve speedup, efficiency, and cost gains for generative AI transcription, summarization, and intent categorization pipelines; different design patterns for integrating with efficient LLM servers, such as TGI, vLLM, and TensorRT, for summarization pipelines with and without Spark; an overview of batched LLM inference using Airflow, as opposed to real-time inference outside of it; and [tentative] a possible extension of this scaffolding to Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) for fine-tuning LLMs, using Airflow as the orchestrator. Additionally, the talk will cover ASAPP’s MLOps journey with Airflow over the past few years, including an overview of our cloud infrastructure, various data backends, and sources. The primary focus will be on the machine learning workflows at ASAPP rather than the data workflows, providing a detailed look at how Airflow enhances our MLOps processes.

Customizing LLMs: Leveraging Technology to tailor GenAI using Airflow

2024-07-01
session

Laurel provides an AI-driven timekeeping solution tailored for accounting and legal firms, automating timesheet creation by capturing digital work activities. This session highlights two notable AI projects. UTBMS code prediction: leveraging small language models, this system builds new embeddings to predict work codes for legal bills with high accuracy (more details are available in our case study: https://www.laurel.ai/resources-post/enhancing-legal-and-accounting-workflows-with-ai-insights-into-work-code-prediction). Bill creation and narrative generation: using Retrieval-Augmented Generation (RAG), this approach transforms users’ digital activities into fully billable entries. Additionally, we will discuss how we use Airflow for model management in these AI projects. Daily model retraining: we retrain our models every day. Model (re)deployment: our Airflow DAG evaluates model performance and redeploys the model if improvements are detected. Cost management: to avoid the high costs of querying large language models frequently, our DAG uses RAG to efficiently summarize daily activities into a billable timesheet at the end of each day.

Gen AI using Airflow 3: A vision for Airflow RAGs

2024-07-01
session

Gen AI has taken the computing world by storm. As enterprises and startups have started to experiment with LLM applications, it has become clear that providing the right context to these applications is critical. This process, known as retrieval-augmented generation (RAG), supplies relevant custom data to the large language model so that the quality of its responses improves. Processing custom data and integrating with enterprise applications is a strength of Apache Airflow. This talk details a vision for enhancing Apache Airflow to support RAG more intuitively, with additional capabilities and patterns. Specifically, these include: support for unstructured data sources such as text, extending to image, audio, video, and custom sensor data; LLM invocation, covering both external model services through APIs and local models through container invocation; automatic index refreshing, with a focus on unstructured data lifecycle management to avoid cumbersome and expensive index creation on vector databases; and templates for hallucination reduction via testing and scoping strategies.

How the Airflow Community Productionizes Generative AI

2024-07-01
session
Pete DeJoy (Astronomer)

Every data team out there is being asked about generative AI by its business stakeholders. Taking LLM-centric workloads to production is not a trivial task. At the foundational level, there is a set of challenges around data delivery, data quality, and data ingestion that mirror traditional data engineering problems. Once you’re past those, there is a set of challenges related to the underlying use case you’re trying to solve. Thankfully, because Airflow was already being used at these companies for data engineering and MLOps use cases, it has become the de facto orchestration layer behind many GenAI use cases at startups and Fortune 500 companies. This talk will be a tour of various methods, best practices, and considerations used in the Airflow community when taking GenAI use cases to production. We’ll focus on four primary use cases (RAG, fine-tuning, resource management, and batch inference) and walk through patterns that different members of the community have used to productionize this new, exciting technology.

Unleash the Power of AI: Streamlining Airflow DAG Development with AI-Driven Automation

2024-07-01
session

Conversational AI is no longer exclusive to large enterprises. It has become more accessible and affordable, opening up new possibilities and business opportunities. In this session, discover how you can use generative AI as your AI pair programmer to suggest DAG code and recommend entire functions in real time, directly from your editor. See how to harness ML models trained on billions of lines of code to turn natural language prompts into coding suggestions. Seamlessly cycle through lines of code and complete function suggestions, and choose to accept, reject, or edit them. Witness firsthand how generative AI provides recommendations based on the project’s context and style conventions. The objective is to equip you with techniques that let you spend less time on boilerplate and repetitive code patterns, and more time on what truly matters: building exceptional orchestration software.

Unlocking FMOps/LLMOps with Airflow: A guide to operationalizing and managing Large Language Models

2024-07-01
session
Parnab Basak (Amazon Web Services)

In the last few years, Large Language Models (LLMs) have risen to prominence as outstanding tools capable of transforming businesses. However, bringing such solutions and models into business-as-usual operations is not an easy task. In this session, we delve into the operationalization of generative AI applications using MLOps principles, leading to the introduction of foundation model operations (FMOps), or LLM operations (LLMOps), using Apache Airflow. We further zoom into the expected people and process mindsets, new techniques for model selection and evaluation, data privacy, and model deployment. Additionally, learn how you can use the prescriptive features of Apache Airflow to aid your operational journey. Whether you are building with out-of-the-box models (open-source or proprietary), creating new foundation models from scratch, or fine-tuning an existing model, the structured approaches described here can help you effectively integrate LLMs into your operations, enhancing efficiency and productivity without causing disruptions in the cloud or on-premises.