Automating Healthcare Triage with Airflow and Large Language Models

2025-07-01 · Airflow Summit 2025

session

by Milcah Mbithi

AI/ML Airflow Data Quality

I will talk about how Apache Airflow and LLMs are used together in the healthcare sector to improve efficiency. Healthcare generates vast volumes of unstructured data daily, from clinical notes and patient intake forms to chatbot conversations and telehealth reports. Medical teams struggle to keep up, leading to delays in triage and missed critical symptoms. This session explores how Apache Airflow can be the backbone of an automated healthcare triage system powered by Large Language Models (LLMs). I’ll demonstrate how I designed and implemented an Airflow DAG pipeline that automates the ingestion, processing, and analysis of patient data from diverse sources in real time. Airflow schedules and coordinates data extraction, preprocessing, LLM-based symptom extraction, and urgency classification, and finally routes actionable insights to healthcare professionals. The session will focus on the following:
• Managing complex workflows in healthcare data pipelines
• Safely integrating LLM inference calls into Airflow tasks
• Designing human-in-the-loop checkpoints for ethical AI usage
• Monitoring workflow health and data quality with Airflow
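As an illustration of the human-in-the-loop checkpoint this abstract mentions, here is a minimal sketch of a routing rule that such a pipeline task might apply after LLM-based urgency classification. It is not taken from the talk: `TriageResult`, `route`, the queue names, and the 0.8 confidence threshold are all hypothetical, and a real system would need clinically validated rules.

```python
from dataclasses import dataclass


@dataclass
class TriageResult:
    """Hypothetical output of an LLM urgency-classification step."""
    patient_id: str
    urgency: str       # assumed labels: "low", "medium", or "high"
    confidence: float  # assumed model-reported score in [0, 1]


def route(result: TriageResult, min_confidence: float = 0.8) -> str:
    """Pick a destination queue for a triage result.

    High-urgency cases and low-confidence classifications are always
    sent to a human reviewer; everything else goes to the normal queue.
    """
    if result.urgency == "high" or result.confidence < min_confidence:
        return "clinician_review"
    return "standard_queue"
```

In an Airflow DAG, a function like this could sit in a branching task so that the review path and the automated path remain separate, auditable tasks.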

Automating Threat Intelligence with Airflow, XDR, and LLMs using the MITRE ATT&CK Framework

2025-07-01 · Airflow Summit 2025

session

by Karan Alang

AI/ML Airflow Cyber Security

Security teams often face alert fatigue from massive volumes of raw log data. This session demonstrates how to combine Apache Airflow, Wazuh, and LLMs to build automated pipelines for smarter threat triage, grounded in the MITRE ATT&CK framework. We’ll explore how Airflow can orchestrate a full workflow: ingesting Wazuh alerts, using LLMs to summarize log events, matching behavior to ATT&CK tactics and techniques, and generating enriched incident summaries. With AI-powered interpretation layered on top of structured threat intelligence, teams can reduce manual effort while increasing context and clarity. You’ll learn how to build modular DAGs that automate:
• Parsing and routing Wazuh alerts
• Querying LLMs for human-readable summaries
• Mapping IOCs to ATT&CK using vector similarity or prompt templates
• Outputting structured threat reports for analysts
The session includes a real-world example integrating open-source tools and public ATT&CK data, and will provide reusable components for rapid adoption. If you’re a SecOps engineer or ML practitioner in cybersecurity, this talk gives you a practical blueprint to deploy intelligent, scalable threat automation.
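The "vector similarity" option for mapping alerts to ATT&CK could look roughly like the sketch below. It is a toy and not the speaker's implementation: the bag-of-words `embed` function stands in for a real embedding model, the technique descriptions in `TECHNIQUES` are paraphrased for illustration (only the technique IDs are real ATT&CK identifiers), and `map_alert` is a hypothetical helper name.

```python
import math
import re
from collections import Counter
from typing import Tuple

# Illustrative descriptions keyed by ATT&CK technique ID.
# The wording is paraphrased for this sketch, not official ATT&CK text.
TECHNIQUES = {
    "T1110": "brute force password guessing repeated failed login attempts",
    "T1059": "command and scripting interpreter shell powershell execution",
    "T1566": "phishing email malicious attachment or link",
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_alert(alert_text: str) -> Tuple[str, float]:
    """Return the best-matching technique ID and its similarity score."""
    alert_vec = embed(alert_text)
    scores = {tid: cosine(alert_vec, embed(desc)) for tid, desc in TECHNIQUES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A production version would presumably compare against the full public ATT&CK dataset and apply a minimum-score cutoff so that unrelated alerts are left unmapped instead of being forced onto the nearest technique.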

Designing Scalable Retrieval-Augmented Generation (RAG) Pipelines at SAP with Apache Airflow

2025-07-01 · Airflow Summit 2025

session

by Sagar Sharma

AI/ML Airflow API GenAI Kubernetes RAG SAP

At SAP Business AI, we’ve transformed Retrieval-Augmented Generation (RAG) pipelines into enterprise-grade powerhouses using Apache Airflow. Our Generative AI Foundations Team developed a cutting-edge system that effectively grounds Large Language Models (LLMs) with rich SAP enterprise data. Powering Joule for Consultants, our innovative AI copilot, this pipeline manages the seamless ingestion, sophisticated metadata enrichment, and efficient lifecycle management of over a million structured and unstructured documents. By leveraging Airflow’s Dynamic DAGs, TaskFlow API, XCom, and Kubernetes Event-Driven Autoscaling (KEDA), we achieved unprecedented scalability and flexibility. Join our session to discover actionable insights, innovative scaling strategies, and a forward-looking vision for Pipeline-as-a-Service, empowering seamless integration of customer-generated content into scalable AI workflows.

Empowering Precision Healthcare with Apache Airflow - iKang Healthcare Group’s DataHub Journey

2025-07-01 · Airflow Summit 2025

session

by Yuan Luo, Huiliang Zhang

Airflow Data Lakehouse Data Streaming

iKang Healthcare Group, serving nearly 10 million patients annually, built a centralized healthcare data hub powered by Apache Airflow to support its large-scale, real-time clinical operations. The platform integrates batch and streaming data in a lakehouse architecture, orchestrating complex workflows from data ingestion (HL7/FHIR) to clinical decision support. Healthcare data’s inherent complexity—spanning structured lab results to unstructured clinical notes—requires dynamic, reliable orchestration. iKang uses Airflow’s DAGs, extensibility, and workflow-as-code capabilities to address challenges like multi-system coordination, semantic data linking, and fault-tolerant automation. iKang extended Airflow with cross-DAG event triggers, task priority weights, LLM-driven clinical text processing, and a visual drag-and-drop DAG builder for medical teams. These innovations improved diagnostic turnaround, patient safety, and cross-system workflow visibility. iKang’s work demonstrates Airflow’s power in transforming healthcare data infrastructure and advancing intelligent, scalable patient care.

From Oops to Secure Ops: Self-Hosted AI for Airflow Failure Diagnosis

2025-07-01 · Airflow Summit 2025

session

by Nathan Hadfield

AI/ML Airflow API Cloud Computing

Last year, ‘From Oops to Ops’ showed how AI-powered failure analysis could help diagnose why Airflow tasks fail. But do we really need large, expensive cloud-based AI models to answer simple diagnostic questions? Relying on external AI APIs introduces privacy risks, unpredictable costs, and latency, often without clear benefits for this use case. With the rise of distilled, open-source models, self-hosted failure analysis is now a practical alternative. This talk will explore how to deploy an AI service on infrastructure you control, compare cost, speed, and accuracy between OpenAI’s API and self-hosted models, and showcase a live demo of AI-powered task failure diagnosis using DeepSeek and Llama—running without external dependencies to keep data private and costs predictable.

How Pinterest Uses AI to Empower Airflow Users for Troubleshooting

2025-07-01 · Airflow Summit 2025

session

by Rachel Sun

AI/ML Airflow

At Pinterest, there are over 10,000 DAGs supporting various use cases across different teams and roles. With this scale and diversity, user support has been an ongoing challenge to unlock productivity. As Airflow increasingly serves as a user interface to a variety of data and ML infrastructure behind the scenes, it’s common for issues from multiple areas to surface in Airflow, making triage and troubleshooting a challenge. In this session, we will discuss the scale of the problem we are facing, how we have addressed it so far, and how we are introducing LLM AI to help solve this problem.

LLM-Powered Review Analysis: Optimising Data Engineering using Airflow

2025-07-01 · Airflow Summit 2025

session

by Naseem Shah

Airflow API CI/CD Data Engineering postgresql

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews. What You’ll Learn:
• Cost-Efficient DAG Design: Decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups. We dive into one of our DAGs as a concrete example of how our approach optimizes parallelism, error handling, delivery speed, and reliability.
• Integrating LLM Analysis: Explore how we integrated LLM-based analysis into our pipeline. Learn how we designed the database, queries, and ingestion to Postgres.
• Extending Airflow UI: We developed a custom Airflow UI plugin that filters and visualizes DAG runs by customer, product, and marketplace, delivering clear insights for faster troubleshooting.
• Leveraging Airflow REST API: Discover how we leveraged the API to trigger DAGs on demand, elevating the UX by tracking mapped DAG progress and computing ETAs.
• CI/CD and Cost Management: Get practical tips for deploying DAGs with CI/CD.
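The abstract does not say how ETAs are computed from mapped-task progress. One simple possibility is linear extrapolation from the share of mapped task instances that have finished, sketched below. `estimate_eta` is a hypothetical helper; in practice the `done` and `total` counts would come from the Airflow REST API, but here they are passed in as plain numbers so the example stays self-contained.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def estimate_eta(
    started_at: datetime, now: datetime, done: int, total: int
) -> Optional[datetime]:
    """Estimate when a run finishes, assuming remaining mapped tasks
    take the same average time as the ones already completed."""
    if total <= 0 or done <= 0:
        return None  # no progress yet, so nothing to extrapolate from
    if done >= total:
        return now  # already finished
    elapsed = now - started_at
    remaining = elapsed * (total - done) / done
    return now + remaining


# Usage: 25 of 100 mapped tasks finished after 10 minutes.
start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
eta = estimate_eta(start, start + timedelta(minutes=10), done=25, total=100)
```

This treats all mapped tasks as equally expensive and evenly parallelized, which is a strong assumption; a weighted or moving-average estimate may behave better when task durations vary widely.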

Model Context Protocol with Airflow

2025-07-01 · Airflow Summit 2025

session

by Abhishek Bhakat, Sudarshan Chaudhari

AI/ML Airflow

In today’s data-driven world, effective workflow management and AI are crucial for success. However, there’s a notable gap between Airflow and AI. Our presentation offers a solution to close this gap: an MCP (Model Context Protocol) server that acts as a bridge. We’ll dive into two paths:
• AI-Augmented Airflow: Enhancing Airflow with AI to improve error handling, automate DAG generation, proactively detect issues, and optimize resource use.
• Airflow-Powered AI: Utilizing Airflow’s reliability to empower LLMs in executing complex tasks, orchestrating AI agents, and supporting decision-making with real-time data.
Key takeaways:
• Understanding how to integrate AI insights directly into your workflow orchestration.
• Learning how MCP empowers AI with robust orchestration capabilities, offering full logging, monitoring, and auditability.
• Gaining insights into how to transform LLMs from reactive responders into proactive, intelligent, and reliable executors.
We invite you to explore how MCP can help workflow management by making AI-driven decisions more reliable and turning workflow systems into intelligent, autonomous agents.

Scaling Airflow at OpenAI

2025-07-01 · Airflow Summit 2025

session

by Howie Wang, Ping Zhang

Airflow Kubernetes

This talk shares how we scaled and hardened OpenAI’s Airflow deployment to orchestrate thousands of workflows on Kubernetes. We’ll cover key architecture choices, scaling strategies, and reliability improvements - along with practical lessons learned.

Vayu: The Airflow Copilot

2025-07-01 · Airflow Summit 2025

session

by Sanchit Sreekanth, Muhammed Irshad

AI/ML Airflow Git

Vayu is a conversational copilot for Apache Airflow, developed at Prevalent AI to help data engineers manage, troubleshoot, and fix pipelines using natural language. Deployments often fail silently due to misconfigurations, missing connections, or runtime issues that are impossible to identify in unit tests. Vayu tackles these via a troubleshooting agent that inspects logs, metrics, configs, and runtime state to find root causes and suggest fixes, saving engineers significant troubleshooting time. It can also apply approved fixes to DAG code and commit them to your version control system.
Key Capabilities:
• Troubleshooting Agent: Inspects logs, configs, variables, and connections to find root causes and suggest fixes.
• Pipeline Mechanic Agent: Suggests code-level fixes (e.g., for missing connections or bad imports) and, once approved, commits them to version control.
• DAG Manager Agent: Understands DAG logic, suggests improvements, and can trigger DAGs conversationally.
Architecture: Built with open-source tools including Google ADK as the orchestration layer and a custom Airflow MCP server based on the FastMCP framework. LLMs never access Airflow directly. The full codebase will be open-sourced.

What I Learned Building With LLMs

2025-06-30 · Mindstone Paris June AI Meetup

talk

AI/ML

A technical demo breaking down the process for building a product using AI with real-life learnings and insights.

#308 A Framework for GenAI App and Agent Development with Jerry Liu, CEO at LlamaIndex

2025-06-30 · DataFramed Listen

podcast_episode

by Jerry Liu (LlamaIndex), Richie (DataCamp)

AI/ML GenAI RAG

The enterprise adoption of AI agents is accelerating, but significant challenges remain in making them truly reliable and effective. While coding assistants and customer service agents are already delivering value, more complex document-based workflows require sophisticated architectures and data processing capabilities. How do you design agent systems that can handle the complexity of enterprise documents with their tables, charts, and unstructured information? What's the right balance between general reasoning capabilities and constrained architectures for specific business tasks? Should you centralize your agent infrastructure or purchase vertical solutions for each department? The answers lie in understanding the fundamental trade-offs between flexibility, reliability, and the specific needs of your organization. Jerry Liu is the CEO and Co-founder at LlamaIndex, the AI agents platform for automating document workflows. Previously, he led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, and worked on recommendation systems at Quora. In the episode, Richie and Jerry explore the readiness of AI agents for enterprise use, the challenges developers face in building these agents, the importance of document processing and data structuring, the evolving landscape of AI agent frameworks like LlamaIndex, and much more.
Links Mentioned in the Show:
• LlamaIndex
• LlamaIndex Production Ready Framework For LLM Agents
• Tutorial: Model Context Protocol (MCP)
• Connect with Jerry
• Course: Retrieval Augmented Generation (RAG) with LangChain
• Related Episode: RAG 2.0 and The New Era of RAG Agents with Douwe Kiela, CEO at Contextual AI & Adjunct Professor at Stanford University
• Rewatch RADAR AI

LLMs Evaluation

2025-06-26 · AI Meetup (Asana): Agentic AI, LLMs Evaluation

talk

by Menny Even-Danan (Asana)

In this session, we'll discuss how to evaluate LLM response quality for Asana dev-tools.

Tech Talk: LLMs Evaluation

2025-06-26 · AI Meetup (Asana): Agentic AI, LLMs Evaluation

talk

by Menny Even-Danan (Asana)

In this session, we'll discuss how to evaluate LLM response quality for Asana dev-tools.

Beyond the Base Model: Extending LLMs with RAG and Guardrails

2025-06-25 · 2025 June Event hosted by Monzo

talk

RAG

From 9× Safer Cars to Hallucination-Free LLMs: Inside Sama with Duncan Curtis SVP for GenAI & AI Product + Technology

2025-06-25 · Making Data Simple Listen

podcast_episode

by Duncan Curtis (Sama), Al Martin (IBM)

AI/ML GenAI IBM

From Elephant Butts to Ethical AI: Duncan Curtis on De-Risking GenAI at Sama

Episode intro
Duncan Curtis, SVP for GenAI & AI Product + Technology at Sama, has shipped everything from autonomous-vehicle platforms at Zoox to game-changing data products at Google. Today he leads a 160-person team that’s reinventing how training data is curated, labeled, and audited so enterprises can ship production-ready GenAI without the lurking model risk. Sama’s newest release, Sama Automate, is already cutting annotation time by 40 percent while keeping quality above SLAs, and Duncan says they’re “aiming for a 10× improvement by 2025.” (aiuserconference.com, sama.com) If you want the inside track on AI ROI, ethical guardrails, and why A’s hire A’s (but B’s hire C’s!), lean in: this one’s for you. (And yes, we do get to elephant butts.)

Timestamped roadmap
00:46 Meet Duncan Curtis
03:51 The Duncan Brand
05:52 Making Time for Yourself
08:47 Autonomous Cars: 9× Safer
12:21 Favorite Jobs
13:24 Inside Sama
14:39 Data & LLM Training
16:04 De-Risking Models
19:08 Ethical AI
22:43 Stopping Hallucinations
27:18 Data Labeling Deep-Dive
31:56 Production-Ready GenAI
33:44 AGI Horizons
35:34 What Makes Sama Different
36:31 Calculating AI ROI
38:50 State of the LLMs
44:48 Elephant Butts & Closing Thoughts

Quick links
LinkedIn: https://www.linkedin.com/in/duncan-curtis/
Sama: https://www.sama.com/
Latest blog: “Sama Introduces New Data Automation Platform” (sama.com)
Hear more: Duncan on “Human Guardrails in Generative AI” (DataCamp podcast) (datacamp.com)

Hashtags
#MakingDataSimple #AIProduct #GenAI #DataLabeling #EthicalAI #AIROI #AutonomousVehicles #Podcast

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Developer productivity with LLMs

2025-06-23 · A Night Of AI (8 speakers) Mini Tech Conference

Lightning Talk

AutoML via LLMs: Code as the Search Space

2025-06-20 · PyTorch Meetup #20

presentation

Episode 239: Claude-Poisoned Dev Sipping Rocket Fuel

2025-06-20 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra, Bryce Adelstein Lelbach (NVIDIA), Ben Deane

AI/ML GenAI GitHub Python

In this episode, Conor recommends some articles on AI and LLMs.

Link to Episode 239 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)

Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon

Show Notes
Date Generated: 2025-06-19
Date Released: 2025-06-20
• The Real Python Podcast Episode 253
• My AI Skeptic Friends Are All Nuts - Thomas Ptacek
• I Think I’m Done Thinking About genAI For Now - Glyph
• AI Changes Everything - Armin Ronacher

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Building Neo4j-Powered Applications with LLMs

2025-06-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravindranatha Anthapu , Siddhant Agarwal

AI/ML Cloud Computing GCP GenAI Java Neo4j Python data data-engineering graph-databases

Dive into building applications that combine the power of Large Language Models (LLMs) with Neo4j knowledge graphs, Haystack, and Spring AI to deliver intelligent, data-driven recommendations and search outcomes. This book provides actionable insights and techniques to create scalable, robust solutions by leveraging the best-in-class frameworks and a real-world project-oriented approach.

What this Book will help me do
• Understand how to use Neo4j to build knowledge graphs integrated with LLMs for enhanced data insights.
• Develop skills in creating intelligent search functionalities by combining Haystack and vector-based graph techniques.
• Learn to design and implement recommendation systems using LangChain4j and Spring AI frameworks.
• Acquire the ability to optimize graph data architectures for LLM-driven applications.
• Gain proficiency in deploying and managing applications on platforms like Google Cloud for scalability.

Author(s)
Ravindranatha Anthapu, a Principal Consultant at Neo4j, and Siddhant Agarwal, a Google Developer Expert in Generative AI, bring together their vast experience to offer practical implementations and cutting-edge techniques in this book. Their combined expertise in Neo4j, graph technology, and real-world AI applications makes them authoritative voices in the field.

Who is it for?
Designed for database developers and data scientists, this book caters to professionals aiming to leverage the transformational capabilities of knowledge graphs alongside LLMs. Readers should have a working knowledge of Python and Java as well as familiarity with Neo4j and the Cypher query language. If you're looking to enhance search or recommendation functionalities through state-of-the-art AI integrations, this book is for you.
