Speaker

Julian LaNeve

6 talks

Chief Technology Officer at Astronomer

CTO at Astronomer.

Bio from: NYC Airflow Rooftop Happy Hour ft. PMC Member Jarek Potiuk!

Talks & appearances

6 activities · Newest first

As the demand for data products grows, data engineering teams face mounting pressure to deliver more, even faster, and often become bottlenecks. Astro IDE, an AI-powered code editor built for Apache Airflow, changes the game. It helps data teams go from idea to production in minutes by generating production-ready DAGs, enabling in-browser testing, and integrating directly with Git. In this session, see how Astro IDE accelerates DAG creation, debugging, and deployment so data engineering teams can deliver more, 10x faster.

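For readers new to Airflow, the artifact Astro IDE generates is an ordinary Python DAG file. Below is a minimal, hypothetical sketch of such a DAG using Airflow's TaskFlow API; it is illustrative only, not output from Astro IDE, and the extract/transform/load task names are invented.

```python
# Illustrative only: a minimal Airflow DAG of the kind described in the talk.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract() -> list[dict]:
        # Stand-in for pulling rows from a source system.
        return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return [{**r, "value": r["value"] * 2} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Stand-in for writing to a warehouse or data product.
        print(f"Loading {len(rows)} rows")

    load(transform(extract()))


example_etl()
```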

Data engineers have shifted from delivering data for internal analytics applications to building customer-facing data products. With that shift comes a whole new level of operational rigor needed to instill trust and confidence in the data. How do you hold data pipelines to the same standards as traditional software applications? Can you apply principles from the field of SRE to the world of data? In this talk, we'll explore how this evolution has played out across Astronomer's customer base and highlight best practices learned from the most critical data product applications we've seen. We'll hear from Astronomer's own data team about their transformation from analytics to data products. And we'll showcase a new product we're building to help data teams around the world solve exactly this problem!
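
The abstract itself contains no code, but the SRE framing can be made concrete. Below is a minimal sketch, under assumed names, of how SRE-style rigor might look in an Airflow pipeline: retries, an alerting callback, and an explicit data-quality gate that fails the run when a service-level objective is missed. This is not Astronomer's product, just one way to encode the idea.

```python
# A minimal sketch (assumptions, not Astronomer's product) of SRE-style
# rigor in a pipeline: retries, failure alerting, and a hard quality gate.
from datetime import datetime, timedelta

from airflow.decorators import dag, task
from airflow.exceptions import AirflowFailException


def notify_on_failure(context):
    # Stand-in for paging/alerting (PagerDuty, Slack, etc.).
    print(f"ALERT: {context['task_instance'].task_id} failed")


@dag(
    schedule="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_on_failure,
    },
)
def data_product_slo():
    @task
    def load_rows() -> int:
        return 9_500  # stand-in for an actual load step

    @task
    def quality_gate(row_count: int) -> None:
        # Treat the pipeline like software: enforce an explicit SLO.
        # AirflowFailException fails immediately, bypassing retries.
        if row_count < 10_000:
            raise AirflowFailException(f"Row count {row_count} below SLO")

    quality_gate(load_rows())


data_product_slo()
```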

There are three certainties in life: death, taxes, and data pipelines failing. Pipelines fail for any number of reasons: you may run out of memory, your credentials may expire, an upstream data source may be unreliable, and so on. But there are patterns we can learn from! Join us as we walk through an analysis we've done on a massive dataset of Airflow failure logs. We'll show how we used natural language processing and dimensionality reduction methods to explore the latent space of Airflow task failures in order to cluster, visualize, and understand them. We'll conclude by covering mitigation methods for common task failure reasons and showing how we can use Airflow to build an MLOps platform that turns this one-time analysis into a reliable, recurring activity.
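
The talk's code is not reproduced here, but the general technique it describes (embed failure messages, reduce dimensionality, cluster the latent space) can be sketched with scikit-learn. The log lines below are invented stand-ins for the real dataset.

```python
# A minimal sketch of the general technique, not the talk's actual code:
# embed failure messages with TF-IDF, reduce dimensionality, then cluster.
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical failure logs standing in for the real dataset.
logs = [
    "Task exited with return code Negsignal.SIGKILL",   # OOM kill
    "MemoryError: unable to allocate array",
    "401 Unauthorized: token has expired",
    "Credentials expired, re-authentication required",
    "Connection to upstream database timed out",
    "Upstream source unavailable: connection refused",
]

# Embed the logs as sparse TF-IDF vectors.
X = TfidfVectorizer(stop_words="english").fit_transform(logs)

# Project into a low-dimensional latent space (LSA here; UMAP or t-SNE
# are common alternatives for visualization).
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Cluster the latent space into candidate failure modes.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

for log, label in zip(logs, labels):
    print(label, log)
```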

Behind the growing interest in Generative AI and LLM-based enterprise applications lies an expanded set of requirements for data integration and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions. This talk focuses on design patterns for using Apache Airflow to support LLM applications built on private enterprise data. We'll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and add new Airflow Providers that make it easier to interact with LLMs such as OpenAI's (including GPT-4) and those on Hugging Face, while working with both structured and unstructured data. In short, we'll show how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.
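
As a concrete illustration of the pattern (not the proposed Providers themselves), here is a minimal Airflow DAG that embeds private documents by calling the OpenAI API inside a task. The document contents and target store are invented, and the `openai>=1.0` Python client is assumed.

```python
# A minimal sketch of the pattern, not the talk's proposal: calling an LLM
# API from inside Airflow tasks to process private enterprise data.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def embed_private_docs():
    @task
    def extract_docs() -> list[str]:
        # Stand-in for pulling proprietary documents from a source system.
        return ["Q3 revenue grew 12% ...", "Support ticket #123: login fails ..."]

    @task
    def embed(docs: list[str]) -> list[list[float]]:
        # Imported inside the task so DAG parsing stays lightweight.
        # Requires OPENAI_API_KEY in the environment.
        from openai import OpenAI

        client = OpenAI()
        resp = client.embeddings.create(model="text-embedding-3-small", input=docs)
        return [d.embedding for d in resp.data]

    @task
    def load(vectors: list[list[float]]) -> None:
        # Stand-in for writing to a vector store (pgvector, Weaviate, etc.).
        print(f"Stored {len(vectors)} embeddings")

    load(embed(extract_docs()))


embed_private_docs()
```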