talk-data.com

Event

PyData Berlin 2025

2025-09-01 – 2025-09-03 PyData

Activities tracked

21

Filtering by: AI/ML

Sessions & talks

Showing 1–21 of 21 · Newest first

Kubeflow pipelines meet uv

2025-09-03
talk

Kubeflow is a platform for building and deploying portable and scalable machine learning (ML) workflows using containers on Kubernetes-based systems.

Together, we will code a simple Kubeflow pipeline and show how to test it locally. As a bonus, we will explore one way to avoid dependency hell using the modern dependency management tool uv.
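
For a flavour of what such a pipeline looks like, here is a minimal sketch using the kfp v2 SDK; the component, base image, and file names are illustrative assumptions, not the session's actual code.

```python
# Minimal kfp v2 sketch: one lightweight component wired into a pipeline,
# then compiled to YAML for submission to a Kubeflow cluster.
from kfp import dsl, compiler


@dsl.component(base_image="python:3.12")
def add(a: int, b: int) -> int:
    return a + b


@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(x: int = 1, y: int = 2) -> int:
    return add(a=x, b=y).output


if __name__ == "__main__":
    compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")
```

Recent kfp releases also ship a local runner (kfp.local) for testing components without a cluster, though the exact setup depends on your kfp version; the uv-based dependency handling discussed in the talk is not shown here.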

Scraping urban mobility: analysis of Berlin carsharing

2025-09-03
talk

Free-floating carsharing systems struggle to balance vehicle supply and demand, which often results in inefficient fleet distribution and reduced vehicle utilization. This talk explores how data scraping can be used to model vehicle demand and user behavior, enabling targeted incentives to encourage self-balancing vehicle flows.

Using information scraped from a major mobility provider over multiple months, the presentation provides spatiotemporal analyses and machine learning results to determine whether it's practically possible to offer low-friction discounts that lead to improved fleet balance.

Flying Beyond Keywords: Our Aviation Semantic Search Journey

2025-09-03
talk
Dat Tran (Priceloop), Dennis Schmidt

In aviation, search isn’t simple—people use abbreviations, slang, and technical terms that make exact matching tricky. We started with just Postgres, aiming for something that worked. Over time, we upgraded: semantic embeddings, reranking. We tackled filter complexity, slow index builds, embedding updates, and much more. Along the way, we learned a lot about making AI search fast, accurate, and actually usable for our users. It’s been a journey—full of turbulence, but worth the landing.
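
For context, a bare-bones version of that "just Postgres" starting point might look like the sketch below, using the pgvector extension; the table, columns, distance operator, and embed() helper are assumptions for illustration, not the speakers' actual setup.

```python
# Hypothetical sketch: nearest-neighbour lookup over pre-computed embeddings
# stored in a Postgres table with the pgvector extension enabled.
import json

import psycopg  # psycopg 3


def embed(text: str) -> list[float]:
    """Stand-in for whatever embedding model produces the query vector."""
    raise NotImplementedError


query_vector = embed("deicing procedure A320")
with psycopg.connect("dbname=aviation") as conn:
    rows = conn.execute(
        "SELECT doc_id, title FROM documents "
        "ORDER BY embedding <=> %s::vector LIMIT 10",  # <=> is cosine distance
        (json.dumps(query_vector),),
    ).fetchall()
```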

Docling: Get your documents ready for gen AI

2025-09-03
talk

Docling, an open source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Having earned close to 30,000 GitHub stars in less than a year and now part of the Linux AI & Data Foundation, Docling is redefining document AI with its ease and speed of use. In this session, we’ll introduce Docling and its features, including usage with various generative AI frameworks and protocols (e.g. MCP).
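
For readers who want a feel for the library before the session, Docling's basic conversion flow looks roughly like this (the file path is a placeholder):

```python
# Parse a PDF with Docling and export the result as Markdown.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("annual_report.pdf")  # local path or URL
print(result.document.export_to_markdown())
```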

Building an AI Agent for Natural Language to SQL Query Execution on Live Databases

2025-09-03
talk

This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this session, attendees will have created a powerful and extensible system they can adapt to their own data sources.
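
As a rough illustration of the core loop described above (generate SQL, validate it, confirm with a human, then execute), here is a framework-free sketch; generate_sql() stands in for whatever LLM call the tutorial uses, and the schema and database are placeholders.

```python
# Hypothetical sketch of the generate -> validate -> confirm -> execute loop.
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Stand-in for an LLM call that turns a question plus schema into SQL."""
    raise NotImplementedError


def is_read_only(sql: str) -> bool:
    """Very conservative safety check: a single SELECT statement only."""
    stripped = sql.strip().rstrip(";").lower()
    return stripped.startswith("select") and ";" not in stripped


def answer(question: str, db_path: str = "shop.db") -> list[tuple]:
    schema = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);"
    sql = generate_sql(question, schema)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to execute non-SELECT statement: {sql!r}")
    if input(f"Run {sql!r}? [y/N] ").strip().lower() != "y":  # human-in-the-loop
        return []
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()
```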

Edge of Intelligence: The State of AI in Browsers

2025-09-03
talk

API calls suck! Okay, not all of them. But making your AI features reliant on third-party APIs can bring a lot of trouble. In this talk you'll learn how to use web technologies to become more independent.

Maintainers of the Future: Code, Culture, and Everything After

2025-09-03
talk

How we sustain what we build — and why the future of tech depends on care, not only code.

The last five years have reshaped tech — through a pandemic, economic uncertainty, shifting politics, and the rapid rise of AI. While these changes have opened new opportunities, they’ve also exposed the limits — and harms — of a “move fast and break things” mindset.

From Manual to LLMs: Scaling Product Categorization

2025-09-02
talk

How do you use LLMs to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual/rule-based methods, via fine-tuned semantic models, to a robust multi-step process that uses embeddings and LLMs via the OpenAI APIs. This talk offers data scientists and AI practitioners learnings and best practices for putting such a complex LLM-based system into production. This includes prompt development, balancing cost vs. accuracy via model selection, testing multi-case vs. single-case prompts, and saving costs by using the OpenAI Batch API and a smart early-stopping approach. We also describe our automation and monitoring in a PySpark environment.
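
One common way to combine embeddings and an LLM for this kind of task (shortlist candidate categories by embedding similarity, then let a chat model pick) is sketched below; the models, category list, and prompt are illustrative assumptions, not the production setup described in the talk.

```python
# Hypothetical sketch: embed the product, shortlist categories by cosine
# similarity, then ask a chat model to choose among the shortlist.
import numpy as np
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["Running shoes", "Hiking boots", "Sandals"]  # in reality ~1,000


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


def categorize(product: str, top_k: int = 2) -> str:
    cat_vecs = embed(CATEGORIES)
    prod_vec = embed([product])[0]
    sims = cat_vecs @ prod_vec / (
        np.linalg.norm(cat_vecs, axis=1) * np.linalg.norm(prod_vec)
    )
    shortlist = [CATEGORIES[i] for i in np.argsort(sims)[-top_k:]]
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Pick the best category for {product!r} from {shortlist}. "
                       "Answer with the category name only.",
        }],
    )
    return chat.choices[0].message.content.strip()
```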

Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval

2025-09-02
talk

With a focus on healthcare applications where accuracy is non-negotiable, this talk highlights challenges and delivers practical insights on building AI agents that query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, addressing the challenges of preventing hallucinations and ensuring proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying the Python and LLM ecosystem to the healthcare domain.

Beyond Benchmarks: Practical Evaluation Strategies for Compound AI Systems

2025-09-02
talk

Evaluating large language models (LLMs) in real-world applications goes far beyond standard benchmarks. When LLMs are embedded in complex pipelines, choosing the right models, prompts, and parameters becomes an ongoing challenge.

In this talk, we will present a practical, human-in-the-loop evaluation framework that enables systematic improvement of LLM-powered systems based on expert feedback. By combining domain expert insights and automated evaluation methods, it is possible to iteratively refine these systems while building transparency and trust.

This talk will be valuable for anyone who wants to ensure their LLM applications can handle real-world complexity - not just perform well on generic benchmarks.
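
A minimal sketch of the human-in-the-loop idea, assuming a hypothetical judge_score() LLM-as-judge call and expert ratings collected offline: before trusting automated evaluation, measure how often it agrees with the experts.

```python
# Hypothetical sketch: compare automated judge scores with expert ratings.
def judge_score(question: str, answer: str) -> int:
    """Stand-in for an LLM-as-judge call returning a rating from 1 to 5."""
    raise NotImplementedError


def agreement_rate(samples: list[dict], tolerance: int = 1) -> float:
    """Share of samples where judge and expert ratings differ by <= tolerance."""
    hits = sum(
        abs(judge_score(s["question"], s["answer"]) - s["expert_rating"]) <= tolerance
        for s in samples
    )
    return hits / len(samples)
```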

How We Automate Chaos: Agentic AI and Community Ops at PyCon DE & PyData

2025-09-02
talk

Using AI agents and automation, PyCon DE & PyData volunteers have transformed chaos into streamlined conference ops. From YAML files to LLM-powered assistants, they automate speaker logistics, FAQs, video processing, and more while keeping humans focused on creativity. This case study reveals practical lessons on making AI work in real-world scenarios: structured workflows, validation, and clear context beat hype. Live demos and open-source tools included.

Most AI Agents Are Useless. Let’s Fix That

2025-09-02
talk

AI agents are having a moment, but most of them are little more than fragile prototypes that break under pressure. Together, we’ll explore why so many agentic systems fail in practice, and how to fix that with real engineering principles. In this talk, you’ll learn how to build agents that are modular, observable, and ready for production. If you’re tired of LLM demos that don’t deliver, this talk is your blueprint for building agents that actually work.

Probably Fun: Games to teach Machine Learning

2025-09-02
talk

In this tutorial, you will play several games that can be used to teach machine learning concepts. Each game can be played in big and small groups. Some involve hands-on material such as cards; others involve an electronic app. All games contain one or more concepts from Machine Learning.

As an outcome, you will take away multiple ideas that make complex topics more understandable – and enjoyable. By doing so, we would like to demonstrate that Machine Learning does not require computers: the core ideas can be exemplified in a clear and memorable way without them. We also would like to demonstrate that gamification is not limited to online quiz questions, but offers ways for learners to bond.

We will bring a set of carefully selected games that have been proven in a big classroom setting and contain useful abstractions of linear models, decision trees, LLMs and several other Machine Learning concepts. We also believe that it is probably fun to participate in this tutorial.

Training Specialized Language Models with Less Data: An End-to-End Practical Guide

2025-09-02
talk

Small Language Models (SLMs) offer an efficient and cost-effective alternative to LLMs—especially when latency, privacy, inference costs or deployment constraints matter. However, training them typically requires large labeled datasets and is time-consuming, even if it isn't your first rodeo.

This talk presents an end-to-end approach for curating high-quality synthetic data using LLMs to train domain-specific SLMs. Using a real-world use case, we’ll demonstrate how to reduce manual labeling time, cut costs, and maintain performance—making SLMs viable for production applications.
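
A minimal sketch of the synthetic-data-curation idea, assuming a hypothetical generate_example() LLM call and an illustrative label set; the real workflow described in the talk adds filtering and quality checks on top.

```python
# Hypothetical sketch: use an LLM to generate labelled examples, then write
# them to JSONL for fine-tuning a small domain-specific classifier.
import json
import random

LABELS = ["complaint", "refund_request", "delivery_question"]


def generate_example(label: str) -> str:
    """Stand-in for an LLM call such as:
    'Write one realistic customer message a human would label as {label}.'"""
    raise NotImplementedError


with open("synthetic_train.jsonl", "w") as f:
    for _ in range(1_000):
        label = random.choice(LABELS)
        record = {"text": generate_example(label), "label": label}
        f.write(json.dumps(record) + "\n")
```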

Whether you are a seasoned Machine Learning Engineer or just getting started with building AI features, you will come away with the inspiration to build more performant, secure, and environmentally friendly AI systems.

Beyond the Black Box: Interpreting ML models with SHAP

2025-09-01
talk

As machine learning models become more accurate and complex, explainability remains essential. Explainability helps not just with trust and transparency but also with generating actionable insights and guiding decision-making. One way of interpreting the model outputs is using SHapley Additive exPlanations (SHAP). In this talk, I will go through the concept of Shapley values and its mathematical intuition and then walk through a few real-world examples for different ML models. Attendees will gain a practical understanding of SHAP's strengths and limitations and how to use it to explain model predictions in their projects effectively.
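
For readers new to the library, a typical SHAP workflow for a tree-based model looks roughly like this (dataset and model are illustrative, not the talk's examples):

```python
# Train a tree model, compute SHAP values, and plot global and local views.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer(X.iloc[:500])   # per-sample, per-feature attributions
shap.plots.beeswarm(shap_values)        # global view of feature impact
shap.plots.waterfall(shap_values[0])    # why one prediction came out as it did
```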

AI-Ready Data in Action: Powering Smarter Agents

2025-09-01
talk

This hands-on workshop focuses on what AI engineers do most often: making data AI-ready and turning it into production-useful applications. Together with dltHub and LanceDB, you’ll walk through an end-to-end workflow: collecting and preparing real-world data with best practices, managing it in LanceDB, and powering AI applications with search, filters, hybrid retrieval, and lightweight agents. By the end, you’ll know how to move from raw data to functional, production-ready AI setups without the usual friction. We will also touch on multi-modal data and on taking this end-to-end use case to production.
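
A tiny sketch of the LanceDB side of such a workflow, with toy vectors standing in for real embeddings (table name, schema, and data are assumptions):

```python
# Create a LanceDB table from embedded records and run a vector search.
import lancedb

db = lancedb.connect("./lancedb")  # local, file-based database directory
table = db.create_table(
    "docs",
    data=[
        {"text": "refund policy", "vector": [0.10, 0.30, 0.50]},
        {"text": "shipping times", "vector": [0.20, 0.10, 0.70]},
    ],
)
hits = table.search([0.15, 0.20, 0.60]).limit(1).to_list()
print(hits[0]["text"])
```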

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

2025-09-01
talk

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.
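
As a flavour of the MLFlow tracking step, a minimal run might look like this; the experiment name, parameters, and metric are illustrative, not the speakers' actual values.

```python
# Log parameters and a metric for one training run.
import mlflow

mlflow.set_experiment("iss-anomaly-detection")
with mlflow.start_run(run_name="autoencoder-baseline"):
    mlflow.log_param("window_size", 128)
    mlflow.log_param("latent_dim", 16)
    mlflow.log_metric("val_reconstruction_error", 0.042)
    # mlflow.log_artifact("model.pt")  # any local file can be attached
```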

What’s Really Going On in Your Model? A Python Guide to Explainable AI

2025-09-01
talk
Yashasvi Misra (Pure Storage)

As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeholders, regulators, or just debugging unexpected results, the ability to explain your model is no longer optional; it's essential.

In this talk, we'll walk through practical tools in the Python ecosystem that help bring transparency to your models, including SHAP, LIME, and Captum. Through hands-on examples, you'll learn how to apply these libraries to real-world models, from decision trees to deep neural networks, and make sense of what's happening under the hood.

If you've ever struggled to explain your model’s output or justify its decisions, this session will give you a toolkit to build more trustworthy, interpretable systems without sacrificing performance.
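
To complement the SHAP example earlier in this listing, here is what a basic LIME explanation of a tabular classifier looks like (data and model are illustrative):

```python
# Explain a single prediction of a random-forest classifier with LIME.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier().fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, labels=(0,), num_features=4
)
print(explanation.as_list(label=0))  # feature contributions for class 0
```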

Automating Content Creation with LLMs: A Journey from Manual to AI-Driven Excellence

2025-09-01
talk

In the fast-paced realm of travel experiences, GetYourGuide encountered the challenge of maintaining consistent, high-quality content across its global marketplace. Manual content creation by suppliers often resulted in inconsistencies and errors, negatively impacting conversion rates. To address this, we leveraged large language models (LLMs) to automate content generation, ensuring uniformity and accuracy. This talk will explore our innovative approach, including the development of fine-tuned models for generating key text sections and the use of Function Calling GPT API for structured data. A pivotal aspect of our solution was the creation of an LLM evaluator to detect and correct hallucinations, thereby improving factual accuracy. Through A/B testing, we demonstrated that AI-driven content led to fewer defects and increased bookings. Attendees will gain insights into training data refinement, prompt engineering, and deploying AI at scale, offering valuable lessons for automating content creation across industries.
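
For readers unfamiliar with function calling, a hedged sketch of obtaining structured content via the OpenAI API is shown below; the model, schema, and prompt are placeholders, not GetYourGuide's actual setup.

```python
# Hypothetical sketch: force the model to return structured listing sections
# through a function-call schema instead of free text.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "submit_listing_sections",
        "description": "Return structured content sections for an activity listing.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "highlights": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "highlights"],
        },
    },
}]
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write sections for a Berlin walking tour."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "submit_listing_sections"}},
)
print(resp.choices[0].message.tool_calls[0].function.arguments)  # JSON string
```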

The EU AI Act: Unveiling Lesser-Known Aspects, Implementation Entities, and Exemptions

2025-09-01
talk
Adrin Jalali (scikit-learn and Fairlearn)

The EU AI Act is already partly in effect, prohibiting certain AI systems. After going through the basics, we cover some of the less talked-about aspects of the Act, introducing the entities involved in its implementation and showing how many high-risk government and law enforcement use cases are excluded!

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

2025-09-01
talk

I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying third-party activity. We streamlined the entire pipeline from automated acquisition and cloud storage to preprocessing that ensures spatial, spectral, and temporal consistency. By leveraging Prefect for orchestration, Anyscale Ray for scalable processing, and the open source STAC standard for metadata indexing, we reduced processing times from days to near real-time. We addressed challenges like inconsistent metadata and diverse sensor types, building a flexible system capable of supporting large-scale geospatial analytics and AI workloads.
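
As a rough illustration of the orchestration pattern described (automated acquisition followed by preprocessing), here is a minimal Prefect sketch; the task bodies, retries, and paths are placeholder assumptions, not the team's actual pipeline.

```python
# Hypothetical sketch of an acquire -> preprocess flow orchestrated by Prefect.
from prefect import flow, task


@task(retries=2)
def acquire_scene(scene_id: str) -> str:
    return f"s3://raw/{scene_id}.tif"  # placeholder for the download step


@task
def preprocess(path: str) -> str:
    return path.replace("raw", "analysis-ready")  # placeholder for harmonization


@flow
def ingest(scene_ids: list[str]) -> list[str]:
    return [preprocess(acquire_scene(s)) for s in scene_ids]


if __name__ == "__main__":
    ingest(["S2A_20250901"])
```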