PyData Berlin 2025

See only what you are allowed to see: Fine-Grained Authorization

2025-09-03 Watch

talk

Maria Knorps

Python

Managing who can see or do what with your data is a fundamental challenge, especially as applications and data grow in complexity. Traditional role-based systems often lack the granularity needed for modern data platforms. Fine-Grained Authorization (FGA) addresses this by controlling access at the individual resource level. In this 90-minute hands-on tutorial, we will explore implementing FGA using OpenFGA, an open-source authorization engine inspired by Google's Zanzibar. Attendees will learn the core concepts of Relationship-Based Access Control (ReBAC) and get practical experience defining authorization models, writing relationship tuples, and performing authorization checks using the OpenFGA Python SDK. Bring your laptop ready to code to learn how to build secure and flexible permission systems for your data applications.
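As a rough illustration of the relationship-tuple idea behind ReBAC, the sketch below keeps tuples in an in-memory set and answers a check by looking for a direct tuple or a stronger relation that implies the requested one. It is a toy and does not use the OpenFGA SDK or its modeling language; `RelationTuple`, `IMPLIED_BY`, `check`, and the sample users and documents are all hypothetical names chosen for this example.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """One relationship fact: `user` has `relation` on `object` (hypothetical type)."""
    user: str
    relation: str
    object: str


# Simplified stand-in for an authorization model (an assumption of this sketch):
# each relation maps to the stronger relations that imply it.
IMPLIED_BY = {
    "viewer": {"editor", "owner"},
    "editor": {"owner"},
    "owner": set(),
}


def check(tuples: set, user: str, relation: str, obj: str) -> bool:
    """Return True if `user` holds `relation` on `obj`, directly or via a stronger relation."""
    if RelationTuple(user, relation, obj) in tuples:
        return True
    return any(
        check(tuples, user, stronger, obj)
        for stronger in IMPLIED_BY.get(relation, ())
    )


# Sample data, invented for illustration only.
store = {
    RelationTuple("user:anne", "owner", "document:report"),
    RelationTuple("user:bob", "viewer", "document:report"),
}

print(check(store, "user:anne", "viewer", "document:report"))  # owner implies viewer
print(check(store, "user:bob", "editor", "document:report"))   # viewer does not imply editor
```

A real engine would evaluate such checks against a stored model and tuple database rather than a Python set; the point here is only that permissions are derived from relationships instead of being assigned per role.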

Docling: Get your documents ready for gen AI

2025-09-03 Watch

talk

Michele Dolfi, Christoph Auer

AI/ML GenAI GitHub Linux Python

Docling, an open-source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Having earned close to 30,000 GitHub stars in less than one year and now part of the Linux Foundation's LF AI & Data Foundation, Docling is redefining document AI with its ease and speed of use. In this session, we’ll introduce Docling and its features, including its use with various generative AI frameworks and protocols (e.g. MCP).

Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

2025-09-02 Watch

talk

Orell Garten

Cloud Computing DuckDB Python

Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and DuckDB.

In this talk, you'll learn how to turn raw TCP streams into structured data sets, ready for analysis, all running on-premise. We'll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects.

If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control, this talk is for you.
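One way to picture the first stage of such a pipeline is a reader that turns a TCP byte stream into fixed-size batches of records. The sketch below assumes newline-delimited text records, which the abstract does not specify; `read_batches` and the sample sensor lines are hypothetical, and a local socket pair stands in for the real data source. Loading each batch into DuckDB is left out.

```python
import socket
from typing import Iterator, List


def read_batches(conn: socket.socket, batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to `batch_size` newline-delimited records read from `conn`."""
    buffer = b""
    batch: List[str] = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # the peer closed the connection
            break
        buffer += chunk
        # TCP is a byte stream, so a record may be split across recv() calls:
        # keep the unfinished tail in `buffer` until its newline arrives.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            batch.append(line.decode("utf-8"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if buffer:  # trailing record without a final newline
        batch.append(buffer.decode("utf-8"))
    if batch:   # flush the last, possibly smaller, batch
        yield batch


# Simulate a sender with a local socket pair (invented sample data).
sender, receiver = socket.socketpair()
sender.sendall(b"t1,20.5\nt2,21.0\nt3,")        # the third record is cut mid-line
sender.sendall(b"19.8\nt4,22.1\nt5,20.0\n")
sender.close()

batches = list(read_batches(receiver, batch_size=2))
receiver.close()
print(batches)
```

Each yielded batch could then be written to a file or inserted into a local database in a single step, which keeps the stream-handling code separate from the analytical side.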

Template-based web app and deployment pipeline at an enterprise-ready level on Azure

2025-09-02

talk

Johannes Schöck

Azure Azure DevOps Bicep DevOps Python

A practical deep-dive into Azure DevOps pipelines, the Azure CLI, and how to combine pipeline, Bicep, and Python templates to build a fully automated web app deployment system. Deploying a new proof-of-concept app within an actual enterprise environment has never been faster.

Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval

2025-09-02 Watch

talk

Laura Dumont

AI/ML LLM NLP Python SQL Vector DB

With a focus on healthcare applications where accuracy is non-negotiable, this talk highlights challenges and delivers practical insights on building AI agents that query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, including how to prevent hallucinations and ensure proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying Python and the LLM ecosystem to the healthcare domain.

Building Reactive Data Apps with Shinylive and WebAssembly

2025-09-02 Watch

talk

Christoph Scheuch

Cloud Computing Parquet Python

WebAssembly is reshaping how Python applications can be delivered - allowing fully interactive apps that run directly in the browser, without a traditional backend server. In this talk, I’ll demonstrate how to build reactive, data-driven web apps using Shinylive for Python, combining efficient local storage with Parquet and extending functionality with optional FastAPI cloud services. We’ll explore the benefits and limitations of this architecture, share practical design patterns, and discuss when browser-based Python is the right choice. Attendees will leave with hands-on techniques for creating modern, lightweight, and highly responsive Python data applications.

Building an A/B Testing Framework with NiceGUI

2025-09-01 Watch

talk

Wessel van de Goor

JavaScript Python

NiceGUI is a Python-based web UI framework that enables developers to build interactive web applications without using JavaScript. In this talk, I’ll share how my team used NiceGUI to create an internal A/B testing platform entirely in Python. I’ll discuss the key requirements for the platform, why we chose NiceGUI, and how it helped us design the UI, display results, and integrate with the backend. This session will demonstrate how NiceGUI simplifies development, reduces frontend complexity, and speeds up internal tool creation for Python developers.

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

2025-09-01 Watch

talk

Christian Geier

AI/ML Kubernetes Python PyTorch

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

2025-09-01 Watch

talk

Gergely Daroczi

Cloud Computing Data Collection LLM Python

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory or VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs, and we will also share our experiences and challenges with the data collection, along with insights into general patterns.

What’s Really Going On in Your Model? A Python Guide to Explainable AI

2025-09-01 Watch

talk

Yashasvi Misra (Pure Storage)

AI/ML Python

As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeholders, regulators, or just debugging unexpected results, the ability to explain your model is no longer optional; it's essential.

In this talk, we'll walk through practical tools in the Python ecosystem that help bring transparency to your models, including SHAP, LIME, and Captum. Through hands-on examples, you'll learn how to apply these libraries to real-world models from decision trees to deep neural networks and make sense of what's happening under the hood.

If you've ever struggled to explain your model’s output or justify its decisions, this session will give you a toolkit to build more trustworthy, interpretable systems without sacrificing performance.
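To give a feel for what attribution tools such as SHAP estimate, the sketch below computes exact Shapley values for a tiny model by brute force over all feature coalitions. This is not the SHAP library's API and would not scale beyond a handful of features; `shapley_values`, `toy_model`, and the choice of an all-zero baseline are assumptions made for this illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(features):
    """Hypothetical model with one additive term and one interaction term."""
    a, b, c = features
    return 2.0 * a + b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term `b * c` is split evenly between the two features involved.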

More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB

2025-09-01 Watch

talk

Mehdi Ouazza

Analytics DuckDB ETL/ELT Pandas Polars Python

Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll learn how to use DuckDB as an in-process analytical database: building data pipelines, caching datasets, and running complex queries with SQL—all without leaving Python. We’ll cover common use cases like ETL, lightweight data orchestration, and interactive analytics workflows. You’ll leave with a solid mental model for using DuckDB effectively as the “SQLite for analytics.”

Democratizing Experimentation: How GetYourGuide Built a Flexible and Scalable A/B Testing Platform

2025-09-01 Watch

talk

Konrad Richter

API Python

At GetYourGuide, we transformed experimentation from a centralized, closed system into a democratized, self-service platform accessible to all analysts, engineers, and product teams. In this talk, we'll share our journey to empower individuals across the company to define metrics, create dimensions, and easily extend statistical methods. We'll discuss how we built a Python-based Analyzer toolkit enabling standardized, reusable calculations, and how our experimentation platform provides ad-hoc analytical capabilities through a flexible API. Attendees will gain practical insights into creating scalable, maintainable, and user-friendly experimentation infrastructure, along with access to our open-source sequential testing implementation.
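The idea of reusable, registered metric calculations can be sketched in a few lines. The example below is not GetYourGuide's Analyzer toolkit, and it uses a plain fixed-horizon two-proportion z-test rather than the sequential testing method mentioned above; `METRICS`, `metric`, `conversion_rate`, `two_proportion_z_test`, and the sample counts are all hypothetical.

```python
from math import erf, sqrt
from typing import Callable, Dict, Tuple

# Hypothetical registry: metric name -> function computing the metric from raw counts.
METRICS: Dict[str, Callable[[dict], float]] = {}


def metric(name: str):
    """Decorator that registers a metric function under `name`."""
    def register(fn):
        METRICS[name] = fn
        return fn
    return register


@metric("conversion_rate")
def conversion_rate(group: dict) -> float:
    return group["conversions"] / group["visitors"]


def two_proportion_z_test(control: dict, treatment: dict) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    n1, n2 = control["visitors"], treatment["visitors"]
    p1, p2 = conversion_rate(control), conversion_rate(treatment)
    pooled = (control["conversions"] + treatment["conversions"]) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / standard_error
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


# Invented sample counts for illustration.
control = {"visitors": 1000, "conversions": 100}
treatment = {"visitors": 1000, "conversions": 130}
z, p_value = two_proportion_z_test(control, treatment)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Registering metrics by name is one simple way to let different teams add calculations without touching the statistical code; a production platform would also need variance estimates per metric and corrections for repeated looks at the data.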

Beyond Linear Funnels: Visualizing Conditional User Journeys with Python

2025-09-01 Watch

talk

Yaseen Esmaeelpour

Funnel Python

Optimizing user funnels is a common task for data analysts and data scientists. Funnels are not always linear in the real world. Often, the next step depends on earlier responses or actions. This results in complex funnels that can be tricky to analyze. I’ll introduce an open-source Python library I developed that analyzes and visualizes non-linear, conditional funnels by utilizing Graphviz and Streamlit. It calculates conversion rates, drop-offs, and time spent on each step, and highlights bottlenecks by color. Attendees will learn how to quickly explore complex user journeys and generate insightful funnel data.
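The core calculation behind a branching funnel can be sketched without any plotting: treat each user journey as a path and count transitions between steps. The code below is not the speaker's library; `transition_stats`, the step names, and the sample journeys are hypothetical, and it assumes each step appears at most once per journey. Rendering the resulting graph with Graphviz is omitted.

```python
from collections import Counter


def transition_stats(journeys):
    """Compute per-edge conversion rates and per-step end counts.

    Each journey is the ordered list of steps one user reached.
    """
    reached = Counter()  # number of users who reached each step
    edges = Counter()    # number of users who moved from step a to step b
    for steps in journeys:
        reached.update(steps)
        edges.update(zip(steps, steps[1:]))

    # Share of users at the source step who continued along each edge.
    rates = {edge: count / reached[edge[0]] for edge, count in edges.items()}

    # Users whose journey ended at each step (for final steps this is completion).
    continued = Counter()
    for (source, _), count in edges.items():
        continued[source] += count
    ended_at = {step: reached[step] - continued[step] for step in reached}
    return rates, ended_at


# Invented journeys: the plan chosen after the survey decides the branch.
journeys = [
    ["signup", "survey", "plan_basic", "checkout"],
    ["signup", "survey", "plan_pro", "checkout"],
    ["signup", "survey", "plan_pro"],
    ["signup"],
]

rates, ended_at = transition_stats(journeys)
for (source, target), rate in sorted(rates.items()):
    print(f"{source} -> {target}: {rate:.0%}")
print(ended_at)
```

The `rates` dictionary maps naturally onto edge labels of a directed graph, and edges with low rates are the candidates for bottleneck highlighting.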
