talk-data.com

Event

PyData Berlin 2025

2025-09-01 – 2025-09-03 PyData

Activities tracked

99

Sessions & talks

Showing 51–75 of 99 · Newest first

Probably Fun: Games to teach Machine Learning

2025-09-02 Watch
talk

In this tutorial, you will play several games that can be used to teach machine learning concepts. Each game can be played in large or small groups. Some involve hands-on material such as cards; others involve an electronic app. All games contain one or more concepts from Machine Learning.

As an outcome, you will take away multiple ideas that make complex topics more understandable and enjoyable. In doing so, we would like to demonstrate that Machine Learning does not require computers: its core ideas can be exemplified clearly and memorably without them. We would also like to demonstrate that gamification is not limited to online quiz questions but offers ways for learners to bond.

We will bring a set of carefully selected games that have been proven in a big classroom setting and contain useful abstractions of linear models, decision trees, LLMs and several other Machine Learning concepts. We also believe that it is probably fun to participate in this tutorial.

The Importance and Elegance of Polars Expressions

2025-09-02 Watch
talk

Polars is known for its speed, but its elegance comes from its use of expressions. In this talk, we’ll explore how Polars expressions work and why they are key to efficient and elegant data manipulation. Through real-world examples, you’ll learn how to create, expand, and combine expressions in Polars to wrangle data more effectively.

Training Specialized Language Models with Less Data: An End-to-End Practical Guide

2025-09-02 Watch
talk

Small Language Models (SLMs) offer an efficient and cost-effective alternative to LLMs—especially when latency, privacy, inference costs or deployment constraints matter. However, training them typically requires large labeled datasets and is time-consuming, even if it isn't your first rodeo.

This talk presents an end-to-end approach for curating high-quality synthetic data using LLMs to train domain-specific SLMs. Using a real-world use case, we’ll demonstrate how to reduce manual labeling time, cut costs, and maintain performance—making SLMs viable for production applications.

Whether you are a seasoned Machine Learning Engineer or just getting started with building AI features, you will come away with the inspiration to build more performant, secure, and environmentally friendly AI systems.

Coffee Break

2025-09-02
talk

Narwhals: enabling universal dataframe support

2025-09-02 Watch
talk

Ever tried passing a Polars DataFrame to a data science library and found that it...just works? No errors, no panics, no noticeable overhead, just...results? This is becoming increasingly common in 2025, yet only 2 years ago, it was mostly unheard of. So, what changed? A large part of the answer is: Narwhals.

Narwhals is a lightweight compatibility layer between dataframe libraries which lets your code work seamlessly across Polars, pandas, PySpark, DuckDB, and more! And it's not just a theoretical possibility: with ~30 million monthly downloads and set as a required dependency of Altair, Bokeh, Marimo, Plotly, Shiny, and more, it's clear that it's reshaping the data science landscape. By the end of the talk, you'll understand why writing generic dataframe code was such a headache (and why it isn't anymore), how Narwhals works and how its community operates, and how you can use it in your projects today. The talk will be technical yet accessible and light-hearted.

Opening notes

2025-09-02
talk

Registration & Coffee

2025-09-02
talk

Building an A/B Testing Framework with NiceGUI

2025-09-01 Watch
talk

NiceGUI is a Python-based web UI framework that enables developers to build interactive web applications without using JavaScript. In this talk, I’ll share how my team used NiceGUI to create an internal A/B testing platform entirely in Python. I’ll discuss the key requirements for the platform, why we chose NiceGUI, and how it helped us design the UI, display results, and integrate with the backend. This session will demonstrate how NiceGUI simplifies development, reduces frontend complexity, and speeds up internal tool creation for Python developers.

Risk Budget Optimization for Causal Mix Models

2025-09-01 Watch
talk

Traditional budget planners chase the highest predicted return and hope for the best. Bayesian models take the opposite route: they quantify uncertainty first, then let us optimize budgets with that uncertainty fully on display. In this talk we’ll show how posterior distributions become a set of possible futures, and how risk‑aware loss functions convert those probabilities into spend decisions that balance upside with resilience. Whether you lead marketing, finance, or product, you’ll learn a principled workflow for turning probabilistic insight into capital allocation that’s both aggressive and defensible—no black‑box magic, just transparent Bayesian reasoning and disciplined risk management.
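
As a rough illustration of the workflow this abstract describes (not the speakers' code), the sketch below treats posterior draws as possible futures and compares a budget split chosen by expected return with one chosen by a risk-aware criterion. The two channels, their return distributions, the budget, and the use of the mean of the worst 10% of outcomes as the risk measure are all assumptions made for the example.

```python
import random
import statistics

random.seed(0)

# Hypothetical posterior draws of return per unit of spend for two channels.
# In a real workflow these would come from a fitted Bayesian model.
N_DRAWS = 2000
roi_a = [random.gauss(1.5, 0.9) for _ in range(N_DRAWS)]  # higher mean, very uncertain
roi_b = [random.gauss(1.2, 0.2) for _ in range(N_DRAWS)]  # lower mean, well understood


def simulated_returns(share_a, budget=100.0):
    """Total return in each possible future when share_a of the budget goes to channel A."""
    return [budget * (share_a * a + (1.0 - share_a) * b) for a, b in zip(roi_a, roi_b)]


def expected_return(draws):
    """Risk-neutral objective: the posterior mean of the return."""
    return statistics.fmean(draws)


def worst_case_mean(draws, alpha=0.1):
    """Risk-aware objective: mean of the worst alpha fraction of outcomes."""
    k = max(1, int(alpha * len(draws)))
    return statistics.fmean(sorted(draws)[:k])


# Candidate allocations: 0%, 5%, ..., 100% of the budget to channel A.
shares = [i / 20 for i in range(21)]

best_by_mean = max(shares, key=lambda s: expected_return(simulated_returns(s)))
best_by_risk = max(shares, key=lambda s: worst_case_mean(simulated_returns(s)))

print("share to A, maximizing expected return:", best_by_mean)
print("share to A, maximizing worst-case mean:", best_by_risk)
```

Under these assumed numbers, the risk-neutral objective puts the whole budget into the uncertain channel, while the risk-aware objective keeps most of it in the better-understood one.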

Beyond the Black Box: Interpreting ML models with SHAP

2025-09-01 Watch
talk

As machine learning models become more accurate and complex, explainability remains essential. Explainability helps not just with trust and transparency but also with generating actionable insights and guiding decision-making. One way of interpreting model outputs is to use SHapley Additive exPlanations (SHAP). In this talk, I will go through the concept of Shapley values and their mathematical intuition and then walk through a few real-world examples for different ML models. Attendees will gain a practical understanding of SHAP's strengths and limitations and how to use it to explain model predictions in their projects effectively.
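
To make the idea of a Shapley value concrete, here is a minimal sketch that computes exact Shapley values for a toy payoff function by averaging each feature's marginal contribution over all feature orderings. It is plain Python and does not use the SHAP library; the feature names and the payoff function `toy_value` are hypothetical, chosen only to show how an interaction effect is shared between features.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = set()
        for f in order:
            before = value_fn(frozenset(coalition))
            coalition.add(f)
            after = value_fn(frozenset(coalition))
            totals[f] += after - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff when only the features in `coalition` are 'present'."""
    score = 0.0
    if "age" in coalition:
        score += 10.0
    if "income" in coalition:
        score += 20.0
    if "age" in coalition and "income" in coalition:
        score += 6.0  # interaction effect, split evenly by the Shapley value
    return score


features = ["age", "income", "tenure"]
phi = shapley_values(toy_value, features)
print(phi)
```

The values sum to the difference between the payoff with all features and the payoff with none, and a feature that never changes the payoff ("tenure" here) receives zero. Enumerating every ordering grows factorially with the number of features, so this approach is only practical for very small examples.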

Consumer Choice Models with PyMC Marketing

2025-09-01 Watch
talk

Consumer choice models are an important part of product innovation and market strategy. In this talk, we'll see how they can be used to learn about substitute goods and market shares in competitive markets using PyMC Marketing's new consumer choice module.

AI-Ready Data in Action: Powering Smarter Agents

2025-09-01 Watch
talk

This hands-on workshop focuses on what AI engineers do most often: making data AI-ready and turning it into production-useful applications. Together with dltHub and LanceDB, you’ll walk through an end-to-end workflow: collecting and preparing real-world data with best practices, managing it in LanceDB, and powering AI applications with search, filters, hybrid retrieval, and lightweight agents. By the end, you’ll know how to move from raw data to functional, production-ready AI setups without the usual friction. We will touch upon multi-modal data and going to production with this end-to-end use case.

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

2025-09-01 Watch
talk

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.

Coffee Break

2025-09-01
talk

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

2025-09-01 Watch
talk

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory or VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs, and we will also share our experiences and challenges with the data collection, along with insights into general patterns.

What’s Really Going On in Your Model? A Python Guide to Explainable AI

2025-09-01 Watch
talk
Yashasvi Misra (Pure Storage)

As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeholders, regulators, or just debugging unexpected results, the ability to explain your model is no longer optional; it's essential.

In this talk, we'll walk through practical tools in the Python ecosystem that help bring transparency to your models, including SHAP, LIME, and Captum. Through hands-on examples, you'll learn how to apply these libraries to real-world models, from decision trees to deep neural networks, and make sense of what's happening under the hood.

If you've ever struggled to explain your model’s output or justify its decisions, this session will give you a toolkit to build more trustworthy, interpretable systems without sacrificing performance.