PyData Amsterdam 2025

Lunch break

2025-09-25

talk

Lunch break

2025-09-25

talk

Lunch break

2025-09-25

talk

Lunch break

2025-09-25

talk

Lunch break

2025-09-25

talk

Flip the Plan: Fast-Track Your AI/ML Model Integration with a Back-to-Front Implementation Strategy

2025-09-25 Watch

talk

Florenz Hollebrandse

AI/ML

"How quickly will you be able to get this model into production?" is a common question in analytical projects. Often, this is the first time anyone considers the complexities of deploying models within enterprise systems.

This talk introduces an approach to enhance the success rate of complex AI/ML integration projects while reducing time-to-market. Using examples from global banks J.P. Morgan and ING, we will demonstrate team organisation and engineering patterns to achieve this.

This talk is ideal for data scientists, engineers, and product managers interested in adopting an efficient Model Development Lifecycle (MDLC).

Formula 1 goes Bayesian: Time Series Decomposition with PyMC

2025-09-25 Watch

talk

Wesley Boelrijk

AI/ML Python

Forecasting time series can be messy, data is often missing, noisy, or full of structural changes like holidays, outliers, or evolving patterns. This talk shows how to build interpretable time series decomposition models using PyMC, a modern probabilistic programming library.

We’ll break time series into trend, seasonality, and noise components using engineered time features (e.g., Fourier and Radial Basis Functions). You’ll also learn how to model correlated series using hierarchical priors, letting multiple time series "learn from each other." As a case study, we’ll analyze Formula 1 lap time data to compare drivers and explore performance consistency using Bayesian posteriors.

This is a hands-on, code-first talk for data scientists, ML engineers, and researchers curious about Bayesian modeling (or Formula 1). Familiarity with Python and basic statistics is helpful, but no deep knowledge of Bayes is required.

From pixel to predictions: A journey through our CT image pipeline in pig breeding using POSIT

2025-09-25

talk

Lisette van der Zande

How do you turn a CT scan of a pig into usable data for large-scale genetic research? At Topigs Norsvin, we scan 10,000 male pigs each year using high-resolution CT imaging. This allows us to look inside the animals and assess carcass quality, muscle composition, and indicators of health. We use this data to inform selection decisions and improve the accuracy of our breeding program. In this talk, I'll walk you through the journey of CT data: from scan acquisition and processing to how we extract traits and integrate them into the breeding program. A key part of this process is POSIT, a lightweight project structure that helps us manage complexity, ensure reproducibility and scale our pipelines effectively. While the biological context is specific, the data challenges are familiar to any data professional.

GenAI governance in practice: patterns, pitfalls & strategies across tools and industries

2025-09-25

talk

Maarten de Ruiter

AI/ML GenAI

Governing generative AI systems presents unique challenges, particularly for teams dealing with diverse GenAI subdomains and rapidly changing technological landscapes. In this talk, Maarten de Ruiter, Data Scientist at Xomnia, shares practical insights drawn from real-world GenAI use-cases. He will highlight essential governance patterns, address common pitfalls, and provide actionable strategies for teams utilizing both open-source tools and commercial solutions. Attendees will gain concrete recommendations that work in practice, informed by successes (and failures!) across multiple industries

Actionable Techniques for Finding Performance Regressions

2025-09-25

talk

Jeroen Janssens , Thijs Nieuwdorp (VodafoneZiggo)

Bash Data Science Git Parquet Polars Python

Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated commit compilation and benchmarking across two repos to pinpoint a commit that degraded multi-file Parquet loading. This led to challenging assumptions and rethinking performance monitoring for the Python data science library Polars.

Causal Inference Framework for incrementality : A Case Study at Booking to estimate incremental CLV due to App installs

2025-09-25

talk

Netesh , Nazlı Alagöz

GitHub Marketing Python

This talk dives into the challenge of measuring the causal impact of app installs on customer loyalty and value, a question at the heart of data-driven marketing. While randomized controlled trials are the gold standard, they’re rarely feasible in this context. Instead, we’ll explore how observational causal inference methods can be thoughtfully applied to estimate incremental value with careful consideration of confounding, selection, and measurement biases. This session is designed for data scientists, marketing analysts, and applied researchers with a working knowledge of statistics and causal inference concepts. We’ll keep the tone practical and informative, focusing on real-world challenges and solutions rather than heavy mathematical derivations.

Attendees will learn: * How to design robust observational studies for business impact * Strategies for covariate selection and bias mitigation * The use of multiple statistical and design-based causal inference approaches * Methods for validating and refuting causal claims in the absence of true randomization We’ll share actionable insights, code snippets, and a GitHub repository with example workflows so you can apply these techniques in your own organization. By the end of the talk, you’ll be equipped to design more transparent and credible causal studies-and make better decisions about where to invest your marketing dollars.

Requirements: A basic understanding of causal inference and Python is recommended. Materials and relevant links will be shared during the session

Counting Groceries with Computer Vision: How Picnic Tracks Inventory Automatically

2025-09-25 Watch

talk

Sven Arends

GenAI LLM

In this talk, we'll share how we're using computer vision to automate stock counting, right on the conveyor belt. We'll discuss the challenges we've faced with the hardware, software, and GenAI components, and we'll also review our own benchmark results for the various state-of-the-art models. Finally, we'll cover the practical aspects of GenAI deployment, including prompt optimization, preventing LLM "yapping," and creating a robust feedback loop for continuous improvement.

Potato breeding using image analysis in a production setting

2025-09-25

talk

Rik Nuijten , Dick Abma

AWS DevOps Python Terraform

The scale-up company Solynta focuses on hybrid potato breeding, which helps achieve improvements in yield, disease resistance, and climate adaptation. Scientific innovation is part of our core business. Plant selections are highly data-driven, involving, for example, drone observations and genetic data. Minimal time-to-production for new ideas is essential, which is facilitated by our custom AWS devops platform. This platform focusses on automation and accessible data storage.

In this talk, we introduce how computer vision (YOLO and SAM modelling) enables monitoring traits of plants in the field, and how we operate these models. This further entails: • Our experience from training and evaluating models on drone images • Trade-offs selecting AWS services, Terraform modules and Python packages for automation and robustness • Our team setup that allows IT specialists and biologists to work together effectively

The talk will provide practical insights for both data scientists and DevOps engineers. The main takeaways are that object detection and segmentation from drone maps, at scale, are achievable for a small team. Furthermore, with the right approach, you can standardise a DevOps platform to let operations and developers work together.

Diversity Isn’t a Buzzword, It’s a Business Case

2025-09-25

talk

Başak Eskili

Thirteen years ago, I walked into my first programming class as one of a few women among seventy men and the numbers haven’t shifted much despite all the bootcamps and “Girls Who Code” posters promised. Today, women still make up under 30% of the tech workforce and even fewer in leadership. This talk blends personal experience and data to explore why that matters: companies with more women in leadership perform better, women are built to thrive under pressure, and when women are missing from tech, their perspectives are missing from the data too. Diversity isn’t charity or PR, it's how we build better systems.

Large-Scale Video Intelligence

2025-09-25 Watch

talk

Antonino Ingargiola , Irene Donato

AI/ML Python Vector DB

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system. We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.

Should Captain America Still Host Your Data? A Call for Open, EU-Based Data Platforms.

2025-09-25 Watch

talk

Manuel Spierenburg

Cloud Computing

When you store data in the cloud, do you know who really controls it? In an era of increasing geopolitical tension and growing awareness around digital sovereignty, Dutch research institutes have already begun repatriating sensitive data from US servers to Dutch-controlled storage. This talk explores the hidden risks behind common cloud choices, from legal access by foreign governments to the ethical implications of supporting politically active tech giants. We’ll look at what it means to own your data, how regional storage might not be enough, and what it takes to build an EU-hosted, open-source data platform stack. If you’re a data engineer, architect, or technology leader who cares about privacy, control, and sustainable infrastructure, this talk will equip you with the insight—and motivation—to make different choices.

Uncertainty Unleashed: Wrapping Your Predictions in Honesty with Conformal Prediction

2025-09-25

talk

Konstantinos Tsoumas

There are a lot of models working in production as you're reading this. Lots of them are giving uncalibrated outputs without being explicit on how much one can trust the result. Especially when it comes to imbalanced datasets.

More so, relying on biased estimates can lead to overly aggressive decisions. In this hands‑on talk, we’ll demystify conformal methods using MNIST—the world’s favorite handwritten‑digit playground (to make the talk more fun & interactive)- with two goals in mind: explain & prove what an unbiased guarantee is and how it can be calculated but also why should you care and why does it matter so much. Attendees may leave equipped with: uncertainty guarantee understanding in classification, identify common pitfalls that lead to biased uncertainty estimates, how to apply it (even in difficult contexts like imbalanced datasets - an example will be given).

Coffee break

2025-09-25

talk

Coffee break

2025-09-25

talk

Coffee break

2025-09-25

talk

Coffee break

2025-09-25

talk

Coffee break

2025-09-25

talk

Coffee break

2025-09-25

talk

The agentification of software (has a UX problem)

2025-09-25 Watch

talk

Demetrios Brinkmann

The "agentification" of software promises a future where we simply tell a machine what we want, and it handles the rest. We have all felt that magic when vibe coding. So why can't every interaction with machines be that magical?

Well, what if the dominant chat-based interface is a dead end?

This talk explores the significant UX challenges of agentification. We’ll discuss why chat is an over-promising and under-delivering medium, and how we're losing the rich, high-bandwidth context of human communication.

Opening notes

2025-09-25

talk

Opening notes

talk-data.com

Top Topics

Top Speakers

Lunch break

Lunch break

Lunch break

Lunch break

Lunch break

Flip the Plan: Fast-Track Your AI/ML Model Integration with a Back-to-Front Implementation Strategy

Formula 1 goes Bayesian: Time Series Decomposition with PyMC

From pixel to predictions: A journey through our CT image pipeline in pig breeding using POSIT

GenAI governance in practice: patterns, pitfalls & strategies across tools and industries

Actionable Techniques for Finding Performance Regressions

Causal Inference Framework for incrementality : A Case Study at Booking to estimate incremental CLV due to App installs

Counting Groceries with Computer Vision: How Picnic Tracks Inventory Automatically

Potato breeding using image analysis in a production setting

Diversity Isn’t a Buzzword, It’s a Business Case

Large-Scale Video Intelligence

Should Captain America Still Host Your Data? A Call for Open, EU-Based Data Platforms.

Uncertainty Unleashed: Wrapping Your Predictions in Honesty with Conformal Prediction

Coffee break

Coffee break

Coffee break

Coffee break

Coffee break

Coffee break

The agentification of software (has a UX problem)

Opening notes