talk-data.com talk-data.com

Event

PyData Amsterdam 2025

2025-09-24 – 2025-09-26 PyData

Activities tracked

119

Sessions & talks

Showing 1–25 of 119 · Newest first

Search within this event →

Social drinks

2025-09-26
talk
Techie vs Comic: The sequel

Techie vs Comic: The sequel

2025-09-26 Watch
talk

A data scientist by day and a standup comedian by night. This was how Arda described himself prior to his critically acclaimed performance about his two identities during PyData 2024, where they merged.

Now he doesn't even know.

After another year of stage performances, awkward LinkedIn interactions and mysterious cloud errors, Arda is back for another tale of absurdity. In this closing talk, he will illustrate the hilarity of his life as a data scientist in the age of LLMs and his non-existent comfort zone, proving good sequels can exist

Conference closing notes

2025-09-26
talk

Conference closing notes

Minus Three Tier: Data Architecture Turned Upside Down

Minus Three Tier: Data Architecture Turned Upside Down

2025-09-26 Watch
talk

Every data architecture diagram out there makes it abundantly clear who's in charge: At the bottom sits the analyst, above that is an API server, and on the very top sits the mighty data warehouse. This pattern is so ingrained we never ever question its necessity, despite its various issues like slow data response time, multi-level scaling issues, and massive cost.

But there is another way: Disconnect of storage and compute enables localization of query processing closer to people, leading to much snappier responses, natural scaling with client-side query processing, and much reduced cost.

In this talk, it will be discussed how modern data engineering paradigms like decomposition of storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture we can put user's needs first. We can rely on commoditised components like object store to enable fast, scalable, and cost-effective solutions.

Snack break

2025-09-26
talk

Snack break

2025-09-26
talk

Snack break

2025-09-26
talk

Snack break

2025-09-26
talk
Real-Time Context Engineering for LLMs

Real-Time Context Engineering for LLMs

2025-09-26 Watch
talk

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. Context engineering involves providing LLMs with relevant and timely context data from various data sources, which allows them to make context-aware decisions. The context data provided to the LLM must be produced in real-time to enable it to react intelligently at human perceivable latencies (a second or two at most). If the application takes longer to react, humans would perceive it as laggy and unintelligent. In this talk, we will introduce context engineering and motivate for real-time context engineering for interactive applications. We will also demonstrate how to integrate real-time context data from applications inside Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key to unlock application context data for agents and LLMs. We will walk through an example of an interactive application (TikTok clone) that we make AI-enabled with Hopsworks.

Resource Monitoring and Optimization with Metaflow

Resource Monitoring and Optimization with Metaflow

2025-09-26 Watch
talk

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from overprovisioned costs or painful job failures!

Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models

Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models

2025-09-26 Watch
talk

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.

Help! There Are Humans in My Data!

Help! There Are Humans in My Data!

2025-09-26 Watch
talk

Good quality data is the basis for high quality models and valuable data insights. But isn't it annoying how often your data is riddled with those pesky humans? Human involvement in data creation often introduces errors, misunderstandings, and biases that can compromise data integrity. This talk will explore how human factors influence the data creation process and what we as data professionals can do to account for this in our data interpretation and usage.

Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game

Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game

2025-09-26 Watch
talk

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?

Orchestrating success: How Vinted standardizes large-scale, decentralized data pipelines

2025-09-26
talk

At Vinted, Europe’s largest second-hand marketplace, over 20 decentralized data teams generate, transform, and build products on petabytes of data. Each team utilizes their own tools, workflows, and expertise. Coordinating data pipeline creation across such diverse teams presents significant challenges. These include complex inter-team dependencies, inconsistent scheduling solutions, and rapidly evolving requirements.

This talk is aimed at data engineers, platform engineers, and technical leads with experience in workflow orchestration and will demonstrate how we empower teams at Vinted to define data pipelines quickly and reliably. We will present our user-friendly abstraction layer built on top of Apache Airflow, enhanced by a Python code generator. This abstraction simplifies upgrades and migrations, removes scheduler complexity, and supports Vinted’s rapid growth. Attendees will learn how Python abstractions and code generation can standardize pipeline development across diverse teams, reduce operational complexity, and enable greater flexibility and control in large-scale data organizations. Through practical lessons and real-world examples of our abstraction interface, we will offer insights into designing scheduler-agnostic architectures for successful data pipeline orchestration.

Evaluating the alignment of LLMs to Dutch societal values

Evaluating the alignment of LLMs to Dutch societal values

2025-09-26 Watch
talk
LLM

The City of Amsterdam is researching the responsible adoption of Large Language Models (LLMs) by evaluating their performance, environmental impact, and alignment with human values. In this talk, we will share how we develop tailored benchmarks and a dedicated assessment platform to raise awareness and guide responsible implementation.

Kickstart Your Probabilistic Forecasting with Level Set and Quantile Regression Forests

Kickstart Your Probabilistic Forecasting with Level Set and Quantile Regression Forests

2025-09-26 Watch
talk

Probabilistic forecasting is essential, but choosing the right method is tricky. This talk introduces two lesser-known models — Level Set Forecaster and Quantile Regression Forest — that help you kickstart probabilistic forecasting without unnecessary complexity.

Searching for My Next Chart

Searching for My Next Chart

2025-09-26 Watch
talk

Abstract

As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly expanding community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods—from Pinterest boards to Notion pages. This created a significant bottleneck in my workflow, as I found myself spending more time cataloging charts than actively creating them.

In this talk, I will present a RAG (Retrieval Augmented Generation) based retrieval system that I designed specifically for data visualizations. I will detail the methodology behind this system, illustrating how I addressed my own workflow inefficiencies by transforming a dispersed collection of charts into a semantically searchable knowledge base. This project serves as a practical example of applying advanced AI techniques to enhance creative technical work, demonstrating how a specialized retrieval system can significantly improve the efficiency and quality of data visualization creation process.

Open source sprints - PyIceberg & PyMC

2025-09-26
talk

Also this year, at our 10 year anniversary edition of PyData Amsterdam, we’ll host open source sprints! ️ Our open source sprints this year will be 2 sessions in parallel, with leading open source contributors Fokko Driesprong and Rob Zinkov of the respective packages PyIceberg and PyMC.

Lunch

2025-09-26
talk

Lunch

2025-09-26
talk

Lunch

2025-09-26
talk

Lunch

2025-09-26
talk

Lunch

2025-09-26
talk

Lunch

2025-09-26
talk