talk-data.com

Event

PyData Amsterdam 2025

2025-09-24 – 2025-09-26 PyData

Activities tracked

23

Filtering by: AI/ML

Sessions & talks

Showing 1–23 of 23 · Newest first

Real-Time Context Engineering for LLMs

2025-09-26
talk

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. Context engineering involves providing LLMs with relevant and timely context data from various data sources, which allows them to make context-aware decisions. The context data provided to the LLM must be produced in real time so the application can react intelligently at human-perceivable latencies (a second or two at most); if it takes longer to react, humans perceive it as laggy and unintelligent. In this talk, we will introduce context engineering and motivate the need for real-time context engineering in interactive applications. We will also demonstrate how to integrate real-time context data from applications inside Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key to unlocking application context data for agents and LLMs. We will walk through an example of an interactive application (a TikTok clone) that we make AI-enabled with Hopsworks.
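To make the integration pattern concrete, here is a minimal sketch (not code from the talk) that looks up fresh features for an application ID via the Hopsworks Python client; the feature view name `user_interactions` and the key column are invented for illustration.

```python
import hopsworks

# Log in to the Hopsworks project and grab a handle on the feature store
# (assumes an API key is configured in the environment).
project = hopsworks.login()
fs = project.get_feature_store()

# Hypothetical feature view keyed on the application ID, holding fresh
# interaction features for the TikTok-clone example.
feature_view = fs.get_feature_view(name="user_interactions", version=1)

def build_context(application_id: str) -> str:
    """Fetch real-time context features for one application ID and format them for an LLM prompt."""
    features = feature_view.get_feature_vector({"application_id": application_id})
    return f"Recent user activity features: {features}"

# The returned string would be prepended to the agent's prompt before calling the LLM.
```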

Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game

2025-09-26
talk

The rise of LLMs has elevated prompt engineering into a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?
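As a hedged sketch of the bootstrapped few-shot idea using DSPy (the talk's exact setup may differ), the optimizer below compiles a small program against a placeholder trainset and metric:

```python
import dspy

# Configure the LLM backend (model name is a placeholder).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A simple question-answering program whose prompt we want to optimize.
qa = dspy.ChainOfThought("question -> answer")

# Tiny placeholder trainset; real optimization needs many more examples.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
]

def exact_match(example, prediction, trace=None):
    # Crude metric: does the gold answer appear in the model's answer?
    return example.answer.lower() in prediction.answer.lower()

# Bootstrapped few-shot: the optimizer generates and selects demonstrations automatically.
optimizer = dspy.BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=2)
optimized_qa = optimizer.compile(qa, trainset=trainset)
print(optimized_qa(question="What is 3 + 4?").answer)
```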

Searching for My Next Chart

2025-09-26
talk

As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly expanding community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods—from Pinterest boards to Notion pages. This created a significant bottleneck in my workflow, as I found myself spending more time cataloging charts than actively creating them.

In this talk, I will present a RAG (Retrieval-Augmented Generation) based retrieval system that I designed specifically for data visualizations. I will detail the methodology behind this system, illustrating how I addressed my own workflow inefficiencies by transforming a dispersed collection of charts into a semantically searchable knowledge base. This project serves as a practical example of applying advanced AI techniques to enhance creative technical work, demonstrating how a specialized retrieval system can significantly improve the efficiency and quality of the data visualization creation process.
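A minimal sketch of the retrieval side of such a system (not the speaker's actual code), assuming chart descriptions have already been extracted as text and using sentence-transformers for the embeddings:

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical catalogue of chart descriptions (in practice, extracted from images and metadata).
charts = [
    "Bump chart of Formula 1 championship standings over a season",
    "Faceted ridgeline plot of temperature distributions by month",
    "Waffle chart comparing energy sources by country",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
chart_embeddings = model.encode(charts, convert_to_tensor=True)

def search(query: str, top_k: int = 2):
    """Return the chart descriptions most semantically similar to the query."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, chart_embeddings, top_k=top_k)[0]
    return [(charts[hit["corpus_id"]], round(hit["score"], 3)) for hit in hits]

print(search("how teams' rankings changed over time"))
```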

Composable Pipelines for ML: Automating Feature Engineering with Hopsworks’ Brewer

2025-09-26
talk

Operationalizing ML isn’t just about models — it’s about moving and engineering data. At Hopsworks, we built a composable AI pipeline builder (Brewer) based on two principles: Tasks and Data Sources. This lets users define workflows that automatically analyse, clean, create and update feature groups, without glue code or brittle scheduling logic.

In this talk, we’ll show how Brewer drives the automation of feature engineering, enabling reproducible, declarative pipelines that respond to changes in upstream data. We’ll explore how this fits into broader ML workflows, from ingestion to feature materialization, and how it integrates with warehouses, streams, and file-based systems.
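Brewer's actual API is not shown in the abstract; purely as a hypothetical illustration of composing Tasks and Data Sources into a declarative pipeline, a sketch might look like this (all names invented):

```python
# Hypothetical illustration only; these names are not the Brewer API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DataSource:
    name: str
    read: Callable[[], list[dict]]

@dataclass
class Task:
    name: str
    run: Callable[[list[dict]], list[dict]]

def build_pipeline(source: DataSource, tasks: list[Task]) -> Callable[[], list[dict]]:
    """Compose a data source and a sequence of tasks into a single runnable pipeline."""
    def pipeline() -> list[dict]:
        rows = source.read()
        for task in tasks:
            rows = task.run(rows)
        return rows
    return pipeline

# Example: clean raw click events, then derive a simple feature column.
clicks = DataSource("clicks", read=lambda: [{"user": "a", "clicks": 3}, {"user": "b", "clicks": None}])
clean = Task("drop_missing", run=lambda rows: [r for r in rows if r["clicks"] is not None])
derive = Task("flag_high_activity", run=lambda rows: [{**r, "high_activity": r["clicks"] > 2} for r in rows])

feature_pipeline = build_pipeline(clicks, [clean, derive])
print(feature_pipeline())
```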

How to Keep Your LLM Chatbots Real: A Metrics Survival Guide

2025-09-26
talk

In this brave new world of vibe coding and YOLO-to-prod mentality, let’s take a step back and keep things grounded (pun intended). None of us would ever deploy a classical ML model to production without clearly defined metrics and proper evaluation, so let’s talk about methodologies for measuring the performance of LLM-powered chatbots. Think of retriever recall, answer relevancy, correctness, faithfulness, and hallucination rates. With the wild west of metric standards still in full swing, I’ll guide you through the challenges of curating a synthetic test set and selecting suitable metrics and open-source packages that help evaluate your use case. Everything is possible, from simple LLM-as-a-judge approaches, like those now built into many packages such as MLflow, up to complex multi-step quantification approaches with Ragas. If you work in the GenAI space or with LLM-powered chatbots, this session is for you! Prior background knowledge is an advantage, but not required.
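As a rough illustration of the multi-step approach with Ragas (the metrics are real Ragas metrics, but the sample data is made up and an API key for the judge LLM is assumed):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Made-up single example; column names follow the classic Ragas schema and differ in newer releases.
data = {
    "question": ["What is the refund window?"],
    "answer": ["You can request a refund within 30 days of purchase."],
    "contexts": [["Our policy allows refunds within 30 days of the purchase date."]],
}
dataset = Dataset.from_dict(data)

# Ragas uses an LLM judge under the hood, so an API key for the judge model is assumed.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```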

Declarative Feature Engineering: Bridging Spark and Flink with a Unified DSL

2025-09-26
talk

Building ML features at scale shouldn’t require every ML Scientist to become an expert in Spark or Flink. At Adyen, the Feature Platform team built a Python-based DSL that lets data scientists define features declaratively — while automatically generating the necessary batch or real-time pipelines behind the scenes.
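The DSL itself is not shown in the abstract; purely as a hypothetical sketch of what a declarative feature definition might look like (all names invented), with the platform free to compile it to either a Spark batch job or a Flink streaming job:

```python
# Hypothetical sketch only; these names are not Adyen's DSL.
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    name: str
    source: str        # logical table or stream the feature reads from
    aggregation: str   # e.g. "count", "sum", "avg"
    column: str
    window: str        # e.g. "1h", "7d"
    group_by: str

# One declaration, two possible compilations: a Spark batch job for backfills
# or a Flink streaming job for real-time serving.
txn_count_1h = Feature(
    name="txn_count_1h",
    source="transactions",
    aggregation="count",
    column="transaction_id",
    window="1h",
    group_by="card_id",
)
print(txn_count_1h)
```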

Detection of Unattended Objects in Public Spaces using AI

2025-09-26
talk

This talk presents an end-to-end solution for detecting unattended objects in public transport hubs to enhance social security. The project, developed in a three-week challenge, focuses on proactively identifying unattended items using existing camera infrastructure. We will cover the entire pipeline, from data anonymization and preprocessing to building a data labeling platform, object detection with YOLO, and tracking objects over time. The presentation will also discuss the evaluation of the system.
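A minimal sketch of the detection step using the Ultralytics YOLO API (the project's actual model, classes, and tracking logic are not specified in the abstract):

```python
from ultralytics import YOLO

# Pretrained COCO model as a placeholder; the project would likely fine-tune on its own footage.
model = YOLO("yolov8n.pt")

# Run detection on a single anonymized frame (the path is illustrative).
results = model("station_frame.jpg")

for result in results:
    for box in result.boxes:
        class_name = model.names[int(box.cls)]
        confidence = float(box.conf)
        # COCO classes such as "suitcase" or "backpack" are candidate unattended objects;
        # deciding whether they are actually unattended requires tracking them over time.
        if class_name in {"suitcase", "backpack", "handbag"} and confidence > 0.5:
            print(class_name, confidence, box.xyxy.tolist())
```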

Scaling Trust: A practical guide on evaluating LLMs and Agents

2025-09-26
talk

Recently, the integration of Generative AI (GenAI) technologies into both our personal and professional lives has surged. In most organizations, the deployment of GenAI applications is on the rise, and this trend is expected to continue for the foreseeable future. Evaluating GenAI systems presents unique challenges not present in traditional ML. The main peculiarity is the absence of ground truth for textual metrics such as text clarity, location extraction accuracy, factual accuracy, and so on. Nevertheless, the non-negligible model-serving cost demands an even more thorough evaluation of the system to be deployed in production.

Defining the metric ground truth is a costly and time-consuming process requiring human annotation. To address this, we will present how to evaluate LLM-based applications by leveraging LLMs themselves as evaluators. Moreover, we will outline the complexities and evaluation methods for LLM-based agents, which operate with autonomy and present further evaluation challenges. Lastly, we will explore the critical role of evaluation in the GenAI lifecycle and outline the steps taken to integrate these processes seamlessly.

Whether you are an AI practitioner, user or enthusiast, join us to gain insights into the future of GenAI evaluation and its impact on enhancing application performance.
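To make the LLM-as-evaluator idea concrete, here is a minimal, hedged sketch of a judge call using the OpenAI client; the rubric and model name are placeholders, and production setups add structured outputs, calibration, and multiple judges:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading an assistant's answer.
Question: {question}
Answer: {answer}
Rate the factual accuracy of the answer from 1 (wrong) to 5 (fully correct).
Reply with only the number."""

def judge_accuracy(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Score one answer with an LLM judge; no ground truth required."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

print(judge_accuracy("When did PyData Amsterdam 2025 take place?", "From 24 to 26 September 2025."))
```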

Optimize the Right Thing: Cost-Sensitive Classification in Practice

2025-09-26
talk

Not all mistakes in machine learning are equal—a false negative in fraud detection or medical diagnosis can be far costlier than a false positive. Cost-sensitive learning helps navigate these trade-offs by incorporating error costs into the training process, leading to smarter decision-making. This talk introduces Empulse, an open-source Python package that brings cost-sensitive learning into scikit-learn. Attendees will learn why standard models fall short in cost-sensitive scenarios and how to build better classifiers with scikit-learn and Empulse.
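To illustrate the general idea without relying on Empulse's own API (which the talk covers), a scikit-learn classifier can be made cost-aware by weighting samples with assumed misclassification costs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced fraud-like data.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

# Assumed misclassification costs: missing a fraud case (false negative) costs 50,
# flagging a legitimate case (false positive) costs 1.
cost_fn, cost_fp = 50.0, 1.0
sample_weight = np.where(y == 1, cost_fn, cost_fp)

# Weighting samples by their cost shifts the decision boundary toward catching positives.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=sample_weight)

# Decisions can also be made by expected cost instead of a fixed 0.5 threshold.
proba = clf.predict_proba(X)[:, 1]
flag = proba * cost_fn > (1 - proba) * cost_fp
print(f"Flagged {flag.mean():.1%} of transactions")
```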

Image processing, artificial intelligence, and autonomous systems

2025-09-26
talk

This talk gives an overview of the field of image processing and the impact of artificial intelligence on it. Starting from the different tasks that can be performed with image processing, we show solutions using different AI technologies, including generative AI. Finally, we discuss the effect of AI on autonomous systems and the challenges that are faced.

Ethics is Not a Feature: Rethinking AI from the Ground Up

2025-09-25
talk

Ethics is often treated like a product feature—something to be added at the end, polished for compliance, or marketed for trust. But what if that mindset is exactly what’s holding us back? In this keynote, we’ll challenge the idea that ethics is optional or external to the development process. We’ll explore how ethical blind spots in AI systems—from biased models to black-box decisions to unsustainable compute—aren’t just philosophical dilemmas, but human failures with real-world consequences. You’ll learn how to spot ethical risks before they become failures, and discover practical tools and mindsets to build AI that earns trust—without compromising on innovation. From responsible data practices to transparency techniques and green AI strategies, we’ll connect the dots between values and code. This isn’t just a lecture—it’s a call to rethink how we build the future of AI—together.

Continuous monitoring of model drift in the financial sector

2025-09-25
talk

In today’s financial sector, the continuous accuracy and reliability of machine learning models are crucial for operational efficiency and effective risk management. With the rise of MLOps (Machine Learning Operations), automating monitoring mechanisms has become essential to ensure model performance and compliance with regulations. This presentation introduces a method for continuous monitoring of model drift, highlighting the benefits of automation within the MLOps framework. This topic is particularly interesting because it addresses a common challenge in maintaining model performance over time and demonstrates a practical solution that has been successfully implemented in the bank.

This talk is aimed at data scientists, machine learning engineers, and MLOps practitioners who are interested in automating the monitoring of machine learning models. Attendees will be guided on how to continuously monitor model drift within the MLOps framework. They will understand the benefits of automation in this context and gain insights into MLOps best practices. A basic understanding of MLOps principles and statistical techniques for model evaluation will be helpful but is not strictly needed.

The presentation will be an informative talk with a focus on design and implementation. It will include some mathematical concepts but will primarily demonstrate real-world applications and best practices. At the end, we encourage you to actively monitor model drift and automate your monitoring processes to enhance model accuracy, scalability, and compliance in your organizations.
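As a generic illustration of one drift check that could be scheduled in such a pipeline (the bank's actual setup is not disclosed in the abstract), a two-sample Kolmogorov-Smirnov test compares a feature's reference distribution against recent production data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference window (e.g. training data) versus a recent production window with a shifted mean.
reference = rng.normal(loc=0.0, scale=1.0, size=5000)
production = rng.normal(loc=0.3, scale=1.0, size=5000)

statistic, p_value = ks_2samp(reference, production)

# A small p-value suggests the feature's distribution has drifted; trigger an alert or retraining.
if p_value < 0.01:
    print(f"Drift detected (KS statistic={statistic:.3f}, p={p_value:.2e})")
else:
    print("No significant drift")
```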

Microlog: Explain Your Python Applications with Logs, Graphs, and AI

2025-09-25
talk

Microlog is a lightweight continuous profiler and logger for Python that helps developers understand their applications through interactive visualizations and AI-powered insights. With extremely low overhead and a 100% Python stack, it makes it easy to trace performance issues, debug unexpected behavior, and gain visibility into production systems.

No labels? No problem! - Hunting Fraudsters with Minimal Labels and Maximum ML

2025-09-25
talk

Card testing is one of the largest growing fraud problems within the payments landscape, with fraudsters launching millions of attempts globally each month. These attacks can cost companies thousands of euros in lost revenue and lead to the distribution of private card details. Detecting this type of fraud is extremely difficult without confirmed labels to train standard supervised ML classifiers. In this talk, we’ll describe how we built a production-ready ML model that now processes hundreds of transactions per second and share the key take-aways from our journey.
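The abstract does not name the techniques used; as a generic illustration of flagging suspicious card activity without labels, an Isolation Forest can surface outliers on simple velocity features (the features and thresholds below are invented):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical per-card features: [attempts in the last hour, share of declined attempts].
normal = np.column_stack([rng.poisson(2, 2000), rng.beta(1, 20, 2000)])
card_testing = np.column_stack([rng.poisson(80, 20), rng.beta(8, 2, 20)])  # bursts of declines
X = np.vstack([normal, card_testing])

# Unsupervised outlier detection; contamination is a rough prior, not a learned label rate.
model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flagged = (model.predict(X) == -1).sum()
print(f"Flagged {flagged} cards for manual review")
```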

Designing tests for ML libraries – lessons from the wild

2025-09-25
talk

In this talk, we will cover how to write effective test cases for machine learning (ML) libraries that are used by hundreds of thousands of users on a regular basis. Despite being essential for trust and foolproofing, tests often get deprioritized. Later, this can wreak havoc on massive codebases, with a high likelihood of introducing breaking changes and other unpleasant situations. This talk presents our approach to testing our ML libraries, which serve a wide user base. We will cover a wide variety of topics, from the mindset and the necessity of minimal-yet-sufficient testing all the way up to practical examples of end-to-end test suites.
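As a small example of the minimal-yet-sufficient style of testing (the function under test is invented for illustration), pytest-style tests can check shapes and invariants rather than exact values:

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Toy library function under test: scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)

def test_normalize_preserves_shape():
    x = np.random.default_rng(0).normal(size=(8, 3))
    assert normalize(x).shape == x.shape

def test_normalize_rows_have_unit_norm():
    x = np.random.default_rng(1).normal(size=(8, 3))
    np.testing.assert_allclose(np.linalg.norm(normalize(x), axis=1), 1.0, rtol=1e-7)

def test_normalize_handles_zero_rows_without_nans():
    assert not np.isnan(normalize(np.zeros((2, 3)))).any()
```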

Flip the Plan: Fast-Track Your AI/ML Model Integration with a Back-to-Front Implementation Strategy

2025-09-25
talk

"How quickly will you be able to get this model into production?" is a common question in analytical projects. Often, this is the first time anyone considers the complexities of deploying models within enterprise systems.

This talk introduces an approach to enhance the success rate of complex AI/ML integration projects while reducing time-to-market. Using examples from global banks J.P. Morgan and ING, we will demonstrate team organisation and engineering patterns to achieve this.

This talk is ideal for data scientists, engineers, and product managers interested in adopting an efficient Model Development Lifecycle (MDLC).

Formula 1 goes Bayesian: Time Series Decomposition with PyMC

2025-09-25
talk

Forecasting time series can be messy: data is often missing, noisy, or full of structural changes like holidays, outliers, or evolving patterns. This talk shows how to build interpretable time series decomposition models using PyMC, a modern probabilistic programming library.

We’ll break time series into trend, seasonality, and noise components using engineered time features (e.g., Fourier and Radial Basis Functions). You’ll also learn how to model correlated series using hierarchical priors, letting multiple time series "learn from each other." As a case study, we’ll analyze Formula 1 lap time data to compare drivers and explore performance consistency using Bayesian posteriors.

This is a hands-on, code-first talk for data scientists, ML engineers, and researchers curious about Bayesian modeling (or Formula 1). Familiarity with Python and basic statistics is helpful, but no deep knowledge of Bayes is required.
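A condensed sketch of the trend-plus-Fourier-seasonality decomposition described above, using PyMC on synthetic data; the Formula 1 case study additionally uses hierarchical priors across drivers:

```python
import numpy as np
import pymc as pm

# Synthetic daily series: linear trend + weekly seasonality + noise.
t = np.arange(200)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 7) + np.random.default_rng(0).normal(0, 0.5, t.size)

# Fourier features encode the weekly seasonal component.
order = 3
fourier = np.column_stack(
    [f(2 * np.pi * k * t / 7) for k in range(1, order + 1) for f in (np.sin, np.cos)]
)

with pm.Model():
    intercept = pm.Normal("intercept", 0, 5)
    slope = pm.Normal("slope", 0, 1)
    beta = pm.Normal("beta", 0, 1, shape=fourier.shape[1])
    sigma = pm.HalfNormal("sigma", 1)

    trend = intercept + slope * t
    seasonality = pm.math.dot(fourier, beta)
    pm.Normal("obs", mu=trend + seasonality, sigma=sigma, observed=y)

    idata = pm.sample(1000, tune=1000, chains=2, target_accept=0.9)
```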

GenAI governance in practice: patterns, pitfalls & strategies across tools and industries

2025-09-25
talk

Governing generative AI systems presents unique challenges, particularly for teams dealing with diverse GenAI subdomains and rapidly changing technological landscapes. In this talk, Maarten de Ruiter, Data Scientist at Xomnia, shares practical insights drawn from real-world GenAI use cases. He will highlight essential governance patterns, address common pitfalls, and provide actionable strategies for teams utilizing both open-source tools and commercial solutions. Attendees will gain concrete recommendations that work in practice, informed by successes (and failures!) across multiple industries.

Large-Scale Video Intelligence

2025-09-25
talk

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system. We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.
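A toy sketch of the retrieval core (frame-level CLIP-style embeddings indexed in a vector store), using sentence-transformers' CLIP model and FAISS; the system described in the talk adds batch preprocessing, person and face pipelines, and a production vector database:

```python
import faiss
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP maps both images and text into the same embedding space.
model = SentenceTransformer("clip-ViT-B-32")

# Hypothetical pre-extracted video frames (paths are illustrative).
frame_paths = ["frames/00010.jpg", "frames/00250.jpg", "frames/00990.jpg"]
frame_embeddings = model.encode([Image.open(p) for p in frame_paths], normalize_embeddings=True)

# Cosine similarity via inner product on normalized vectors.
index = faiss.IndexFlatIP(frame_embeddings.shape[1])
index.add(np.asarray(frame_embeddings, dtype=np.float32))

query = model.encode(["a person leaving a suitcase on a platform"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype=np.float32), 2)
print([(frame_paths[i], float(s)) for i, s in zip(ids[0], scores[0])])
```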

Event-Driven AI Agent Workflows with Dapr

2025-09-24
talk

As AI systems evolve, the need for robust infrastructure increases. Enter Dapr Agents: an open-source framework for creating production-grade AI agent systems. Built on top of the Dapr framework, Dapr Agents empowers developers to build intelligent agents capable of collaborating in complex workflows, leveraging Large Language Models (LLMs), durable state, built-in observability, and resilient execution patterns. This workshop will walk through the framework’s core components and, through practical examples, demonstrate how it solves real-world challenges.

Grounding LLMs on Solid Knowledge: Assessing and Improving Knowledge Graph Quality in GraphRAG Applications

2025-09-24
talk

Graph-based Retrieval-Augmented Generation (GraphRAG) enhances large language models (LLMs) by grounding their responses in structured knowledge graphs, offering more accurate, domain-specific, and explainable outputs. However, many of the graphs used in these pipelines are automatically generated or loosely assembled, and often lack the semantic structure, consistency, and clarity required for reliable grounding. The result is misleading retrieval, vague or incomplete answers, and hallucinations that are difficult to trace or fix.

This hands-on tutorial introduces a practical approach to evaluating and improving knowledge graph quality in GraphRAG applications. We’ll explore common failure patterns, walk through real-world examples, and share a reusable checklist of features that make a graph “AI-ready.” Participants will learn methods for identifying gaps, inconsistencies, and modeling issues that prevent knowledge graphs from effectively supporting LLMs, and apply simple fixes to improve grounding and retrieval performance in their own projects.
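As a small, generic example of the kind of automated quality check the tutorial covers (not its actual checklist), rdflib can flag entities that lack types or labels, two common causes of poor grounding:

```python
from rdflib import Graph
from rdflib.namespace import RDF, RDFS

# Tiny in-memory graph standing in for an automatically extracted knowledge graph.
turtle = """
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Amsterdam a ex:City ; rdfs:label "Amsterdam" .
ex:PyData ex:locatedIn ex:Amsterdam .
"""
g = Graph()
g.parse(data=turtle, format="turtle")

# Entities without an rdf:type or rdfs:label are hard for retrieval to ground reliably.
for entity in set(g.subjects()):
    if g.value(entity, RDF.type) is None:
        print(f"Missing type:  {entity}")
    if g.value(entity, RDFS.label) is None:
        print(f"Missing label: {entity}")
```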

Building AI Agents With Observability Tooling in PyCharm

2025-09-24
talk

As AI-powered agents and workflows grow in complexity, understanding their internal behavior becomes critical. In this hands-on workshop, you’ll build an agent and explore how observability tooling in PyCharm can help you trace, inspect, and debug its behavior at every stage – without having to leave the IDE.

Meet Docling: The “Pandas” for document AI

2025-09-24
talk

A workshop session showing you the basics of how to use Docling to enhance document ingestion in your AI workflow.
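For orientation ahead of the workshop, a minimal Docling conversion looks roughly like this (the sample URL points to the Docling technical report):

```python
from docling.document_converter import DocumentConverter

# Convert a source document (local path or URL) into Docling's unified document model.
converter = DocumentConverter()
result = converter.convert("https://arxiv.org/pdf/2408.09869")  # the Docling technical report

# Export to Markdown, ready for chunking and ingestion into a RAG pipeline.
markdown = result.document.export_to_markdown()
print(markdown[:500])
```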