talk-data.com

Event

PyData Amsterdam 2025

2025-09-24 – 2025-09-26 PyData

Activities tracked

8

Filtering by: GenAI

Sessions & talks

Showing 1–8 of 8 · Newest first

How to Keep Your LLM Chatbots Real: A Metrics Survival Guide

2025-09-26
talk

In this brave new world of vibe coding and YOLO-to-prod mentality, let’s take a step back and keep things grounded (pun intended). None of us would ever deploy a classical ML model to production without clearly defined metrics and proper evaluation, so let's talk about methodologies for measuring the performance of LLM-powered chatbots. Think of retriever recall, answer relevancy, correctness, faithfulness, and hallucination rates. With the wild west of metric standards still in full swing, I’ll guide you through the challenges of curating a synthetic test set and selecting suitable metrics and open-source packages that help you evaluate your use case. Everything is possible, from the simple LLM-as-a-judge approaches now built into many packages such as MLflow, up to complex multi-step quantification approaches with Ragas. If you work in the GenAI space or with LLM-powered chatbots, this session is for you! Prior knowledge is an advantage but not required.
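As a rough illustration of one metric named in the abstract, retriever recall can be computed as recall@k over a labelled test set. This is a minimal sketch and not the speaker's method: the function name `recall_at_k`, the document ids, and the two-row test set are all made up for the example.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    relevant = set(relevant_ids)
    return len(top_k & relevant) / len(relevant)


# Hypothetical test set: each row pairs a retriever's ranked output with
# the hand-labelled relevant document ids. All values are invented.
test_set = [
    {"retrieved": ["d3", "d7", "d1", "d9"], "relevant": ["d1", "d2"]},
    {"retrieved": ["d4", "d5", "d6", "d8"], "relevant": ["d5"]},
]

scores = [recall_at_k(row["retrieved"], row["relevant"], k=3) for row in test_set]
mean_recall = sum(scores) / len(scores)
print(scores, mean_recall)  # [0.5, 1.0] 0.75
```

Averaging per-question recall keeps each test question equally weighted, regardless of how many relevant documents it has.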

Scaling Trust: A practical guide on evaluating LLMs and Agents

2025-09-26
talk

Recently, the integration of Generative AI (GenAI) technologies into both our personal and professional lives has surged. In most organizations, the deployment of GenAI applications is on the rise, and this trend is expected to continue in the foreseeable future. Evaluating GenAI systems presents unique challenges not present in traditional ML. The main peculiarity is the absence of ground truth for textual metrics such as text clarity, location extraction accuracy, and factual accuracy. Nevertheless, the non-negligible cost of model serving demands an even more thorough evaluation of a system before it is deployed to production.

Defining the ground truth for these metrics is a costly and time-consuming process that requires human annotation. To address this, we will present how to evaluate LLM-based applications by leveraging LLMs themselves as evaluators. Moreover, we will outline the complexities of, and evaluation methods for, LLM-based agents, which operate with autonomy and present further evaluation challenges. Lastly, we will explore the critical role of evaluation in the GenAI lifecycle and outline the steps taken to integrate these processes seamlessly.
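The idea of using an LLM as the evaluator can be sketched as a small harness that sends a grading prompt to a judge and aggregates the verdicts. Everything here is an assumption made for illustration: the prompt wording, the `judge_pass_rate` helper, and `stub_judge`, which is a deterministic stand-in for a real model call so the example runs offline.

```python
from typing import Callable

# Hypothetical grading prompt; a real setup would tune this wording.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Context: {context}\n"
    "Is the answer supported by the context? Reply PASS or FAIL."
)


def judge_pass_rate(examples, judge: Callable[[str], str]) -> float:
    """Ask the judge about every example and return the share of PASS verdicts."""
    verdicts = []
    for ex in examples:
        reply = judge(JUDGE_PROMPT.format(**ex))
        verdicts.append(reply.strip().upper().startswith("PASS"))
    return sum(verdicts) / len(verdicts)


def stub_judge(prompt: str) -> str:
    """Stand-in for an LLM call: PASS if the answer text occurs in the context."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    supported = fields["Answer"].lower() in fields["Context"].lower()
    return "PASS" if supported else "FAIL"


examples = [
    {"question": "What is the capital of France?", "answer": "Paris",
     "context": "Paris is the capital of France."},
    {"question": "What is the capital of France?", "answer": "Lyon",
     "context": "Paris is the capital of France."},
]

pass_rate = judge_pass_rate(examples, stub_judge)
print(pass_rate)
```

Passing the judge in as a callable keeps the harness independent of any particular model provider, so the stub can be swapped for a real client without touching the aggregation logic.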

Whether you are an AI practitioner, user or enthusiast, join us to gain insights into the future of GenAI evaluation and its impact on enhancing application performance.

Model Context Protocol: Principles and Practice

2025-09-26
talk

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

  • How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
  • Why per‑tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm ‑rf /).
  • Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
  • Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.
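To make the first bullet concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope that MCP messages use, with a `tools/call` request and a matching reply. The tool name `run_sql`, its arguments, and the fake reply are hypothetical, and the helper functions are illustration only, not part of any MCP SDK; consult the current MCP specification for the authoritative message shapes.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests carry an id used to pair replies


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def match_response(request, raw_response):
    """Parse a reply, check it answers this request, and return its result."""
    response = json.loads(raw_response)
    if response.get("id") != request["id"]:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"]["message"])
    return response["result"]


# Hypothetical tool invocation; "run_sql" is an invented tool name.
call = make_request(
    "tools/call",
    {"name": "run_sql", "arguments": {"query": "SELECT 1"}},
)
wire = json.dumps(call)  # what would be sent over the transport

# Invented server reply, used here only so the example runs offline.
fake_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": call["id"],
    "result": {"content": [{"type": "text", "text": "1"}]},
})
result = match_response(call, fake_reply)
print(wire)
print(result)
```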

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level‑up your GenAI applications—without the N×M integration nightmare.

Image processing, artificial intelligence, and autonomous systems

2025-09-26
talk

This talk gives an overview of the field of image processing and the impact of artificial intelligence on it. Starting from the different tasks that image processing can perform, it presents solutions built on different AI technologies, including generative AI. Finally, it discusses the effect of AI on autonomous systems and the challenges involved.

Context is King: Evaluating Long Context vs. RAG for Data Grounding

2025-09-25
talk

Grounding Large Language Models in your specific data is crucial, but notoriously challenging. Retrieval-Augmented Generation (RAG) is the common pattern, yet practical implementations are often brittle, suffering from poor retrieval, ineffective chunking, and context limitations, leading to inaccurate or irrelevant answers. The emergence of massive context windows (1M+ tokens) seems to offer a simpler path – just put all your data in the prompt! But does it truly solve the "needle in a haystack" problem, or introduce new challenges like prohibitive costs and information getting lost in the middle? This talk dives deep into the engineering realities. We'll dissect common RAG failure modes, explore techniques for building robust RAG systems (advanced retrieval, re-ranking, query transformations), and critically evaluate the practical viability, costs, and limitations of leveraging long context windows for complex data tasks in Python. Leave understanding the real trade-offs to make informed architectural decisions for building reliable, data-grounded GenAI applications.
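Chunking is one of the RAG failure points the abstract mentions. A common baseline is fixed-size chunks with overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. The sketch below splits on whitespace for simplicity; the function name `chunk_words` and the sizes are assumptions, and a real pipeline would more likely count model tokens than words.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word chunks of chunk_size, each sharing `overlap` words
    with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
print(chunks)  # ['a b c d', 'd e f g', 'g h i j']
```

Larger overlap reduces the chance of splitting a relevant passage but increases index size and the number of near-duplicate hits, which is part of the trade-off the talk describes.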

Leading through the GenAI hype cycle: the good, the bad, and the ugly

2025-09-25
talk

Leaders operate across three dimensions: people, business, and technology. A generational shockwave like GenAI has a large-scale, fast impact (whether real or perceived) on all three dimensions.

We leaders then face a sprint of interesting challenges like:

  • How do we determine where the value of this technology is currently underestimated or overestimated, and how will that change in the future?
  • How do we, as the subject matter experts on this topic, contribute to the larger leadership team across different skill sets (sales, product, etc.) in the company?
  • How do we steer through the learning curve, for both the individual contributors in the team, and the wider company?

And a few more similar challenges!

Join us for a nice panel discussion on this topic.

GenAI governance in practice: patterns, pitfalls & strategies across tools and industries

2025-09-25
talk

Governing generative AI systems presents unique challenges, particularly for teams dealing with diverse GenAI subdomains and rapidly changing technological landscapes. In this talk, Maarten de Ruiter, Data Scientist at Xomnia, shares practical insights drawn from real-world GenAI use cases. He will highlight essential governance patterns, address common pitfalls, and provide actionable strategies for teams utilizing both open-source tools and commercial solutions. Attendees will gain concrete recommendations that work in practice, informed by successes (and failures!) across multiple industries.

Counting Groceries with Computer Vision: How Picnic Tracks Inventory Automatically

2025-09-25
talk

In this talk, we'll share how we're using computer vision to automate stock counting, right on the conveyor belt. We'll discuss the challenges we've faced with the hardware, software, and GenAI components, and we'll also review our own benchmark results for the various state-of-the-art models. Finally, we'll cover the practical aspects of GenAI deployment, including prompt optimization, preventing LLM "yapping," and creating a robust feedback loop for continuous improvement.