talk-data.com

Event

PyData Amsterdam 2025

2025-09-24 – 2025-09-26 PyData

Activities tracked

13

Filtering by: LLM

Sessions & talks

Showing 1–13 of 13 · Newest first
Techie vs Comic: The sequel

2025-09-26 Watch
talk

A data scientist by day and a standup comedian by night. This was how Arda described himself prior to his critically acclaimed performance about his two identities during PyData 2024, where they merged.

Now he doesn't even know.

After another year of stage performances, awkward LinkedIn interactions and mysterious cloud errors, Arda is back for another tale of absurdity. In this closing talk, he will illustrate the hilarity of his life as a data scientist in the age of LLMs and his non-existent comfort zone, proving that good sequels can exist.

Real-Time Context Engineering for LLMs

2025-09-26 Watch
talk

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. Context engineering involves providing LLMs with relevant and timely context data from various data sources, which allows them to make context-aware decisions. The context data provided to the LLM must be produced in real time so that it can react intelligently at human-perceivable latencies (a second or two at most); if the application takes longer to react, humans will perceive it as laggy and unintelligent. In this talk, we will introduce context engineering and motivate the need for real-time context engineering in interactive applications. We will also demonstrate how to integrate real-time context data from applications into Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key to unlocking application context data for agents and LLMs. We will walk through an example of an interactive application (a TikTok clone) that we make AI-enabled with Hopsworks.
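
As a rough illustration of this pattern, here is a minimal sketch using the public Hopsworks Python client to pull fresh, per-user context features by an application-level ID and feed them into a prompt; the feature view name, key, and prompt are illustrative assumptions, not the speakers' actual code.

```python
# Minimal sketch (assumed names): fetch real-time context features from a
# Hopsworks feature view keyed by an application-level ID, then build a prompt.
import hopsworks

project = hopsworks.login()          # authenticates against a Hopsworks project
fs = project.get_feature_store()

# Hypothetical online feature view holding fresh per-user context
feature_view = fs.get_feature_view(name="user_context", version=1)

def fetch_context(user_id: str) -> list:
    # Low-latency online read keyed by the application ID
    return feature_view.get_feature_vector({"user_id": user_id})

context = fetch_context("user-123")
prompt = f"Recent user context: {context}\nRecommend the next video clip."
```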

Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models

2025-09-26 Watch
talk

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.
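
As a generic illustration of the structured-output requirement mentioned above (not the Sieves API itself), a Pydantic schema can act as the contract a model's output must satisfy; the task and field names here are assumptions for the example.

```python
# Generic sketch: validate a model's JSON output against a Pydantic schema.
from pydantic import BaseModel, ValidationError

class TicketClassification(BaseModel):
    category: str   # e.g. "billing", "delivery"
    urgency: int    # 1 (low) to 5 (critical)

llm_output = '{"category": "billing", "urgency": 4}'  # would come from the model

try:
    result = TicketClassification.model_validate_json(llm_output)
    print(result.category, result.urgency)
except ValidationError as err:
    print("Model output did not match the schema:", err)
```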

Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game

2025-09-26 Watch
talk

The rise of LLMs has elevated prompt engineering to a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical application through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?
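
To make the idea concrete, here is a minimal sketch of bootstrapped few-shot optimization in DSPy; the model name, signature, metric, and one-example trainset are illustrative assumptions rather than the speaker's setup.

```python
# Minimal DSPy sketch (assumed model and task): let the optimizer, not a human,
# pick the few-shot demonstrations that go into the prompt.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model choice is illustrative

class AnswerQuestion(dspy.Signature):
    """Answer the question in one short sentence."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

program = dspy.Predict(AnswerQuestion)

def exact_match(example, prediction, trace=None):
    # Toy metric; real evaluations would use something less brittle
    return example.answer.strip().lower() == prediction.answer.strip().lower()

trainset = [
    dspy.Example(question="What is the capital of the Netherlands?",
                 answer="Amsterdam").with_inputs("question"),
]

optimizer = dspy.BootstrapFewShot(metric=exact_match)
optimized_program = optimizer.compile(program, trainset=trainset)
print(optimized_program(question="Where is PyData Amsterdam held?").answer)
```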

Evaluating the alignment of LLMs to Dutch societal values

2025-09-26 Watch
talk
LLM

The City of Amsterdam is researching the responsible adoption of Large Language Models (LLMs) by evaluating their performance, environmental impact, and alignment with human values. In this talk, we will share how we develop tailored benchmarks and a dedicated assessment platform to raise awareness and guide responsible implementation.

How to Keep Your LLM Chatbots Real: A Metrics Survival Guide

2025-09-26 Watch
talk

In this brave new world of vibe coding and YOLO-to-prod mentality, let's take a step back and keep things grounded (pun intended). None of us would ever deploy a classical ML model to production without clearly defined metrics and proper evaluation, so let's talk about methodologies for measuring the performance of LLM-powered chatbots. Think of retriever recall, answer relevancy, correctness, faithfulness and hallucination rates. With the wild west of metric standards still in full swing, I'll guide you through the challenges of curating a synthetic test set and selecting suitable metrics and open-source packages that help you evaluate your use case. Everything is possible, from simple LLM-as-a-judge approaches, like those now built into packages such as MLflow, up to complex multi-step quantification approaches with Ragas. If you work in the GenAI space or with LLM-powered chatbots, this session is for you! Prior background knowledge is an advantage, but not required.
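
For instance, a small evaluation run over a synthetic test set might look like the sketch below, assuming the pre-0.2 Ragas API and placeholder data; exact metric names and dataset columns may differ between Ragas versions.

```python
# Sketch of a chatbot evaluation run with Ragas (pre-0.2-style API, assumed).
# Ragas uses an LLM judge under the hood, so an API key (e.g. OPENAI_API_KEY)
# is expected to be configured. The single record stands in for a synthetic test set.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall

records = {
    "question": ["What are your delivery hours?"],
    "answer": ["We deliver between 7:00 and 22:00."],
    "contexts": [["Deliveries run daily from 07:00 to 22:00."]],
    "ground_truth": ["Deliveries run daily from 07:00 to 22:00."],
}

results = evaluate(
    Dataset.from_dict(records),
    metrics=[faithfulness, answer_relevancy, context_recall],
)
print(results)
```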

Scaling Trust: A practical guide on evaluating LLMs and Agents

2025-09-26
talk

Recently, the integration of Generative AI (GenAI) technologies into both our personal and professional lives has surged. In most organizations, the deployment of GenAI applications is on the rise, and this trend is expected to continue for the foreseeable future. Evaluating GenAI systems presents unique challenges not found in traditional ML. The main peculiarity is the absence of ground truth for textual metrics such as text clarity, location extraction accuracy, and factual accuracy. Nevertheless, the non-negligible model-serving cost demands an even more thorough evaluation of any system to be deployed in production.

Defining the metric ground truth is a costly and time-consuming process requiring human annotation. To address this, we will present how to evaluate LLM-based applications by leveraging LLMs themselves as evaluators. Moreover, we will outline the complexities and evaluation methods for LLM-based agents, which operate with autonomy and present further evaluation challenges. Lastly, we will explore the critical role of evaluation in the GenAI lifecycle and outline the steps taken to integrate these processes seamlessly.
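
A bare-bones version of the LLM-as-evaluator idea might look like the following sketch; the judge prompt, rating scale, and model name are assumptions for illustration.

```python
# Minimal LLM-as-a-judge sketch (assumed prompt, scale, and model).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

JUDGE_PROMPT = (
    "You are an evaluator. Given a question, a reference answer and a candidate "
    "answer, rate the candidate's factual accuracy from 1 (wrong) to 5 (fully "
    "correct). Reply with the number only."
)

def judge_factual_accuracy(question: str, reference: str, candidate: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": (
                f"Question: {question}\nReference: {reference}\nCandidate: {candidate}"
            )},
        ],
    )
    # Assumes the judge follows the "number only" instruction
    return int(response.choices[0].message.content.strip())
```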

Whether you are an AI practitioner, user or enthusiast, join us to gain insights into the future of GenAI evaluation and its impact on enhancing application performance.

Model Context Protocol: Principles and Practice

2025-09-26 Watch
talk

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

  • How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
  • Why per‑tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm -rf /).
  • Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
  • Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level up your GenAI applications—without the N×M integration nightmare.
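
As a flavour of what such wiring can look like, here is a minimal MCP server sketch using the FastMCP helper from the official Python SDK (the `mcp` package); the tool itself is a made-up example, not one of the integrations named above.

```python
# Minimal MCP server sketch: exposes one (made-up) tool over stdio so any
# MCP-aware client can discover and call it.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-context-provider")

@mcp.tool()
def list_open_tickets(project: str) -> list[str]:
    """Return open ticket titles for a project (stubbed for illustration)."""
    return [f"{project}: example ticket"]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```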

Untitled13.ipynb

2025-09-26 Watch
talk

For well over a decade, Python notebooks revolutionized our field. They gave us so much creative freedom and dramatically lowered the entry barrier for newcomers. Yet despite all this ... it has been a decade! And the notebook is still in roughly the same form factor.

So what if we allow ourselves to rethink notebooks ... really rethink them! What features might we come up with? Can we make the notebook understand data sources? What about LLMs? Can we generate widgets on the fly? What if we make changes to Python itself?

This presentation will be a stream of demos that help paint a picture of what the future might hold. I will share my latest work in the anywidget/marimo ecosystem as well as some new hardware integrations.

The main theme that I will work towards: if you want better notebooks, reactive Python might very well be the future.

Counting Groceries with Computer Vision: How Picnic Tracks Inventory Automatically

2025-09-25 Watch
talk

In this talk, we'll share how we're using computer vision to automate stock counting, right on the conveyor belt. We'll discuss the challenges we've faced with the hardware, software, and GenAI components, and we'll also review our own benchmark results for the various state-of-the-art models. Finally, we'll cover the practical aspects of GenAI deployment, including prompt optimization, preventing LLM "yapping," and creating a robust feedback loop for continuous improvement.

Event-Driven AI Agent Workflows with Dapr

2025-09-24
talk

As AI systems evolve, the need for robust infrastructure increases. Enter Dapr Agents: an open-source framework for creating production-grade AI agent systems. Built on top of the Dapr framework, Dapr Agents empowers developers to build intelligent agents capable of collaborating in complex workflows, leveraging Large Language Models (LLMs), durable state, built-in observability, and resilient execution patterns. This workshop will walk through the framework's core components and, through practical examples, demonstrate how it solves real-world challenges.

Bridging the Gap: Building Robust, Tool-Integrated LLM Applications with the Model Context Protocol

2025-09-24
talk
LLM

Large Language Models (LLMs) are unlocking transformative capabilities — but integrating them into complex, real-world applications remains a major challenge. Simple prompting isn’t enough when dynamic interaction with tools, structured data, and live context is required. This workshop introduces the Model Context Protocol (MCP), an emerging open standard designed to simplify and standardise this integration. Aimed at forward-thinking developers and technologists, this hands-on session will equip participants with practical skills to build intelligent, modular, and extensible LLM-native applications using MCP.
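
As a taste of the hands-on part, the sketch below shows an MCP client from the official Python SDK spawning a local server over stdio, listing its tools, and calling one; the server script name and tool are assumptions for the example.

```python
# Minimal MCP client sketch (assumes a local MCP server script "server.py").
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # protocol handshake and version negotiation
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            result = await session.call_tool("list_open_tickets", {"project": "demo"})
            print(result.content)

asyncio.run(main())
```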

Grounding LLMs on Solid Knowledge: Assessing and Improving Knowledge Graph Quality in GraphRAG Applications

2025-09-24
talk

Graph-based Retrieval-Augmented Generation (GraphRAG) enhances large language models (LLMs) by grounding their responses in structured knowledge graphs, offering more accurate, domain-specific, and explainable outputs. However, many of the graphs used in these pipelines are automatically generated or loosely assembled, and often lack the semantic structure, consistency, and clarity required for reliable grounding. The result is misleading retrieval, vague or incomplete answers, and hallucinations that are difficult to trace or fix.

This hands-on tutorial introduces a practical approach to evaluating and improving knowledge graph quality in GraphRAG applications. We’ll explore common failure patterns, walk through real-world examples, and share a reusable checklist of features that make a graph “AI-ready.” Participants will learn methods for identifying gaps, inconsistencies, and modeling issues that prevent knowledge graphs from effectively supporting LLMs, and apply simple fixes to improve grounding and retrieval performance in their own projects.
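
One concrete example of the kind of check the tutorial has in mind (written here with rdflib as a generic stand-in, not the tutorial's own tooling): flag entities that appear as subjects but carry no rdf:type or rdfs:label, since untyped, unlabeled nodes are a common source of vague retrieval.

```python
# Sketch: basic "AI-readiness" check on a knowledge graph with rdflib.
from rdflib import Graph, RDF, RDFS, URIRef

g = Graph()
g.parse("knowledge_graph.ttl", format="turtle")  # hypothetical input file

untyped, unlabeled = [], []
for s in set(g.subjects()):
    if not isinstance(s, URIRef):
        continue  # skip blank nodes
    if (s, RDF.type, None) not in g:
        untyped.append(s)
    if (s, RDFS.label, None) not in g:
        unlabeled.append(s)

print(f"{len(untyped)} untyped and {len(unlabeled)} unlabeled entities found")
```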