talk-data.com

Event

PyData Amsterdam 2025

2025-09-24 – 2025-09-26 PyData

Activities tracked

119

Sessions & talks

Showing 51–75 of 119 · Newest first

Social Event

2025-09-25
talk

Closing notes

2025-09-25
talk

Ethics is Not a Feature: Rethinking AI from the Ground Up

2025-09-25 Watch
talk

Ethics is often treated like a product feature—something to be added at the end, polished for compliance, or marketed for trust. But what if that mindset is exactly what’s holding us back? In this keynote, we’ll challenge the idea that ethics is optional or external to the development process. We’ll explore how ethical blind spots in AI systems—from biased models to black-box decisions to unsustainable compute—aren’t just philosophical dilemmas, but human failures with real-world consequences. You’ll learn how to spot ethical risks before they become failures, and discover practical tools and mindsets to build AI that earns trust—without compromising on innovation. From responsible data practices to transparency techniques and green AI strategies, we’ll connect the dots between values and code. This isn’t just a lecture—it’s a call to rethink how we build the future of AI—together.

Snack break

2025-09-25
talk

Measure twice, deploy once: Evaluation of retrieval systems

2025-09-25
talk
RAG

Improving retrieval systems—especially in RAG pipelines—requires a clear understanding of what’s working and what isn’t. The only scalable way to do that is through meaningful metrics. In this talk, we share insights from building a platform-agnostic search and retrieval product, and how we balance performance against cost. Bigger models often give better results… but at what price? We explain how to assess what’s “good enough” and why the choice of benchmark really matters.
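As a concrete illustration of the kind of metric the abstract alludes to (not the speakers' actual evaluation code), recall@k and mean reciprocal rank can be computed in a few lines of plain Python; the document IDs below are made up:

```python
def recall_at_k(relevant, ranked, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(relevant, ranked):
    """Reciprocal rank of the first relevant result (0.0 if none is found)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

ranked = ["d3", "d7", "d1", "d9"]   # system output, best first
relevant = {"d1", "d9"}             # ground-truth relevant documents
print(recall_at_k(relevant, ranked, 3))  # 0.5: one of two relevant docs in top 3
print(round(mrr(relevant, ranked), 3))   # 0.333: first relevant doc at rank 3
```

Which of these matters more depends on the benchmark, which is exactly the abstract's point about choosing it carefully.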

Quiet on Set: Building an On-Air Sign with Open Source Technologies

2025-09-25 Watch
talk
Danica Fine (Snowflake)

Using a Raspberry Pi and a powerful trio of open-source technologies—Apache Kafka, Apache Flink, and Apache Iceberg—you'll learn how to build a custom on-air sign that signals when you're on a call, and discover how the same scaffolding can scale to millions of users.

The Gentle Monorepo: Ship Faster and Collaborate Better

2025-09-25 Watch
talk

Monorepos promise faster development and smoother cross-team collaboration, but they often seem intimidating, requiring major tooling, buy-in, and process changes. This talk shows how Dexter gradually introduced a Python monorepo by combining a few lightweight tools with a pragmatic, trust-based approach to adoption. The result is that we can effectively reuse components across our various energy forecasting and trade optimization products. We iterate quicker on bringing our research to production, which benefits our customers and supports the renewable energy transition. After this talk, you’ll walk away with a practical blueprint for introducing a monorepo in your context, without requiring heavy up-front work.

Continuous monitoring of model drift in the financial sector

2025-09-25 Watch
talk

In today’s financial sector, the continuous accuracy and reliability of machine learning models are crucial for operational efficiency and effective risk management. With the rise of MLOps (Machine Learning Operations), automating monitoring mechanisms has become essential to ensure model performance and compliance with regulations. This presentation introduces a method for continuous monitoring of model drift, highlighting the benefits of automation within the MLOps framework. This topic is particularly interesting because it addresses a common challenge in maintaining model performance over time and demonstrates a practical solution that has been successfully implemented in the bank.

This talk is aimed at data scientists, machine learning engineers, and MLOps practitioners who are interested in automating the monitoring of machine learning models. Attendees will be guided on how to continuously monitor model drift within the MLOps framework, understand the benefits of automation in this context, and gain insights into MLOps best practices. A basic understanding of MLOps principles and statistical techniques for model evaluation will be helpful but not strictly required.

The presentation will be an informative talk focused on design and implementation. It will include some mathematical concepts but will primarily demonstrate real-world applications and best practices. At the end, we encourage you to actively monitor model drift and automate your monitoring processes to enhance model accuracy, scalability, and compliance in your organization.
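The abstract does not name a specific statistical test; one common, lightweight way to flag feature drift is a two-sample Kolmogorov–Smirnov test comparing a training-time reference sample against recent production data. The data below is synthetic, purely to show the shape of such a check:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # feature at training time
production = rng.normal(loc=0.5, scale=1.0, size=5_000)  # same feature, shifted in production

# KS test: are the two samples plausibly drawn from the same distribution?
stat, p_value = ks_2samp(reference, production)
ALPHA = 0.01  # alert threshold; tune for your false-alarm budget
drift_detected = p_value < ALPHA
print(drift_detected)  # True: a 0.5 mean shift is easily detected at this sample size
```

In an automated MLOps setting, a check like this would run on a schedule per feature, with alerts wired to the monitoring stack.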

Microlog: Explain Your Python Applications with Logs, Graphs, and AI

2025-09-25
talk

Microlog is a lightweight continuous profiler and logger for Python that helps developers understand their applications through interactive visualizations and AI-powered insights. With extremely low overhead and a 100% Python stack, it makes it easy to trace performance issues, debug unexpected behavior, and gain visibility into production systems.

No labels? No problem! - Hunting Fraudsters with Minimal Labels and Maximum ML

2025-09-25
talk

Card testing is one of the fastest-growing fraud problems in the payments landscape, with fraudsters launching millions of attempts globally each month. These attacks can cost companies thousands of euros in lost revenue and lead to the distribution of private card details. Detecting this type of fraud is extremely difficult without confirmed labels to train standard supervised ML classifiers. In this talk, we’ll describe how we built a production-ready ML model that now processes hundreds of transactions per second and share the key takeaways from our journey.
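The speakers' actual model is not described in the abstract. As a toy illustration of label-free detection, a robust z-score over per-merchant authorization bursts (a classic card-testing signature) flags outliers without any training labels; all numbers here are made up:

```python
import numpy as np

# Hypothetical counts of small-value auth attempts per merchant in a 5-minute window
counts = np.array([3, 5, 4, 6, 2, 4, 5, 3, 250, 4])  # index 8 is a card-testing burst

median = np.median(counts)
mad = np.median(np.abs(counts - median))   # median absolute deviation: robust spread
robust_z = 0.6745 * (counts - median) / mad  # 0.6745 rescales MAD to ~1 std for normals

flagged = np.where(robust_z > 3.5)[0]  # common cutoff for robust outlier detection
print(flagged.tolist())  # [8]
```

Median and MAD are used instead of mean and standard deviation so that the burst itself cannot mask its own detection.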

What Works: Practical Lessons in Applying Privacy-Enhancing Technologies (PET) in Data Science

2025-09-25 Watch
talk

Privacy-Enhancing Technologies (PETs) promise to bridge the gap between data utility and privacy — but how do they perform in practice? In this talk, we’ll share real-world insights from our hands-on experience testing and implementing leading PET solutions across various data science use cases. We explored tools including differential privacy libraries, homomorphic encryption frameworks, federated learning, and multi-party computation. Some lived up to their promise — others revealed critical limitations. You’ll walk away with a clear understanding of which PET solutions work best for which types of data and analysis, what trade-offs to expect, and how to set realistic goals when integrating PETs into your workflows. This session is ideal for data professionals and decision-makers who are navigating privacy risks while still wanting to innovate responsibly.
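As one small, concrete example from the PET families mentioned, the Laplace mechanism from differential privacy can be sketched in a few lines. This is an illustrative sketch, not any specific library's API:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.
    A counting query has sensitivity 1 (one individual changes it by at most 1),
    so Laplace noise with scale 1/epsilon is sufficient."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
noisy = dp_count(1_000, epsilon=1.0, rng=rng)  # near 1000, with plausible deniability
```

Smaller epsilon means stronger privacy and more noise; the utility-versus-privacy trade-off the talk discusses is exactly this dial.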

Context is King: Evaluating Long Context vs. RAG for Data Grounding

2025-09-25
talk

Grounding Large Language Models in your specific data is crucial, but notoriously challenging. Retrieval-Augmented Generation (RAG) is the common pattern, yet practical implementations are often brittle, suffering from poor retrieval, ineffective chunking, and context limitations, leading to inaccurate or irrelevant answers. The emergence of massive context windows (1M+ tokens) seems to offer a simpler path – just put all your data in the prompt! But does it truly solve the "needle in a haystack" problem, or introduce new challenges like prohibitive costs and information getting lost in the middle? This talk dives deep into the engineering realities. We'll dissect common RAG failure modes, explore techniques for building robust RAG systems (advanced retrieval, re-ranking, query transformations), and critically evaluate the practical viability, costs, and limitations of leveraging long context windows for complex data tasks in Python. You'll leave understanding the real trade-offs, ready to make informed architectural decisions for building reliable, data-grounded GenAI applications.
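The cost question has a useful back-of-envelope form. Every number below (token price, corpus size, chunking, query volume) is an assumption for illustration, not real pricing:

```python
PRICE_PER_1M_INPUT_TOKENS = 3.00  # USD; hypothetical model pricing

corpus_tokens = 800_000   # long-context approach: send the whole corpus every query
chunk_tokens = 500        # RAG approach: send only the retrieved chunks
top_k = 8
queries_per_day = 10_000

long_context_cost = queries_per_day * corpus_tokens / 1e6 * PRICE_PER_1M_INPUT_TOKENS
rag_cost = queries_per_day * (top_k * chunk_tokens) / 1e6 * PRICE_PER_1M_INPUT_TOKENS

print(f"long context: ${long_context_cost:,.0f}/day vs RAG: ${rag_cost:,.0f}/day")
# long context: $24,000/day vs RAG: $120/day
```

Under these assumptions the long-context approach costs 200x more per query, which is why retrieval quality, rather than context length, usually remains the binding constraint.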

Designing tests for ML libraries – lessons from the wild

2025-09-25 Watch
talk

In this talk, we will cover how to write effective test cases for machine learning (ML) libraries that are used by hundreds of thousands of users on a regular basis. Despite their well-established role in building trust and catching regressions, tests often get deprioritized. Over time, this can wreak havoc on large codebases, making breaking changes and other unpleasant surprises far more likely. This talk covers our approach to testing our ML libraries, which serve a wide user base. We will cover a wide variety of topics, from the mindset and necessity of minimal-yet-sufficient testing all the way up to practical examples of end-to-end test suites.
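Minimal-yet-sufficient testing of numerical code often means asserting invariant properties rather than exact outputs. A sketch of that style, where `standardize` is a made-up stand-in for a library function, not the speakers' code:

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Toy library function under test: zero-mean, unit-variance scaling."""
    return (x - x.mean()) / x.std()

def test_standardize_properties():
    rng = np.random.default_rng(1)
    x = rng.normal(5.0, 2.0, size=1_000)
    out = standardize(x)
    assert out.shape == x.shape            # shape is preserved
    assert abs(out.mean()) < 1e-9          # mean is (numerically) zero
    assert abs(out.std() - 1.0) < 1e-9     # unit variance
    # invariance: shifting the input must not change the output
    np.testing.assert_allclose(standardize(x + 100.0), out, atol=1e-8)

test_standardize_properties()
```

A handful of property checks like these catches whole classes of regressions (dtype changes, broadcasting bugs, numerical instability) with very little maintenance burden.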

Leading through the GenAI hype cycle: the good, the bad, and the ugly

2025-09-25
talk

Leaders operate across three dimensions: people, business, and technology. A generational shockwave like GenAI has a large-scale and rapid impact (whether real or perceived) on all three.

Leaders then face a sprint of interesting challenges, such as:

  • How do we determine which aspects of this technology's value are currently underestimated versus overestimated, and how will that change in the future?
  • How do we contribute to the wider leadership team across different skill sets (sales, product, etc.) in the company, as the subject-matter experts on this topic?
  • How do we steer through the learning curve, both for the individual contributors on the team and for the wider company?

And a few more challenges like these!

Join us for a panel discussion on this topic.

Streamlining data pipeline development with Ordeq

2025-09-25
talk

In this talk, we will introduce Ordeq, a cutting-edge data pipeline development framework used by data engineers, scientists, and analysts across ING. Ordeq helps you modularise pipeline logic and abstract IO, elevating projects from proofs of concept to maintainable production-level applications. We will demonstrate how Ordeq integrates seamlessly with popular data processing tools like Spark, Polars, Matplotlib, and DSPy, and with orchestration tools such as Airflow. Additionally, we will showcase how you can leverage Ordeq on public cloud offerings such as GCP. Ordeq has zero dependencies and is available under the MIT license.

Lunch break

2025-09-25
talk