
Activities & events


We're super excited for our last meetup of 2025 - just before the holidays start, we're back in Amsterdam, and this time at AI House Amsterdam powered by Prosus, on Wednesday 10 December!

This edition will be extra interesting, since it will be about Profitability AI: Build it right. Make it fast. Keep it cheap.

AI projects don’t become business-critical overnight. They move through stages: from spark-of-an-idea prototypes to hardened, scalable systems that drive real revenue. In this meetup edition, we invite industry leaders to share the technical journeys their AI projects went through before becoming part of their core business. This evening is all about what it actually takes to build profitable AI: not just using the latest models, but creating systems that are efficient, scalable, operationally reliable, and deliver measurable value. You’ll hear how teams navigate the messy middle, from architecture choices to optimization strategies, to transform AI from a cool demo into a cost-effective production engine. Expect an honest look at real-world trade-offs, engineering challenges, and the solutions that made their AI both powerful and economical.

Excited as well?! We'd love to welcome you for an evening full of knowledge sharing, demos, great conversations, networking, and above all fun with the PyData community!

Agenda

  • 18:00 - 19:00: Walk-in with drinks & food
  • 19:00 - 19:45: Talk 1 - Scaling Personalized Push Notifications by Floris Fok
  • 19:45 - 20:00: Short break
  • 20:00 - 20:45: Talk 2 - LLM distillation explained: Make smarter, cheaper, and deployable AI for enterprises by Mashrur Haider
  • 20:45 - 21:30: Networking + drinks & bites

Talk 1: Scaling Personalized Push Notifications by Floris Fok

This talk explores how we productionize personalized push notifications at scale—moving from proof of concept to serving 130 billion tokens per day to nearly half of Brazil's population. We'll share the journey from traditional CRM systems to personalization-powered notifications, covering the data processing pipeline, key architectural decisions, and operational challenges. Learn about the trade-offs we navigated between latency and personalization depth, how we achieved a cost per order under 10 cents, and practical insights into productionizing foundation models for commerce.

Floris Fok is a Senior AI Engineer at Prosus Group, specializing in Generative AI. He helped develop Europe's second foundational model, Climate GPT, and has over 4 years of NLP experience spanning technologies from BERT to DeepSeek. Floris played a role in the development of Toqan and has been utilizing it since its early days.

Talk 2: LLM distillation explained: Make smarter, cheaper, and deployable AI for enterprises by Mashrur Haider

Running large LLMs in production is expensive, but often unnecessary. In this masterclass, Mashrur Haider breaks down how distillation, a popular post-training technique, can cut inference costs by up to 70% while maintaining enterprise-grade performance. You'll learn how distillation compares to quantization and fine-tuning, backed by real benchmarks. Key takeaways:

  • Distillation 101: How it works and why enterprises use it.
  • Benchmarks: Cost savings without accuracy trade-offs.
  • Workflow: From data prep to deployment on Nebius Token Factory.
  • Scaling: Running distilled models in production with compliance and reliability.
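As a sketch of the core idea behind the distillation the talk covers: a student model is trained to match the teacher's temperature-softened output distribution. The snippet below is an illustrative, framework-free toy (not Nebius tooling), showing the KL-divergence objective on a single set of logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the core objective of response-based distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]
# A student that exactly matches the teacher has zero loss; a flat,
# uninformative student has a positive loss.
print(round(kd_loss(teacher, teacher), 6))    # → 0.0
print(kd_loss(teacher, [0.1, 0.1, 0.1]) > 0)  # → True
```

A higher temperature flattens both distributions, which exposes the teacher's relative preferences among non-top classes to the student, one reason distillation can preserve accuracy at a fraction of the inference cost.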

Mashrur Haider is a Tech PM at Nebius AI Studio with a deep healthcare background (BSc Genetics, Stony Brook; MSc Bioinformatics & ML, University of Amsterdam). He’s researched at Netherlands Cancer Institute, worked in Advanced R&D at Philips IGT Systems, and operated a VC-backed techbio startup. At Nebius Token Factory, he translates real customer needs into scalable, user-friendly products aimed at model customisation and dedicated inference.

Directions

The venue for this meetup is AI House Amsterdam, located at the Prosus Global Headquarters (Gustav Mahlerplein 5, 1082 MS Amsterdam). AI House Amsterdam is conveniently located next to the Amsterdam Zuid train station (a 3-minute walk).

Profitability AI: Build it right. Make it fast. Keep it cheap.
PyData Amsterdam 2025 · 2025-09-26
Social drinks 2025-09-26 · 15:20
Techie vs Comic: The sequel 2025-09-26 · 15:20

A data scientist by day and a standup comedian by night. This was how Arda described himself prior to his critically acclaimed performance about his two identities during PyData 2024, where they merged.

Now he doesn't even know.

After another year of stage performances, awkward LinkedIn interactions and mysterious cloud errors, Arda is back with another tale of absurdity. In this closing talk, he will illustrate the hilarity of his life as a data scientist in the age of LLMs and his non-existent comfort zone, proving that good sequels can exist.

Cloud Computing LLM
Conference closing notes 2025-09-26 · 15:00


Every data architecture diagram out there makes it abundantly clear who's in charge: at the bottom sits the analyst, above that an API server, and at the very top the mighty data warehouse. This pattern is so ingrained that we never question its necessity, despite its various issues: slow response times, multi-level scaling problems, and massive cost.

But there is another way: disaggregating storage and compute lets query processing move closer to the user, leading to much snappier responses, natural scaling through client-side query processing, and much lower cost.

This talk will discuss how modern data engineering paradigms like decomposed storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture we can put users' needs first, relying on commoditized components like object storage to build fast, scalable, and cost-effective solutions.

API Data Engineering Data Lakehouse DWH
Lightning Talks 2025-09-26 · 13:35


Snack break 2025-09-26 · 13:30

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from overprovisioned costs or painful job failures!

Cloud Computing Data Science
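A sketch of the instance-recommendation idea in a few lines of plain Python. Everything here is hypothetical (the catalog, prices, and function names are made up, and this is not Metaflow's API): given a run's observed peak usage, pick the cheapest instance type that fits with some headroom.

```python
# Made-up instance catalog: (name, cpus, memory_gb, dollars_per_hour).
INSTANCE_TYPES = [
    ("small",  2,  8, 0.10),
    ("medium", 4, 16, 0.20),
    ("large",  8, 64, 0.80),
]

def recommend_instance(peak_cpus, peak_memory_gb, headroom=1.2):
    """Return the cheapest instance whose capacity covers peak usage
    plus a safety headroom (to avoid painful job failures)."""
    need_cpu = peak_cpus * headroom
    need_mem = peak_memory_gb * headroom
    fitting = [t for t in INSTANCE_TYPES
               if t[1] >= need_cpu and t[2] >= need_mem]
    if not fitting:
        raise ValueError("no instance type fits the observed usage")
    return min(fitting, key=lambda t: t[3])[0]

print(recommend_instance(1.5, 5))   # → small
print(recommend_instance(3.0, 20))  # → large
```

The interesting engineering is in the tracking side (sampling CPU, memory, and GPU during a run); once peak usage is known, the recommendation itself reduces to this kind of cheapest-fit lookup.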

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. It involves providing LLMs with relevant, timely context data from various sources, allowing them to make context-aware decisions. That context must be produced in real time so the application can react intelligently at humanly perceivable latencies (a second or two at most); if the application takes longer to react, humans perceive it as laggy and unintelligent. In this talk, we will introduce context engineering and make the case for real-time context engineering in interactive applications. We will also demonstrate how to integrate real-time context data from applications into Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key to unlocking application context data for agents and LLMs. We will walk through an example of an interactive application (a TikTok clone) that we make AI-enabled with Hopsworks.

AI/ML LLM Python React
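The role application IDs play can be illustrated with a toy, framework-free sketch (this is not the Hopsworks API): context rows are keyed by an application ID, and an agent only receives context that is fresh enough for an interactive response.

```python
import time

class ContextStore:
    """Toy in-memory stand-in for a feature store keyed by application ID."""

    def __init__(self):
        self._rows = {}

    def put(self, app_id, features):
        self._rows[app_id] = {"ts": time.time(), **features}

    def get_context(self, app_id, max_age_s=2.0):
        """Return context only if it is fresh enough for a real-time agent;
        stale context would make the agent react to an outdated world."""
        row = self._rows.get(app_id)
        if row is None or time.time() - row["ts"] > max_age_s:
            return None
        return {k: v for k, v in row.items() if k != "ts"}

store = ContextStore()
store.put("session-42", {"last_video": "v9", "watch_seconds": 17})
print(store.get_context("session-42"))
# → {'last_video': 'v9', 'watch_seconds': 17}
```

The agent would pass the retrieved dictionary into its prompt or tool call; the freshness check is what makes this "real-time" context engineering rather than a plain feature lookup.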

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.

LLM NLP
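The structured-output idea can be illustrated with a minimal, library-free sketch: instead of trusting free-form text, the pipeline parses the model's output and enforces an expected schema. Tools like Outlines enforce this at decode time; here it is checked post hoc, and the schema is a made-up example.

```python
import json

# Expected fields and their Python types (illustrative schema).
SCHEMA = {"label": str, "confidence": float}

def parse_structured(raw_text, schema=SCHEMA):
    """Parse model output as JSON and enforce the expected structure,
    failing loudly instead of passing malformed data downstream."""
    data = json.loads(raw_text)
    for field, typ in schema.items():
        if field not in data or not isinstance(data[field], typ):
            raise ValueError(f"field {field!r} missing or not {typ.__name__}")
    return data

good = '{"label": "positive", "confidence": 0.93}'
print(parse_structured(good))
# → {'label': 'positive', 'confidence': 0.93}
```

In a modular pipeline, each stage can validate its inputs this way, which is a large part of why decomposed pipelines are easier to debug than one-shot generation.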

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?

AI/ML LLM
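A bootstrapped few-shot loop can be sketched without any LLM at all: try small subsets of labelled examples as the "prompt" and keep whichever subset scores best on a dev set. The `mock_model` below is an illustrative stand-in for a model call (it just predicts by word overlap), and all names and data are made up.

```python
import itertools

def mock_model(few_shot, x):
    """Stand-in for an LLM: predict the label of the few-shot example
    sharing the most words with the input."""
    def overlap(a, b):
        return len(set(a.split()) & set(b.split()))
    return max(few_shot, key=lambda ex: overlap(ex[0], x))[1]

def select_few_shot(pool, dev_set, k=2):
    """Return the k-example subset of `pool` with the best dev accuracy,
    the essence of bootstrapped few-shot optimization."""
    best, best_acc = None, -1.0
    for subset in itertools.combinations(pool, k):
        acc = sum(mock_model(subset, x) == y for x, y in dev_set) / len(dev_set)
        if acc > best_acc:
            best, best_acc = list(subset), acc
    return best, best_acc

pool = [("great movie", "pos"), ("terrible film", "neg"),
        ("ok", "pos"), ("bad acting", "neg")]
dev = [("great acting", "pos"), ("terrible plot", "neg")]
best, acc = select_few_shot(pool, dev, k=2)
print(acc)  # → 1.0
```

Frameworks like DSPy automate exactly this kind of search (with real model calls and smarter candidate generation), which is what makes prompt "engineering" start to look like ordinary hyperparameter optimization.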
Rodrigo Loredo – Lead Analytics Engineer at Vinted, Oscar Ligthart – Senior Data Engineer at Vinted

At Vinted, Europe’s largest second-hand marketplace, over 20 decentralized data teams generate, transform, and build products on petabytes of data. Each team utilizes their own tools, workflows, and expertise. Coordinating data pipeline creation across such diverse teams presents significant challenges. These include complex inter-team dependencies, inconsistent scheduling solutions, and rapidly evolving requirements.

This talk is aimed at data engineers, platform engineers, and technical leads with experience in workflow orchestration and will demonstrate how we empower teams at Vinted to define data pipelines quickly and reliably. We will present our user-friendly abstraction layer built on top of Apache Airflow, enhanced by a Python code generator. This abstraction simplifies upgrades and migrations, removes scheduler complexity, and supports Vinted’s rapid growth. Attendees will learn how Python abstractions and code generation can standardize pipeline development across diverse teams, reduce operational complexity, and enable greater flexibility and control in large-scale data organizations. Through practical lessons and real-world examples of our abstraction interface, we will offer insights into designing scheduler-agnostic architectures for successful data pipeline orchestration.

Airflow Python
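The abstraction-plus-code-generation idea can be sketched as follows. The spec format and generated code are hypothetical miniatures, not Vinted's actual interface: teams write a small declarative spec, and a generator renders scheduler-specific (here, Airflow-style) code from it, keeping scheduler details out of team code.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineSpec:
    """Declarative pipeline description a team would own (illustrative)."""
    name: str
    schedule: str
    tasks: list = field(default_factory=list)  # task names, in order

def generate_dag(spec: PipelineSpec) -> str:
    """Render an Airflow-style DAG module from the spec. Because teams
    only touch the spec, migrations change this generator, not team code."""
    lines = [
        "from airflow import DAG",
        "from airflow.operators.empty import EmptyOperator",
        "",
        f"with DAG(dag_id={spec.name!r}, schedule={spec.schedule!r}) as dag:",
    ]
    for t in spec.tasks:
        lines.append(f"    {t} = EmptyOperator(task_id={t!r})")
    for a, b in zip(spec.tasks, spec.tasks[1:]):
        lines.append(f"    {a} >> {b}")  # linear dependency chain
    return "\n".join(lines)

spec = PipelineSpec("orders_daily", "@daily", ["extract", "transform", "load"])
print(generate_dag(spec))
```

Swapping schedulers then means writing a new `generate_dag` backend while every team's spec stays untouched, which is the scheduler-agnostic property the talk describes.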

Good quality data is the basis for high quality models and valuable data insights. But isn't it annoying how often your data is riddled with those pesky humans? Human involvement in data creation often introduces errors, misunderstandings, and biases that can compromise data integrity. This talk will explore how human factors influence the data creation process and what we as data professionals can do to account for this in our data interpretation and usage.

Searching for My Next Chart 2025-09-26 · 11:25

Abstract

As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly expanding community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods—from Pinterest boards to Notion pages. This created a significant bottleneck in my workflow, as I found myself spending more time cataloging charts than actively creating them.

In this talk, I will present a RAG (Retrieval Augmented Generation) based retrieval system that I designed specifically for data visualizations. I will detail the methodology behind this system, illustrating how I addressed my own workflow inefficiencies by transforming a dispersed collection of charts into a semantically searchable knowledge base. This project serves as a practical example of applying advanced AI techniques to enhance creative technical work, demonstrating how a specialized retrieval system can significantly improve the efficiency and quality of the data visualization creation process.

AI/ML DataViz RAG
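The retrieval half of such a system can be sketched with toy bag-of-words embeddings and cosine similarity; a real system would use a learned embedding model and a vector index, and the chart descriptions below are made up.

```python
import math

def embed(text, vocab):
    """Toy embedding: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

charts = [
    "stacked bar chart of sales by region",
    "line chart of temperature over time",
    "scatter plot of height versus weight",
]
vocab = sorted({w for c in charts for w in c.lower().split()})

def search(query):
    """Return the chart description most similar to the query."""
    q = embed(query, vocab)
    return max(charts, key=lambda c: cosine(q, embed(c, vocab)))

print(search("a line chart showing time series"))
# → line chart of temperature over time
```

The "generation" half of RAG would then hand the retrieved descriptions to an LLM as context; the retrieval step above is what replaces manually combing through Pinterest boards and Notion pages.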

The City of Amsterdam is researching the responsible adoption of Large Language Models (LLMs) by evaluating their performance, environmental impact, and alignment with human values. In this talk, we will share how we develop tailored benchmarks and a dedicated assessment platform to raise awareness and guide responsible implementation.

LLM

Probabilistic forecasting is essential, but choosing the right method is tricky. This talk introduces two lesser-known models — Level Set Forecaster and Quantile Regression Forest — that help you kickstart probabilistic forecasting without unnecessary complexity.
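The shared intuition behind both models can be sketched in plain Python: turn a point forecaster into a probabilistic one by pooling the observed outcomes of training points with similar point predictions, then reading empirical quantiles off that pool. This is an illustrative toy, not either library's implementation.

```python
def empirical_quantile(sample, q):
    """Crude empirical q-quantile of a sample (no interpolation)."""
    s = sorted(sample)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[idx]

def probabilistic_forecast(train_preds, train_actuals, new_pred, q, k=3):
    """Empirical q-quantile over the k training outcomes whose point
    predictions are closest to the new point prediction."""
    ranked = sorted(zip(train_preds, train_actuals),
                    key=lambda pa: abs(pa[0] - new_pred))
    pool = [actual for _, actual in ranked[:k]]
    return empirical_quantile(pool, q)

preds   = [10, 11, 12, 30, 31]
actuals = [ 9, 12, 13, 28, 35]
# Median outcome among training points predicted near 11:
print(probabilistic_forecast(preds, actuals, 11, q=0.5))  # → 12
```

Quantile Regression Forests pool outcomes across a forest's leaves and Level Set Forecaster pools by prediction level sets, but both ultimately answer "what outcomes did we see when the model predicted something like this?", which is why neither requires retraining a new model per quantile.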

Fokko Driesprong, Rob Zinkov – machine learning engineer and data scientist

This year too, at our 10-year anniversary edition of PyData Amsterdam, we'll host open source sprints! The sprints will run as two parallel sessions, led by leading open source contributors Fokko Driesprong and Rob Zinkov of the PyIceberg and PyMC packages, respectively.