talk-data.com

Event

PyData London 2025

2025-06-06 – 2025-06-08 PyData

Activities tracked

104

Sessions & talks

Showing 26–50 of 104 · Newest first

Reproducibility in Embedding Benchmarks

2025-06-08
talk

Reproducibility in embedding benchmarks is no small feat. Prompt variability, growing computational demands, and evolving tasks make fair comparisons a challenge. The need for robust benchmarking has never been greater. In this talk, we’ll explore the quirks and complexities of benchmarking embedding models, such as prompt sensitivity, scaling issues, and emergent behaviors.

We’ll hear straight from the Massive Text Embedding Benchmark (MTEB) maintainers and show how MTEB (and its extensions like MMTEB and MIEB) simplifies reproducibility, making it easier for researchers and industry practitioners to measure progress, choose the right models, and push the boundaries of embedding performance.

AI for Everyone - Building Inclusive Machine Learning Models

2025-06-08
talk

Artificial Intelligence (AI) and Machine Learning (ML) are transforming industries such as healthcare, finance, education, and entertainment. However, these advancements are not benefiting everyone equally. Biases in datasets, algorithms, and design processes often lead to AI systems that unintentionally exclude or misrepresent underrepresented communities, reinforcing societal inequalities.

This talk, "AI for Everyone: Building Inclusive Machine Learning Models," explores the critical importance of developing AI systems that are ethical, fair, and accessible to all. We will examine real-world examples of AI bias, discuss techniques for identifying and mitigating bias in data and models, and explore frameworks for responsible AI development. Attendees will leave with actionable insights to design AI solutions that promote fairness, inclusivity, and social impact.

Automating Porosity Detection in Additive Manufacturing with Deep Learning

2025-06-08
talk

Additive Manufacturing (AM) enables complex, high-performance components, but porosity defects can compromise structural integrity. Traditional porosity analysis in X-ray CT scans is manual, slow, and inconsistent. This talk introduces a deep learning-based approach using CNNs and segmentation models to automate porosity detection, enhancing accuracy and efficiency. Attendees will gain insights into pre-processing 3D CT scans, training AI models, and solving industry challenges.

From Trees to Transformers: Our Journey Towards Deep Learning for Ranking

2025-06-08
talk

GetYourGuide, a global marketplace for travel experiences, reached diminishing returns with its XGBoost-based ranking system. We switched to a Deep Learning pipeline in just nine months, maintaining high throughput and low latency. We iterated on over 50 offline models and conducted more than 10 live A/B tests, ultimately deploying a PyTorch transformer that yielded significant gains. In this talk, we will share our phased approach—from a simple baseline to a high-impact launch—and discuss the key operational and modeling challenges we faced. Learn how to transition from tree-based methods to neural networks and unlock new possibilities for real-time ranking.

Break

2025-06-08
talk

Registration & Breakfast

2025-06-08
talk

PyData London 2025 Happy Hour

2025-06-07
talk

Join us for drinks, snacks and networking from 5-6pm.

LLM Inference Arithmetics: the Theory behind Model Serving

2025-06-07
talk

Have you ever asked yourself how parameters for an LLM are counted, or wondered why Gemma 2B is actually closer to a 3B model? Not sure what a KV cache is? (And, before you ask: no, it's not a Redis fork.) Do you want to find out how much GPU VRAM you need to run your model smoothly?

If your answer to any of these questions was "yes", or you have another question about LLM inference, such as batching or time-to-first-token, this talk is for you. Well, except for the Redis part.
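
The questions above (parameter counts, KV cache size, VRAM needs) come down to simple arithmetic. The sketch below is not taken from the talk. It assumes a decoder-only transformer with a gated MLP, no bias terms, one weight vector per normalization layer, and tied input/output embeddings. The class name `DecoderConfig`, the helper functions, and the `GEMMA_2B_LIKE` values are illustrative assumptions; the configuration numbers are believed to match the public Gemma 2B configuration but should be checked against the model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical container for the shape of a decoder-only transformer."""
    vocab_size: int
    d_model: int
    n_layers: int
    n_heads: int
    n_kv_heads: int      # fewer than n_heads for multi-query / grouped-query attention
    head_dim: int
    d_ff: int            # hidden width of the gated MLP
    tied_embeddings: bool = True


def count_parameters(cfg: DecoderConfig) -> int:
    """Count weights, assuming no biases and one norm vector per norm layer."""
    embed = cfg.vocab_size * cfg.d_model
    q_proj = cfg.d_model * cfg.n_heads * cfg.head_dim
    kv_proj = 2 * cfg.d_model * cfg.n_kv_heads * cfg.head_dim
    o_proj = cfg.n_heads * cfg.head_dim * cfg.d_model
    mlp = 3 * cfg.d_model * cfg.d_ff          # gate, up, and down projections
    norms = 2 * cfg.d_model                   # pre-attention and pre-MLP norms
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    total = embed + cfg.n_layers * per_layer + cfg.d_model  # + final norm
    if not cfg.tied_embeddings:
        total += embed                        # separate output projection
    return total


def weight_bytes(cfg: DecoderConfig, bytes_per_param: int = 2) -> int:
    """Memory for the weights alone (2 bytes per parameter for fp16/bf16)."""
    return count_parameters(cfg) * bytes_per_param


def kv_cache_bytes(cfg: DecoderConfig, seq_len: int,
                   batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Memory for cached keys and values: 2 tensors per layer per token."""
    per_token = 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


# Assumed values, quoted from memory of the public Gemma 2B configuration.
GEMMA_2B_LIKE = DecoderConfig(
    vocab_size=256_000, d_model=2048, n_layers=18,
    n_heads=8, n_kv_heads=1, head_dim=256, d_ff=16_384,
)

if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"parameters:        {count_parameters(GEMMA_2B_LIKE):,}")
    print(f"weights (fp16):    {weight_bytes(GEMMA_2B_LIKE) / gib:.2f} GiB")
    print(f"KV cache, 8k ctx:  {kv_cache_bytes(GEMMA_2B_LIKE, 8192) / gib:.2f} GiB")
```

Under these assumptions the embedding table alone contributes roughly half a billion parameters, which is one way a model labelled "2B" can end up well above two billion. Real serving also needs memory for activations and framework overhead, which this sketch ignores.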

NetworkX is Fast Now: Zero Code Change Acceleration

2025-06-07
talk

Have you ever wondered how to find connections in your data and to gain insights from them? Come discover how NetworkX makes this easy (and fast!).

This talk is broadly divided into two parts. First we will talk about the power of graph analytics and how you can use tools like NetworkX to extract information from your data, and then we will talk about how we made the machinery behind NetworkX work with heterogeneous backends like GraphBLAS (CPU optimized) and cuGraph (GPU optimized).

Successful Projects through a bit of Rebellion

2025-06-07
talk

This talk is for leaders who want new techniques to improve their success rates. In the last 15 months I've built a private data science peer mentorship group where we discuss rebellious ideas that improve our ability to make meaningful change in organisations of all sizes.

As a leader you've no doubt had trouble defining new projects (perhaps you've been asked to "add ChatGPT!"), getting buy-in, building support, defining defensible metrics and milestones, hiring, developing your team, dealing with conflict, avoiding overload and ultimately delivering valuable projects that are adopted by the business. I'll share advice across all of these areas based on 25 years of personal experience and the topics we've discussed in my leadership community.

You'll walk away with new ideas, perspectives and references that ought to change how you work with your team and organisation.

Feminist AI Lounge

2025-06-07
talk

Join our chill space, unwind, chat about Feminist AI and contribute to the PyData London DIY collage zine.

Media Mix Modelling - how we can save company budget?

2025-06-07
talk

How can engineers empower marketing teams in the post-cookie era? Discover Bayesian Media Mix Modelling (MMM), a robust data science approach to evaluate multi-channel marketing effectiveness. Learn how to implement MMM and take actionable insights back to your company.

Not Another LLM Talk… Practical Lessons from Building a Real-World Adverse Media Pipeline

2025-06-07
talk

LLMs are magical—until they aren’t. Extracting adverse media entities might sound straightforward, but throw in hallucinations, inconsistent outputs, and skyrocketing API costs, and suddenly, that sleek prototype turns into a production nightmare.

Our adverse media pipeline monitors over 1 million articles a day, sifting through vast amounts of news to identify reports of crimes linked to financial bad actors, money laundering, and other risks. Thanks to GenAI and LLMs, we can tackle this problem in new ways—but deploying these models at scale comes with its own set of challenges: ensuring accuracy, controlling costs, and staying compliant in highly regulated industries.

In this talk, we’ll take you inside our journey to production, exploring the real-world challenges we faced through the lens of key personas: Cautious Claire, the compliance officer who doesn’t trust black-box AI; Magic Mike, the sales lead who thinks LLMs can do anything; Just-Fine-Tune Jenny, the PM convinced fine-tuning will solve everything; Reinventing Ryan, the engineer reinventing the wheel; and Paranoid Pete, the security lead fearing data leaks.

Expect practical insights, cautionary tales, and real-world lessons on making LLMs reliable, scalable, and production-ready. If you've ever wondered why your pipeline works perfectly in a Jupyter notebook but falls apart in production, this talk is for you.

Platforms for valuable AI Products: Iteration, iteration, iteration

2025-06-07
talk
John Carney (PDFTA)

In data science, experimentation is vital: the more we can experiment, the more we can learn. However, quick iteration isn't sufficient. We also need to be able to easily promote these experiments to production to deliver value, which requires all the stability and reliability of any production system. John will discuss building platforms that treat iteration as a first-class consideration, the role of open source libraries, and balancing trade-offs.

Python Engineering Excellence Birds of a Feather

2025-06-07
talk

A round table discussion on how to excel at Python engineering and architecting systems using Python, what kinds of sessions and activities would best help Python programmers become more effective at Python engineering, and how to achieve Python engineering excellence generally.

Conquering PDFs: document understanding beyond plain text

2025-06-07
talk

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

PyScript - Python in the Browser

2025-06-07
talk

Learn how to write a web app in Python using PyScript, Pyodide, MicroPython, and WASM.

Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python

2025-06-07
talk

The use of multiple Large Language Models (LLMs) working together to perform complex tasks, known as multi-agent systems, has gained significant traction. While frameworks like LangGraph and Semantic Kernel can streamline orchestration and coordination among agents, developing large-scale, production-grade systems can bring a host of data challenges. Issues such as supporting multi-tenancy, preserving transactional integrity and state, and managing reliable asynchronous function calls while scaling efficiently can be difficult to navigate.

Leveraging insights from practical experiences in the Azure Cosmos DB engineering team, this talk will guide you through key considerations and best practices for storing, managing, and leveraging data in multi-agent applications at any scale. You’ll learn how to understand core multi-agent concepts and architectures, manage statefulness and conversation histories, personalize agents through retrieval-augmented generation (RAG), and effectively integrate APIs and function calls.

Aimed at developers, architects, and data scientists at all skill levels, this session will show you how to take your multi-agent systems from the lab to full-scale production deployments, ready to solve real-world problems. We’ll also walk through code implementations that can be quickly and easily put into practice, all in Python.