talk-data.com
Event
PyData London 2025
Activities tracked
104
Sessions & talks
Break
Keynote: From Next Token Prediction to Reasoning and Beyond
Large Language Models (LLMs) have risen to prominence as some of the most popular technological artifacts of the day. This talk will provide a highly accessible and visual overview of LLM concepts relevant to today's data professionals. It covers present-day Transformer architectures, tokenizers, reward models, reasoning LLMs, agentic trajectories, and the training stages of a large language model, including next-word prediction, instruction-tuning, preference-tuning, and reinforcement learning.
Diversity Scholar Luncheon
Lunch Break
Explore how AI-powered Generative Agents can evolve in real time using live data streams. Inspired by Stanford's 'Generative Agents' paper, this session dives into building dynamic, AI-driven worlds with Apache Kafka, Flink, and Iceberg - plus LLMs, RAG, and Python. Demos and practical examples included!
Cutting Edge Football Analytics using Polars, Keras and Spektral
Football analytics has rapidly evolved over the past five years, becoming a crucial part of professional and fan discourse. While much of the cutting-edge research remains hidden behind the fences of club training grounds, a growing ecosystem of open-source tools now enables anyone to develop advanced football analytics models.
In this talk, I'll showcase key open-source libraries—Polars for high-performance data processing, Keras for deep learning, and Spektral for Graph Neural Networks (GNNs)—to analyze millions of player coordinates from publicly available high-frequency positional tracking data. I'll demonstrate how these tools can be used to build in-game prediction models and extract advanced football metrics that only the most advanced football clubs currently use.
Enhancing Fraud Detection with LLM-Generated Profiles: From Analyst Efficiency to Model Performance
This talk explores how leveraging Large Language Models (LLMs) to generate structured customer profile summaries improved both compliance analyst workflows and fraud scoring models at a financial institution. Attendees will learn how embeddings derived from LLM-generated narratives outperformed traditional manual feature engineering and raw text embeddings, offering insights into practical applications of NLP in fraud detection.
AI agents and multi-step workflows are powerful, but testing them can be tricky. This talk explores practical ways to test these complex systems — like running multi-step simulations, checking tool calls, and using LLMs for evaluation. You'll also learn how to prioritize what to test and set up session-level evaluations with open-source tools.
How we unified feature engineering across data and backend at Monzo
A deep dive into how Monzo reduced the effort it takes to generate point-in-time correct features for model development and to productionise them with real-time streaming using our event-driven architecture.
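For readers unfamiliar with the term, "point-in-time correct" means each training row only sees feature values that were known at the time of the event, so no future information leaks in. The sketch below illustrates that idea with the standard library only. It is not Monzo's implementation; `balance_history`, `feature_as_of`, and `label_events` are hypothetical names and data chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical feature history for one customer:
# (timestamp, value) pairs sorted by timestamp.
balance_history = [(1, 100.0), (5, 250.0), (9, 40.0)]


def feature_as_of(history, event_time):
    """Return the latest value recorded at or before event_time, or None."""
    timestamps = [ts for ts, _ in history]
    # Index of the first timestamp strictly greater than event_time.
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx else None


# Hypothetical event times at which a model would be trained or scored.
label_events = [0, 5, 7, 12]

# Each row pairs an event time with the feature value known at that time.
training_rows = [(t, feature_as_of(balance_history, t)) for t in label_events]
```

The event at time 0 gets no value because nothing had been recorded yet, and the event at time 7 gets the value written at time 5 rather than the later one written at time 9.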
Sovereign Data for AI with Python
The only certainty in life is that the pendulum will always swing. Recently, the pendulum has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI, from data processing to model training and LLM fine-tuning at scale to inference at scale. We will focus on open-source infrastructure, including: a Python library server (PyPI, Conda, etc.) and avoiding supply chain attacks; a container registry that works at scale; an S3 storage layer; and a database server with a vector index.
Multi-Task Learning for Fraud detection: From Trees to MLPs
This talk will present Monzo's exploration of multi-task deep learning to enhance our real-time fraud detection systems. I will outline the challenges of card fraud detection, and explain the limitations of traditional gradient boosted decision tree models in terms of generalisation to rare fraud subtypes. This will motivate the use of multi-task learning, which leverages shared dense representations across fraud sub-tasks. By consolidating multiple specialist learners into a single model, we observe improved performance on less prevalent fraud types, leading to better generalisability, scalability, and robustness. I will also share results from testing multi-task models within our fraud detection infrastructure.
Parallel PyTorch Inference with Python Free-Threading
This talk examines multi-threaded parallel inference on PyTorch models using the new No-GIL, free-threaded version of Python. Using a simple 124M parameter GPT2 model that we train from scratch, we explore the new territory unlocked by free-threaded Python: parallel PyTorch model inference, where multiple threads, unimpeded by the Python GIL, attempt to generate text from a transformer-based model in parallel.
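The threading pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the speaker's code: `fake_generate` is a hypothetical pure-Python stand-in for a model's text generation call, and no PyTorch is involved. `sys._is_gil_enabled()` is available on Python 3.13 and later; on older interpreters the helper assumes the GIL is on.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled():
    """Report whether the GIL is active in this interpreter."""
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return check() if check is not None else True


def fake_generate(prompt, max_new_tokens=4):
    """Hypothetical stand-in for model inference: appends placeholder tokens."""
    tokens = prompt.split()
    for i in range(max_new_tokens):
        tokens.append(f"tok{i}")
    return " ".join(tokens)


prompts = ["hello world", "free threading", "pydata london"]

# One thread per prompt; on a free-threaded build these can run on separate cores.
with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(fake_generate, prompts))
```

The same code runs on a standard build, but there the GIL serialises the Python-level work, so CPU-bound threads take turns instead of running simultaneously.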
PyMC Code Sprint
Join the PyMC development team for a fun and engaging hackathon!
Why you should stop pretending your sparse data is dense
Much real-world data has missing values, but historically prevalent data science tools have had limited support for such data. This talk will compare traditional numerical approaches, the more modern alternative Arrow, and ArcticDB, the client-side DataFrame database developed at Man Group.
Break
Opening Notes & Keynote: Keep Calm and Data On: Being a data science practitioner in the era of AI proliferation
Since the end of 2022, the AI space has reached unprecedented velocity, scale and proliferation. When it seems like everyone (and their dog) is talking about AI, how should those of us who've been working in Machine Learning, Data Science (and AI) as domain experts look to navigate the conversation? In this talk, Leanne will aim to shine a light on the impact the AI arms race is having on our field, the reality of what it means to be a practitioner and some principles to stick by to help traverse what may appear to be a time of panic.