talk-data.com

Event

PyData London 2025

2025-06-06 – 2025-06-08 PyData

Activities tracked

104

Sessions & talks

Break

2025-06-07
talk

Break

2025-06-07
talk

Break

2025-06-07
talk

Break

2025-06-07
talk

Keynote- From Next Token Prediction to Reasoning and Beyond

2025-06-07
talk
Jay Alammar (Cohere)
LLM

Large Language Models (LLMs) have risen to prominence as some of the most popular technological artifacts of the day. This talk will provide a highly accessible and visual overview of LLM concepts relevant to today's data professionals. It covers present-day Transformer architectures, tokenizers, reward models, reasoning LLMs, agentic trajectories, and the training stages of a large language model, including next-word prediction, instruction-tuning, preference-tuning, and reinforcement learning.

Diversity Scholar Luncheon

2025-06-07
talk

Lunch Break

2025-06-07
talk

Lunch Break

2025-06-07
talk

Lunch Break

2025-06-07
talk

Bringing stories to life with AI, data streaming and generative agents

2025-06-07
talk

Explore how AI-powered Generative Agents can evolve in real time using live data streams. Inspired by Stanford's 'Generative Agents' paper, this session dives into building dynamic, AI-driven worlds with Apache Kafka, Flink, and Iceberg - plus LLMs, RAG, and Python. Demos and practical examples included!

Cutting Edge Football Analytics using Polars, Keras and Spektral

2025-06-07
talk

Football analytics has rapidly evolved over the past five years, becoming a crucial part of professional and fan discourse. While much of the cutting-edge research remains hidden behind the fences of club training grounds, a growing ecosystem of open-source tools now enables anyone to develop advanced football analytics models.

In this talk, I'll showcase key open-source libraries—Polars for high-performance data processing, Keras for deep learning, and Spektral for Graph Neural Networks (GNNs)—to analyze millions of player coordinates from publicly available high-frequency positional tracking data. I'll demonstrate how these tools can be used to build in-game prediction models and extract advanced football metrics that only the most advanced football clubs currently use.

Enhancing Fraud Detection with LLM-Generated Profiles: From Analyst Efficiency to Model Performance

2025-06-07
talk

This talk explores how leveraging Large Language Models (LLMs) to generate structured customer profile summaries improved both compliance analyst workflows and fraud scoring models at a financial institution. Attendees will learn how embeddings derived from LLM-generated narratives outperformed traditional manual feature engineering and raw text embeddings, offering insights into practical applications of NLP in fraud detection.

AI agents testing: How to evaluate the unpredictable

2025-06-07
talk

AI agents and multi-step workflows are powerful, but testing them can be tricky. This talk explores practical ways to test these complex systems — like running multi-step simulations, checking tool calls, and using LLMs for evaluation. You'll also learn how to prioritize what to test and set up session-level evaluations with open-source tools.

How we unified feature engineering across data and backend at Monzo

2025-06-07
talk

A deep dive into how Monzo reduced the effort it takes to generate point-in-time correct features for model development and to productionise them with real-time streaming using our event-driven architecture.

Sovereign Data for AI with Python

2025-06-07
talk

The only certainty in life is that the pendulum will always swing. Recently, the pendulum has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI, from data processing, to model training and LLM fine-tuning at scale, to inference at scale. We will focus on open-source infrastructure, including: a Python library server (PyPI, Conda, etc.) and avoiding supply chain attacks; a container registry that works at scale; an S3 storage layer; and a database server with a vector index.

Multi-Task Learning for Fraud detection: From Trees to MLPs

2025-06-07
talk

This talk will present Monzo's exploration of multi-task deep learning to enhance our real-time fraud detection systems. I will outline the challenges of card fraud detection, and explain the limitations of traditional gradient boosted decision tree models in terms of generalisation to rare fraud subtypes. This will motivate the use of multi-task learning, which leverages shared dense representations across fraud sub-tasks. By consolidating multiple specialist learners into a single model, we observe improved performance on less prevalent fraud types, leading to better generalisability, scalability, and robustness. I will also share results from testing multi-task models within our fraud detection infrastructure.

Parallel PyTorch Inference with Python Free-Threading

2025-06-07
talk

This talk examines multi-threaded parallel inference on PyTorch models using the new No-GIL, free-threaded version of Python. Using a simple 124M parameter GPT2 model that we train from scratch, we explore the new territory unlocked by free-threaded Python: parallel PyTorch model inference, where multiple threads, unimpeded by the Python GIL, attempt to generate text from a transformer-based model in parallel.

PyMC Code Sprint

2025-06-07
talk

Join the PyMC development team for a fun and engaging hackathon!

Why you should stop pretending your sparse data is dense

2025-06-07
talk

Lots of data in the real world has missing values, but historically prevalent data science tools have had limited support for such data. This talk will compare traditional numerical approaches, the more modern alternative, Arrow, and ArcticDB, the client-side DataFrame database developed at Man Group.

Break

2025-06-07
talk

Break

2025-06-07
talk

Break

2025-06-07
talk

Break

2025-06-07
talk

Opening Notes & Keynote: Keep Calm and Data On: Being a data science practitioner in the era of AI proliferation

2025-06-07
talk

Since the end of 2022, the AI space has reached unprecedented velocity, scale and proliferation. When it seems like everyone (and their dog) is talking about AI, how should those of us who've been working in Machine Learning, Data Science (and AI) as domain experts look to navigate the conversation? In this talk, Leanne will aim to shine a light on the impact the AI arms race is having on our field, the reality of what it means to be a practitioner and some principles to stick by to help traverse what may appear to be a time of panic.

Registration & Breakfast

2025-06-07
talk