PyData London 2025

Scaling AI workloads with Ray & Airflow

2025-06-08 Watch

talk

Tatiana Al-Chueyr

AI/ML Airflow Astronomer GitHub LLM Python

Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides the layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.

Apache Airflow, meanwhile, is a well-established data orchestration framework, downloaded more than 20 million times a month.

This talk presents the Airflow Ray provider package, which lets users interact with Ray from an Airflow workflow. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines in those clusters.

Transfer Learning: Leveraging Pretrained Models with Limited Data

2025-06-08 Watch

talk

Salman Khan

AI/ML

Transfer learning has revolutionised machine learning by enabling models trained on large datasets to generalise effectively to tasks with limited data. This talk explores strategies for adapting pretrained models to new domains, focusing on audio processing as a case study. Using YAMNet, Whisper, and wav2vec2 for laughter detection, we demonstrate how to extract meaningful representations, fine-tune models efficiently, and handle severe class imbalances. The session covers feature extraction, model fusion techniques, and best practices for optimising performance in data-scarce environments. Attendees will gain practical insights into applying transfer learning across various modalities beyond audio, maximising model effectiveness when labelled data is scarce.

Building a knowledge graph for climate policy

2025-06-08 Watch

talk

Fred O'Loughlin, Harrison Pim

AI/ML

At Climate Policy Radar, we're building an open-source knowledge graph for climate policy. In this talk, we'll share how we combine in-house expertise with scalable data infrastructure to identify key concepts in thousands of global climate policy documents. We'll also touch on ontology design, equitable evaluation, and the climate impacts of AI.

Is coding assistant as good as we thought in coding?

2025-06-08 Watch

talk

Cheuk Ting Ho

AI/ML

Coding assistants are everywhere these days: many IDEs offer them as plugins, and they are becoming more and more powerful. But this raises questions. Are coding assistants as good as we want them to be? What can and can't these AI agents do? Will AI take my job?

Agentic Cyber Defense with External Threat Intelligence

2025-06-08 Watch

talk

Jyoti Yadav

AI/ML Python Cyber Security

This talk will detail how to integrate external threat intelligence data into an autonomous agentic AI system for proactive cybersecurity. Using real world datasets—including open-source threat feeds, security logs, or OSINT—you will learn how to build a data ingestion pipeline, train models with Python, and deploy agents that autonomously detect and mitigate cyber threats. This case study will provide practical insights into data preprocessing, feature engineering, and the challenges of adversarial conditions.

Diving into Transformer Model Internals

2025-06-08 Watch

talk

Matt Squire

AI/ML GenAI Python

While everybody and their dog is building applications on generative AI, the inner workings of transformers - the model architecture behind the genAI age - are a mystery to most people. In this talk, I'll walk through how transformers are implemented, using real-life Python code from the HuggingFace transformers library.

AI for Everyone - Building Inclusive Machine Learning Models

2025-06-08 Watch

talk

Elizabeth Osanyinro

AI/ML

Artificial Intelligence (AI) and Machine Learning (ML) are transforming industries such as healthcare, finance, education, and entertainment. However, these advancements are not benefiting everyone equally. Biases in datasets, algorithms, and design processes often lead to AI systems that unintentionally exclude or misrepresent underrepresented communities, reinforcing societal inequalities.

This talk, "AI for Everyone: Building Inclusive Machine Learning Models," explores the critical importance of developing AI systems that are ethical, fair, and accessible to all. We will examine real-world examples of AI bias, discuss techniques for identifying and mitigating bias in data and models, and explore frameworks for responsible AI development. Attendees will leave with actionable insights to design AI solutions that promote fairness, inclusivity, and social impact.

Automating Porosity Detection in Additive Manufacturing with Deep Learning

2025-06-08 Watch

talk

Onyekachukwu Ojumah

AI/ML

Additive Manufacturing (AM) enables complex, high-performance components, but porosity defects can compromise structural integrity. Traditional porosity analysis in X-ray CT scans is manual, slow, and inconsistent. This talk introduces a deep learning-based approach using CNNs and segmentation models to automate porosity detection, enhancing accuracy and efficiency. Attendees will gain insights into pre-processing 3D CT scans, training AI models, and solving industry challenges.

Feminist AI Lounge

2025-06-07

talk

Ines Montani

AI/ML

Join our chill space, unwind, chat about Feminist AI and contribute to the PyData London DIY collage zine.

Not Another LLM Talk… Practical Lessons from Building a Real-World Adverse Media Pipeline

2025-06-07 Watch

talk

Adam Hill

AI/ML API GenAI LLM Cyber Security

LLMs are magical—until they aren’t. Extracting adverse media entities might sound straightforward, but throw in hallucinations, inconsistent outputs, and skyrocketing API costs, and suddenly, that sleek prototype turns into a production nightmare.

Our adverse media pipeline monitors over 1 million articles a day, sifting through vast amounts of news to identify reports of crimes linked to financial bad actors, money laundering, and other risks. Thanks to GenAI and LLMs, we can tackle this problem in new ways—but deploying these models at scale comes with its own set of challenges: ensuring accuracy, controlling costs, and staying compliant in highly regulated industries.

In this talk, we’ll take you inside our journey to production, exploring the real-world challenges we faced through the lens of key personas: Cautious Claire, the compliance officer who doesn’t trust black-box AI; Magic Mike, the sales lead who thinks LLMs can do anything; Just-Fine-Tune Jenny, the PM convinced fine-tuning will solve everything; Reinventing Ryan, the engineer reinventing the wheel; and Paranoid Pete, the security lead fearing data leaks.

Expect practical insights, cautionary tales, and real-world lessons on making LLMs reliable, scalable, and production-ready. If you've ever wondered why your pipeline works perfectly in a Jupyter notebook but falls apart in production, this talk is for you.

Platforms for valuable AI Products: Iteration, iteration, iteration

2025-06-07 Watch

talk

John Carney (PDFTA)

AI/ML Data Science

In data science, experimentation is vital: the more we can experiment, the more we can learn. However, quick iteration isn't sufficient; we also need to be able to easily promote these experiments to production to deliver value. This requires all the stability and reliability of any production system. John will discuss building platforms that treat iteration as a first-class consideration, the role of open source libraries, and balancing trade-offs.

Bringing stories to life with AI, data streaming and generative agents

2025-06-07

talk

Olena Kutsenko

AI/ML Flink Iceberg Kafka LLM Python

Explore how AI-powered Generative Agents can evolve in real time using live data streams. Inspired by Stanford's 'Generative Agents' paper, this session dives into building dynamic, AI-driven worlds with Apache Kafka, Flink, and Iceberg - plus LLMs, RAG, and Python. Demos and practical examples included!

AI agents testing: How to evaluate the unpredictable

2025-06-07 Watch

talk

Emeli Dral

AI/ML LLM

AI agents and multi-step workflows are powerful, but testing them can be tricky. This talk explores practical ways to test these complex systems — like running multi-step simulations, checking tool calls, and using LLMs for evaluation. You'll also learn how to prioritize what to test and set up session-level evaluations with open-source tools.

Sovereign Data for AI with Python

2025-06-07 Watch

talk

Lex Avstreikh

AI/ML Cloud Computing LLM Python S3

The only certainty in life is that the pendulum will always swing. Recently, the pendulum has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI, from data processing, through model training and LLM fine-tuning at scale, to inference at scale. We will focus on open-source infrastructure, including: a Python library server (PyPI, Conda, etc.) and avoiding supply chain attacks; a container registry that works at scale; an S3 storage layer; and a database server with a vector index.

Opening Notes & Keynote: Keep Calm and Data On: Being a data science practitioner in the era of AI proliferation

2025-06-07

talk

Leanne Fitzpatrick

AI/ML Data Science

Since the end of 2022, the AI space has reached unprecedented velocity, scale and proliferation. When it seems like everyone (and their dog) is talking about AI, how should those of us who've been working in Machine Learning, Data Science (and AI) as domain experts look to navigate the conversation? In this talk, Leanne will aim to shine a light on the impact the AI arms race is having on our field, the reality of what it means to be a practitioner and some principles to stick by to help traverse what may appear to be a time of panic.

Building your own vertical agent with AG2 AgentOS

2025-06-06 Watch

talk

Chi Wang, Tim Santos

AI/ML

In this tutorial, we will cover basic and advanced agentic design patterns in AG2 and we will go through practical implementations to demonstrate AI agents in action.

Graph Theory for Multi-Agent Integration: Showcase Clinical Use Cases

2025-06-06 Watch

talk

Ahmad Albarqawi

AI/ML NLP

Graph theory is a well-known concept for algorithms and can be used to orchestrate the building of multi-model pipelines. By translating tasks and dependencies into a Directed Acyclic Graph, we can orchestrate diverse AI models, including NLP, vision, and recommendation capabilities. This tutorial provides a step-by-step approach to designing graph-based AI model pipelines, focusing on clinical use cases from the field.
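As a rough illustration of the idea in this abstract (not the speaker's implementation), the sketch below expresses a small multi-model pipeline as a Directed Acyclic Graph and derives a valid execution order with Python's standard-library `graphlib`. The task names and dependencies are hypothetical and chosen only for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can run.
pipeline = {
    "extract_text": set(),
    "classify_image": set(),
    "nlp_entities": {"extract_text"},
    "recommend": {"nlp_entities", "classify_image"},
}


def run_order(dag):
    """Return one valid execution order for the DAG.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

Any order returned this way runs every task after its dependencies, which is the property an orchestrator needs before dispatching each model.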

Hands-on workshop on developing Reinforcement Learning solutions with financial domain example use cases.

2025-06-06

talk

Ade Idowu

AI/ML

Reinforcement Learning (RL) has emerged as a transformative sub-field in AI/ML, driving breakthroughs in areas ranging from autonomous robotics to personalized recommendation systems. This workshop is designed to serve a broad audience—from beginners eager to grasp foundational RL concepts to practitioners seeking to deepen their technical expertise through applied projects. These projects will range from developing simple classical RL game environments to practical financial domain use cases such as using RL sequential decision making for stock trading and asset portfolio optimization scenarios.

How To Measure And Mitigate Unfair Bias in Machine Learning Models

2025-06-06 Watch

talk

John Sandall

AI/ML

In this 90-minute workshop, machine learning engineers and data scientists will learn practical techniques for identifying and mitigating age bias in AI-driven hiring systems. We’ll explore fairness metrics like statistical parity, counterfactual fairness, and equalized odds, and demonstrate how tools such as Fairlearn, Aequitas, and AI Fairness 360 can be used to monitor and improve model fairness. Through hands-on exercises, participants will walk away with the skills to evaluate and de-bias models in high-risk areas like recruitment.
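To make two of the metrics named above concrete, here is a minimal sketch in plain Python, assuming binary predictions and a binary group attribute (for instance a hypothetical age band). It computes the statistical parity difference and the gap in true-positive rates, which is one component of equalized odds. The data and function names are illustrative only and are not the API of Fairlearn, Aequitas, or AI Fairness 360.

```python
def selection_rate(preds):
    """Fraction of positive (1) predictions."""
    return sum(preds) / len(preds)


def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


def split_by_group(values, group, g):
    """Keep only the values whose group label equals g."""
    return [v for v, lab in zip(values, group) if lab == g]


def statistical_parity_difference(y_pred, group):
    """Selection rate of group 1 minus selection rate of group 0."""
    return (selection_rate(split_by_group(y_pred, group, 1))
            - selection_rate(split_by_group(y_pred, group, 0)))


def tpr_difference(y_true, y_pred, group):
    """True-positive rate of group 1 minus that of group 0."""
    def rate(g):
        return true_positive_rate(split_by_group(y_true, group, g),
                                  split_by_group(y_pred, group, g))
    return rate(1) - rate(0)


# Toy data (made up): group 0 and group 1 have the same true labels,
# but the model selects group 1 less often.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_pred, group)
tpr_gap = tpr_difference(y_true, y_pred, group)
print(spd, tpr_gap)
```

A value of zero for either quantity would indicate parity on that metric; full equalized odds would also compare false-positive rates across groups.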

Forecasting Weather using Time Series ML

2025-06-06 Watch

talk

Suyash Joshi

AI/ML LLM Python

This hands-on workshop covers how to use open source ML models, such as LSTMs and time-series LLMs, with Python to try to forecast weather patterns, along with best practices for data preparation and real-time predictions.
