Event: PyData London 2025 · 2025-06-08

Time · Title · Speakers · Video
15:15 · Scaling AI workloads with Ray & Airflow · Tatiana Al-Chueyr · Video
15:15 · Transfer Learning: Leveraging Pretrained Models with Limited Data · Salman Khan · Video
15:15 · Polars, DuckDB, PySpark, PyArrow, pandas, cuDF: how Narwhals has brought them all together! · Marco Gorelli · Video
14:30 · Is coding assistant as good as we thought in coding? · Cheuk Ting Ho · Video
14:30 · You Came to a Python Conference. Now, Go Do a PR Review! · Samiul Huque · Video
14:30 · Building a knowledge graph for climate policy · Fred O'Loughlin, Harrison Pim · Video
13:45 · Debugging Leadership: Six Errors when Moving From Code to Management · Matt Upson
13:45 · Diving into Transformer Model Internals · Matt Squire · Video
13:45 · Humble Data Workshop · Hugh Evans
13:45 · Agentic Cyber Defense with External Threat Intelligence · Jyoti Yadav · Video
13:15 · Break
12:30 · Keynote: Innovation is Dead · Tony Mears
11:30 · Lunch
11:30 · PyData Organizers Lunch
10:45 · Leaders at PyData · Ian Ozsvald · Video

Scaling AI workloads with Ray & Airflow 2025-06-08 · 15:15

Tatiana Al-Chueyr

Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides a compute layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.
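
Ray's core idea is to turn ordinary Python functions into tasks that are submitted in parallel and gathered later; in Ray itself that is written with the `@ray.remote` decorator, `f.remote(...)` calls (which return object refs) and `ray.get(...)`. The sketch below is a stdlib stand-in, not Ray: it uses `concurrent.futures` on one machine to show the same submit-then-gather pattern, and `score_batch` is a hypothetical workload. Threads share a single process (and the GIL for CPU-bound Python code), whereas Ray schedules tasks across worker processes and machines.

```python
from concurrent.futures import ThreadPoolExecutor


def score_batch(batch: list[int]) -> int:
    """Hypothetical workload: sum of squares over one batch."""
    return sum(x * x for x in batch)


def parallel_scores(data: list[int], batch_size: int = 4) -> list[int]:
    """Fan batches out to a worker pool, then gather results in submission order."""
    # Split the input into independent batches: the unit of parallel work.
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submitting returns futures immediately, much as f.remote(...) returns refs in Ray.
        futures = [pool.submit(score_batch, b) for b in batches]
        # Blocking on the futures plays the role of ray.get(refs).
        return [f.result() for f in futures]


print(parallel_scores(list(range(10))))  # → [14, 126, 145]
```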

Apache Airflow, meanwhile, is a well-established data orchestration framework, downloaded more than 20 million times a month.

This talk presents the Airflow Ray provider package, which lets users interact with Ray from an Airflow workflow. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines on those clusters.
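
An orchestrator such as Airflow mainly decides what runs after what. The sketch below models a hypothetical create-cluster, submit-job, delete-cluster workflow as a dependency graph and groups it into stages with the standard library's `graphlib` (Python 3.9+). The task names are illustrative assumptions; this is not the Airflow Ray provider's API, whose operators should be taken from its own documentation.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW: dict[str, set[str]] = {
    "create_ray_cluster": set(),
    "submit_ray_job": {"create_ray_cluster"},
    "collect_results": {"submit_ray_job"},
    # Teardown waits for the job so the cluster is not deleted mid-run.
    "delete_ray_cluster": {"submit_ray_job"},
}


def run_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks within one stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    stages: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        stages.append(ready)
        # A real scheduler would mark tasks done only after they actually succeed.
        sorter.done(*ready)
    return stages


for number, stage in enumerate(run_stages(WORKFLOW), start=1):
    print(number, stage)
```

In a real DAG the teardown usually has to run even when the job fails; Airflow expresses that with trigger rules such as `all_done`, which this toy scheduler does not model.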

AI/ML Airflow Astronomer GitHub LLM Python

Transfer Learning: Leveraging Pretrained Models with Limited Data 2025-06-08 · 15:15

Salman Khan

Transfer learning has revolutionised machine learning by enabling models trained on large datasets to generalise effectively to tasks with limited data. This talk explores strategies for adapting pretrained models to new domains, focusing on audio processing as a case study. Using YAMNet, Whisper, and wav2vec2 for laughter detection, we demonstrate how to extract meaningful representations, fine-tune models efficiently, and handle severe class imbalances. The session covers feature extraction, model fusion techniques, and best practices for optimising performance in data-scarce environments. Attendees will gain practical insights into applying transfer learning across various modalities beyond audio, maximising model effectiveness when labelled data is scarce.
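
One common way to handle severe class imbalance when fine-tuning is to weight the loss by inverse class frequency. The sketch below computes the "balanced" heuristic that scikit-learn's `compute_class_weight` also uses, n_samples / (n_classes × count), plus a weighted cross-entropy, in plain Python. The speech/laughter labels are made-up example data, and the talk may well rely on other techniques.

```python
import math
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs: list[dict[str, float]], labels: list[str],
                           weights: dict[str, float]) -> float:
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Made-up, heavily imbalanced labels: 90 clips of speech, 10 of laughter.
labels = ["speech"] * 90 + ["laughter"] * 10
weights = balanced_class_weights(labels)
print(weights)  # laughter gets 5.0, speech roughly 0.556
```

With these weights each class contributes the same total weight to the loss, so errors on the rare laughter clips are no longer drowned out by the majority class.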

AI/ML

Polars, DuckDB, PySpark, PyArrow, pandas, cuDF: how Narwhals has brought them all together! 2025-06-08 · 15:15

Marco Gorelli

Suppose you want to write a data science tool to do feature engineering. Your experience may go like this:
- Expectation: you can focus on state-of-the-art techniques for feature engineering.
- Reality: you keep having to make your codebase more complex because a new dataframe library has come out and users are demanding support for it.

Or rather, that is how it might have gone in the pre-Narwhals era. Now you can focus on the problems your tool set out to solve and let Narwhals handle the subtle differences between different kinds of dataframe inputs!
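
The idea behind Narwhals is that a tool converts whatever dataframe it receives into one common interface, does its work there, and hands back the user's original type; in Narwhals itself that round trip goes through `narwhals.from_native(...)` and `.to_native()`. The toy below is not Narwhals: it imitates the round trip with two hypothetical in-memory "dataframe" formats (column-oriented dicts of lists and row-oriented lists of dicts) so the pattern runs without any dataframe library installed. `Frame`, `from_native`, `to_native` and `add_price_per_unit` here are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Frame:
    """The single, column-oriented interface the tool is written against."""
    columns: dict[str, list[Any]]
    native_kind: str  # remembers the caller's original format for the round trip

    def n_rows(self) -> int:
        return len(next(iter(self.columns.values()), []))

    def rows(self) -> list[dict[str, Any]]:
        return [{k: v[i] for k, v in self.columns.items()} for i in range(self.n_rows())]

    def with_column(self, name: str, fn: Callable[[dict[str, Any]], Any]) -> "Frame":
        """Return a new Frame with one extra column computed row by row."""
        new_columns = dict(self.columns)
        new_columns[name] = [fn(row) for row in self.rows()]
        return Frame(new_columns, self.native_kind)


def from_native(data: Any) -> Frame:
    """Wrap one of the hypothetical native formats in the common Frame interface."""
    if isinstance(data, dict):  # column-oriented: {"col": [values, ...]}
        return Frame({k: list(v) for k, v in data.items()}, "columns")
    if isinstance(data, list):  # row-oriented: [{"col": value, ...}, ...]
        keys = list(data[0]) if data else []
        return Frame({k: [row[k] for row in data] for k in keys}, "rows")
    raise TypeError(f"unsupported input type: {type(data).__name__}")


def to_native(frame: Frame) -> Any:
    """Hand the result back in the same format the caller passed in."""
    return frame.columns if frame.native_kind == "columns" else frame.rows()


def add_price_per_unit(data: Any) -> Any:
    """A feature-engineering step written once, against Frame only."""
    frame = from_native(data)
    return to_native(frame.with_column("price_per_unit", lambda r: r["price"] / r["qty"]))
```

Supporting a new input format then means teaching `from_native`/`to_native` about it once, rather than touching every feature-engineering function.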

Data Science DuckDB Pandas Polars PySpark

Is coding assistant as good as we thought in coding? 2025-06-08 · 14:30

Cheuk Ting Ho

Coding assistants are everywhere nowadays: many IDEs offer them as plugins, and they are becoming more and more powerful. But this raises questions. Are coding assistants as good as we want them to be? What can and can't these AI agents do? Will AI take my job?

AI/ML

You Came to a Python Conference. Now, Go Do a PR Review! 2025-06-08 · 14:30

Samiul Huque

If you or your organization is spending time and resources attending a Python conference, you will want to ensure your team gets something immediately actionable and helpful out of it. As coders, we often think of writing code as the only way to contribute. However, pull request reviews are an often-overlooked but highly actionable way to have an impact.

Giving good PR reviews is an art with two equally important parts: the technical side and the communication side. While the technical side ensures the quality, maintainability, and efficiency of the Python code, the communication around the PR determines whether the feedback can be understood and acted upon. Yet we have all seen code reviews that were ignored or handled badly because of poor communication.

This talk addresses both facets of PR reviews by introducing three archetypes of bad code reviewers:
1) The “Looks Good to Me” Reviewer: provides little to no actionable feedback.
2) The “Technical Nitpicker”: focuses on small Python-specific issues but fails to communicate constructively.
3) The “Nit” Commenter: prefaces every comment with “nit” while offering unclear, if technically valid, suggestions.

Using these archetypes, we will explore Python-specific technical topics (such as pass by reference vs. pass by value) while delving into how to communicate and deliver feedback in a clear and actionable manner. Using real-world examples, attendees will learn how to:
a) Identify and address technical issues in Python PRs
b) Communicate feedback effectively
c) Balance technical rigor with constructive feedback
d) Communicate their peer review comments clearly
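
The "pass by reference vs. pass by value" question comes up in Python PRs because Python does neither exactly: a function receives a reference to the same object (often described as call by sharing, or pass by object reference). Mutating that object is visible to the caller; rebinding the parameter name is not. The sketch below shows both, plus the mutable-default-argument pitfall that reviewers commonly flag; all function names are illustrative.

```python
from typing import Optional


def append_in_place(items: list[int]) -> None:
    items.append(4)  # mutates the object the caller also holds


def rebind(items: list[int]) -> None:
    items = items + [5]  # builds a new list and rebinds only the local name


def add_tag_buggy(tag: str, tags: list[str] = []) -> list[str]:
    tags.append(tag)  # the default list is created once, at def time, and shared by every call
    return tags


def add_tag_fixed(tag: str, tags: Optional[list[str]] = None) -> list[str]:
    if tags is None:  # a fresh list per call unless the caller supplies one
        tags = []
    tags.append(tag)
    return tags
```

After `nums = [1, 2, 3]`, calling `append_in_place(nums)` leaves `nums == [1, 2, 3, 4]`, and a following `rebind(nums)` leaves it unchanged. Calling `add_tag_buggy("a")` and then `add_tag_buggy("b")` returns `["a", "b"]` the second time, whereas `add_tag_fixed("b")` returns `["b"]`.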

Python

Building a knowledge graph for climate policy 2025-06-08 · 14:30

Fred O'Loughlin, Harrison Pim

At Climate Policy Radar, we're building an open-source knowledge graph for climate policy. In this talk, we'll share how we combine in-house expertise with scalable data infrastructure to identify key concepts in thousands of global climate policy documents. We'll also touch on ontology design, equitable evaluation, and the climate impacts of AI.
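
At its simplest, a knowledge graph of this kind links documents to the concepts they mention. The toy sketch below tags documents by matching a hypothetical vocabulary of concept labels and synonyms, then stores concept-to-document edges. It is a naive keyword approach for illustration only, not Climate Policy Radar's pipeline, which the abstract describes as combining in-house expertise with scalable data infrastructure.

```python
import re
from collections import defaultdict

# Hypothetical concept vocabulary: concept ID -> surface forms to look for.
CONCEPTS: dict[str, list[str]] = {
    "adaptation": ["adaptation", "climate resilience"],
    "carbon_pricing": ["carbon tax", "emissions trading", "carbon price"],
    "renewables": ["renewable energy", "solar", "wind power"],
}


def tag_document(text: str) -> set[str]:
    """Return the concept IDs whose surface forms appear as whole words or phrases."""
    lowered = text.lower()
    found = set()
    for concept, forms in CONCEPTS.items():
        if any(re.search(rf"\b{re.escape(form)}\b", lowered) for form in forms):
            found.add(concept)
    return found


def build_graph(docs: dict[str, str]) -> dict[str, set[str]]:
    """Edges from each concept to the IDs of the documents that mention it."""
    graph: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in tag_document(text):
            graph[concept].add(doc_id)
    return dict(graph)
```

Keyword matching misses paraphrases and context, which is why production systems typically move to trained classifiers and curated ontologies; the graph structure (concept nodes, document nodes, "mentions" edges) stays the same.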

AI/ML

Debugging Leadership: Six Errors when Moving From Code to Management 2025-06-08 · 13:45

Matt Upson

Transitioning from a hands-on Pythonista to a leadership role is a journey filled with challenges, and like debugging code, it requires identifying, isolating, and fixing problems. In this talk, I’ll share eight key lessons from my journey from Data Scientist to Co-Founder of a small software company, framed as Python errors.

From battling imposter syndrome (ValueError: self-worth not defined), to learning to delegate (DeadlockError: unable to release control), and avoiding burnout (RuntimeError: system overload), this talk offers actionable advice for anyone navigating the leap from technical contributor to technical leader.

Expect a mix of humour, relatable stories, and hard-won lessons as we explore how debugging leadership challenges is just as rewarding (and occasionally frustrating) as debugging code. Whether you’re considering a leadership role or already on the journey, this session will leave you with practical insights to navigate common pitfalls and approach a leadership transition with a clearer understanding of what to expect.

Python

Diving into Transformer Model Internals 2025-06-08 · 13:45

Matt Squire

While everybody and their dog is building applications on generative AI, the inner workings of transformers, the model architecture behind the genAI age, remain a mystery to most people. In this talk, I'll walk through how transformers are implemented, using real-life Python code from the Hugging Face transformers library.
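
The core operation inside a transformer layer is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The sketch below implements that formula for a single head using plain Python lists so every step is visible. It is a from-scratch illustration, not code from the Hugging Face transformers library, and it omits masking, multiple heads and the learned projections that produce Q, K and V.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x m) @ (m x p) -> (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (n_queries x n_keys) query-key similarities
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]  # rows sum to 1
    return matmul(weights, v)  # each output row is a weighted average of the value rows
```

When every key is identical, each query attends uniformly and every output row is the mean of the value rows; when one key matches a query much more strongly, the output collapses towards that key's value row.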

AI/ML GenAI Python

Humble Data Workshop 2025-06-08 · 13:45

Hugh Evans – Developer Advocate @ Imply

Learn Python for Data Science in this Beginners’ Day Workshop. Would you like to learn to code but don’t know where to start? Taking your first steps in programming can seem like an impossible task, so we’ve decided to put on a workshop to show beginners how it can be done and to share our passion for the world of data science!

Apply to be a student: https://forms.gle/2cvNyRK8c8pNnpnz5

Data Science Python

Agentic Cyber Defense with External Threat Intelligence 2025-06-08 · 13:45

Jyoti Yadav

This talk will detail how to integrate external threat intelligence data into an autonomous agentic AI system for proactive cybersecurity. Using real-world datasets, including open-source threat feeds, security logs, or OSINT, you will learn how to build a data ingestion pipeline, train models with Python, and deploy agents that autonomously detect and mitigate cyber threats. This case study will provide practical insights into data preprocessing, feature engineering, and the challenges posed by adversarial conditions.
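
A first stage of such a pipeline is ingesting indicators of compromise (IOCs) from a threat feed, normalising them, and matching them against security logs. The sketch below assumes a hypothetical CSV feed with `type,value` columns and simple whitespace-separated log lines; real feeds (for example STIX/TAXII sources) and log formats differ, and the model training and autonomous agents described in the talk are out of scope here.

```python
import csv
import io
import ipaddress


def load_iocs(feed_csv: str) -> dict[str, set[str]]:
    """Parse a hypothetical `type,value` CSV feed into normalised IOC sets."""
    iocs: dict[str, set[str]] = {"ip": set(), "domain": set()}
    for row in csv.DictReader(io.StringIO(feed_csv)):
        kind, value = row["type"].strip().lower(), row["value"].strip()
        if kind == "ip":
            try:
                # Validates the address and converts it to canonical text form.
                iocs["ip"].add(str(ipaddress.ip_address(value)))
            except ValueError:
                continue  # skip malformed entries rather than failing the whole feed
        elif kind == "domain":
            iocs["domain"].add(value.lower().rstrip("."))
    return iocs


def match_logs(log_lines: list[str], iocs: dict[str, set[str]]) -> list[tuple[int, str]]:
    """Return (line number, matched indicator) for every log token that hits an IOC."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        for token in line.split():
            token = token.lower().rstrip(".")
            if token in iocs["ip"] or token in iocs["domain"]:
                hits.append((lineno, token))
    return hits
```

Whitespace tokenisation misses indicators embedded in `key=value` fields or URLs; a real pipeline would parse each log format properly before matching.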

AI/ML Python Cyber Security

Break 2025-06-08 · 13:15

Keynote: Innovation is Dead 2025-06-08 · 12:30

Tony Mears

Join us for an exciting Keynote with Tony Mears!

Lunch 2025-06-08 · 11:30

PyData Organizers Lunch 2025-06-08 · 11:30

Leaders at PyData 2025-06-08 · 10:45

Ian Ozsvald

A self-organised workshop for data leaders to discuss the opportunities and challenges they face with their peers. This is the 9th iteration at a PyData conference. Questions are raised and answered by attendees, and the session is facilitated by Ian Ozsvald (PyDataLondon co-founder). You are encouraged to carry on talking to fellow leaders after this session; Ian will give out badges to help with this.

The format is based on the breakout discussions that Ian uses in his private RebelAI leadership group; you're welcome and encouraged to copy and use it in your own organisations. Typical attendance is 60+ leaders.

The 2022 session, which used a different format and was then known as "Executives at PyData", was written up; you can read it here: https://numfocus.medium.com/executives-at-pydata-global-2022-193cbc2d3f3b
