talk-data.com

Topic: Python

Tags: programming_language, data_science, web_development

Tagged: 20

Activity Trend: peak of 185 per quarter (2020-Q1 to 2026-Q1)

Activities

Filtering by: PyData London 2025
Scaling AI workloads with Ray & Airflow

Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides the compute layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.

Apache Airflow, meanwhile, is a well-established data orchestration framework, downloaded more than 20 million times a month.

This talk presents the Airflow Ray provider package, which lets users interact with Ray from an Airflow workflow. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines on those clusters.

You Came to a Python Conference. Now, Go Do a PR Review!

If you or your organization is spending time and resources on attending a Python conference, you will want to make sure your team comes away with something immediately actionable and useful. As coders, we often think of writing code as the only way to contribute. However, pull request reviews are an often-overlooked but highly actionable way to have an impact.

Giving good PR reviews is an art with two equally important parts: the technical side and the communication side. The technical side ensures the quality, maintainability, and efficiency of the Python code, while the communication around the PR determines whether the feedback is understood and acted upon. We have all seen code reviews ignored or acted on poorly because the feedback was badly communicated.

This talk addresses both facets of PR reviews by introducing three archetypes of bad code reviewers: 1) The “Looks Good to Me” Reviewer, who provides little to no actionable feedback; 2) The “Technical Nitpicker”, who focuses on small Python-specific issues but fails to communicate constructively; and 3) The “Nit” Commenter, who prefaces every comment with “nit” while offering unclear, if technically valid, suggestions.

Through these archetypes, we will explore Python-specific technical topics (such as pass by reference vs. pass by value) alongside how to deliver feedback in a clear and actionable manner. Using real-world examples, attendees will learn how to: a) identify and address technical issues in Python PRs; b) communicate feedback effectively; c) balance technical rigor with constructive feedback; and d) write clear peer review comments.
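A classic instance of the pass-by-reference vs. pass-by-value confusion a reviewer might catch: Python passes references to objects (often called call by sharing), so mutating an argument is visible to the caller while rebinding the parameter is not, and a mutable default argument is shared across calls. This is an illustrative sketch only; the function names are hypothetical and not taken from the talk.

```python
from __future__ import annotations


def append_item(items: list[int], value: int) -> list[int]:
    # The parameter refers to the caller's list, so this mutation is visible outside.
    items.append(value)
    return items


def rebind_items(items: list[int], value: int) -> list[int]:
    # `items + [value]` builds a new list; rebinding the local name leaves
    # the caller's list unchanged.
    items = items + [value]
    return items


def add_tag_buggy(tag: str, tags: list[str] = []) -> list[str]:
    # Bug worth flagging in review: the default list is created once, when the
    # function is defined, and is shared by every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag_fixed(tag: str, tags: list[str] | None = None) -> list[str]:
    # Idiomatic fix: use None as a sentinel and create a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


if __name__ == "__main__":
    data = [1, 2]
    append_item(data, 3)
    print(data)  # [1, 2, 3]

    data = [1, 2]
    extended = rebind_items(data, 3)
    print(data, extended)  # [1, 2] [1, 2, 3]

    print(add_tag_buggy("a"))  # ['a']
    print(add_tag_buggy("b"))  # ['a', 'b']  (state leaked from the first call)
    print(add_tag_fixed("a"), add_tag_fixed("b"))  # ['a'] ['b']
```

A review comment on `add_tag_buggy` is most useful when it names the mechanism (the shared default object) and proposes the `None`-sentinel fix, rather than just saying "nit: mutable default".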

Agentic Cyber Defense with External Threat Intelligence

This talk will detail how to integrate external threat intelligence data into an autonomous agentic AI system for proactive cybersecurity. Using real-world datasets, such as open-source threat feeds, security logs, and OSINT (open-source intelligence), you will learn how to build a data ingestion pipeline, train models with Python, and deploy agents that autonomously detect and mitigate cyber threats. This case study will provide practical insights into data preprocessing, feature engineering, and the challenges of working under adversarial conditions.

Transitioning from a hands-on Pythonista to a leadership role is a journey filled with challenges, and, like debugging code, it requires identifying, isolating, and fixing problems. In this talk, I’ll share eight key lessons from my journey from Data Scientist to Co-Founder of a small software company, framed as Python errors.

From battling imposter syndrome (ValueError: self-worth not defined), to learning to delegate (DeadlockError: unable to release control), and avoiding burnout (RuntimeError: system overload), this talk offers actionable advice for anyone navigating the leap from technical contributor to technical leader.

Expect a mix of humour, relatable stories, and hard-won lessons as we explore how debugging leadership challenges is just as rewarding (and occasionally frustrating) as debugging code. Whether you’re considering a leadership role or already on the journey, this session will leave you with practical insights to navigate common pitfalls and approach a leadership transition with a clearer understanding of what to expect.

Learn Python for Data Science in this Beginners’ Day Workshop

Would you like to learn to code but don’t know where to start? Taking your first steps in programming can seem like an impossible task, so we’ve decided to put on a workshop to show beginners how it can be done and to share our passion for the world of data science!

Apply to be a student: https://forms.gle/2cvNyRK8c8pNnpnz5

CUDA in Python: A New Era for GPU Acceleration

We discuss bringing Python natively to the CUDA ecosystem. From low-level bindings to domain-specific applications, CUDA now supports Python standards and the wider Python ecosystem. New libraries include nvmath-python for managing optimized mathematics libraries, cccl-python for cooperative threading and device parallelism, cuda-core for managing the complete CUDA toolstack from Python without any need for C++, and finally numba-cuda for generating device-side kernels that integrate C++ device libraries and LTO IR (link-time optimization intermediate representation).

Code changing lives? Absolutely. We're diving into Python's power to deploy cutting-edge solutions for lung cancer diagnosis and treatment in medical and surgical robotics. Expect demos showcasing algorithms, data analysis, and real-world impact—bridging MedTech innovation and life-changing solutions. Ready to see Python revolutionize lung health? Join us. Let's code a healthier future together!

Conquering PDFs: document understanding beyond plain text

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python

Multi-agent systems, in which multiple large language models (LLMs) work together to perform complex tasks, have gained significant traction. While frameworks like LangGraph and Semantic Kernel can streamline orchestration and coordination among agents, developing large-scale, production-grade systems brings a host of data challenges. Supporting multi-tenancy, preserving transactional integrity and state, and managing reliable asynchronous function calls while scaling efficiently can all be difficult to navigate.

Drawing on practical experience from the Azure Cosmos DB engineering team, this talk will guide you through key considerations and best practices for storing, managing, and using data in multi-agent applications at any scale. You’ll learn core multi-agent concepts and architectures, how to manage statefulness and conversation histories, how to personalize agents through retrieval-augmented generation (RAG), and how to integrate APIs and function calls effectively.

Aimed at developers, architects, and data scientists at all skill levels, this session will show you how to take your multi-agent systems from the lab to full-scale production deployments, ready to solve real-world problems. We’ll also walk through code implementations that can be quickly and easily put into practice, all in Python.
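As a minimal sketch of keeping conversation state isolated per tenant, assuming an in-memory store rather than Azure Cosmos DB: the `ConversationStore` class and `Message` type below are hypothetical, not an API from the talk, and a production system would persist this state in a database.

```python
from __future__ import annotations

import threading
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str     # e.g. "user", "assistant", or an agent's name
    content: str


class ConversationStore:
    """Hypothetical in-memory history store keyed by (tenant_id, session_id).

    Keying on the tenant keeps one tenant's history invisible to another.
    """

    def __init__(self, max_messages: int = 50) -> None:
        self._lock = threading.Lock()  # agents may append from several threads
        # Each session keeps at most `max_messages`; the oldest are dropped first.
        self._histories: defaultdict[tuple[str, str], deque[Message]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def append(self, tenant_id: str, session_id: str, message: Message) -> None:
        with self._lock:
            self._histories[(tenant_id, session_id)].append(message)

    def history(self, tenant_id: str, session_id: str) -> list[Message]:
        with self._lock:
            # .get() avoids creating empty entries; returning a copy means
            # callers cannot mutate the stored state.
            return list(self._histories.get((tenant_id, session_id), ()))


if __name__ == "__main__":
    store = ConversationStore(max_messages=2)
    store.append("tenant-a", "s1", Message("user", "Summarise my report"))
    store.append("tenant-a", "s1", Message("planner", "Calling the search agent"))
    store.append("tenant-a", "s1", Message("search", "Found 3 documents"))
    # ['Calling the search agent', 'Found 3 documents']
    print([m.content for m in store.history("tenant-a", "s1")])
    print(store.history("tenant-b", "s1"))  # []
```

Bounding each history with `deque(maxlen=...)` is one simple way to keep prompts within a context window; real systems often summarise older turns instead of dropping them.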

Sovereign Data for AI with Python

The only certainty in life is that the pendulum will always swing. Recently, it has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI, from data processing to model training, LLM fine-tuning, and inference at scale. We will focus on open-source infrastructure, including:
- a Python library server (PyPI, Conda, etc.) and how to avoid supply chain attacks
- a container registry that works at scale
- an S3 storage layer
- a database server with a vector index
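The supply-chain point can be illustrated with hash pinning: before installing an artifact pulled from an internal mirror, compare its SHA-256 digest with a value pinned when the artifact was first vetted. pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) applies the same idea to packages; the helpers below are a hypothetical stdlib-only sketch, not part of any tool named in the talk.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> bool:
    """True only if the file matches the digest pinned when it was first vetted."""
    # Normalise the pinned value so upper-case or padded digests still compare.
    return hmac.compare_digest(sha256_of(path), pinned_sha256.strip().lower())
```

In an air-gapped setup the pinned digests would typically travel with the lock file, so a tampered wheel on the internal mirror fails verification instead of being installed.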

Parallel PyTorch Inference with Python Free-Threading

This talk examines multi-threaded parallel inference on PyTorch models using the new free-threaded (no-GIL) build of Python. Using a simple 124M-parameter GPT-2 model that we train from scratch, we explore the new territory unlocked by free-threaded Python: parallel PyTorch model inference, where multiple threads, unimpeded by the GIL, attempt to generate text from a transformer-based model in parallel.
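The threading pattern can be sketched with the standard library alone. This is a stand-in, not the talk's code: `fake_generate` is a hypothetical pure-Python CPU-bound function in place of a PyTorch GPT-2 forward pass, and `gil_status` assumes the `Py_GIL_DISABLED` build variable and `sys._is_gil_enabled()`, both available from Python 3.13.

```python
from __future__ import annotations

import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor


def gil_status() -> str:
    # Py_GIL_DISABLED is set only on free-threaded builds (Python 3.13t and later).
    if not sysconfig.get_config_var("Py_GIL_DISABLED"):
        return "standard build: GIL always enabled"
    # On a free-threaded build the GIL can still be re-enabled at runtime.
    return "GIL enabled" if sys._is_gil_enabled() else "GIL disabled"


def fake_generate(prompt: str, steps: int = 200_000) -> str:
    """Hypothetical stand-in for a model forward pass: pure-Python CPU-bound work.

    Assumes a non-empty prompt.
    """
    acc = 0
    for i in range(steps):
        acc = (acc * 31 + ord(prompt[i % len(prompt)])) % 1_000_003
    return f"{prompt!r} -> {acc}"


def parallel_generate(prompts: list[str], workers: int = 4) -> list[str]:
    # Threads share one process, so in the real setup they would also share one
    # copy of the model weights. pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_generate, prompts))


if __name__ == "__main__":
    import time

    prompts = ["The GIL", "Free threading", "Parallel inference", "PyData London"]
    start = time.perf_counter()
    outputs = parallel_generate(prompts)
    elapsed = time.perf_counter() - start
    print(gil_status())
    for line in outputs:
        print(line)
    print(f"{len(prompts)} prompts in {elapsed:.2f}s")
```

On a standard build the threads take turns holding the GIL, so the wall-clock time is roughly that of running the prompts one after another; on a free-threaded build (for example `python3.13t`) the same code can spread the work across cores, which is the effect the talk explores with a real model.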

This workshop is designed for Python developers eager to explore the exciting world of quantum computing. Through interactive exercises and practical coding examples, participants will learn how to program quantum computers using Python. No advanced background in quantum mechanics is required: just curiosity and a willingness to dive into cutting-edge technology.
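To give a flavour of what a quantum program computes, here is a tiny pure-Python state-vector simulation of the two-qubit Bell-state circuit (a Hadamard gate followed by a CNOT). The workshop does not say which library it uses; real work would typically use a quantum SDK, and the helper functions here are hypothetical.

```python
from __future__ import annotations

import math

# A 2-qubit state is a list of 4 complex amplitudes. Index i encodes the basis
# state |q1 q0>, with qubit 0 as the low bit: 0=|00>, 1=|01>, 2=|10>, 3=|11>.


def apply_h(state: list[complex], qubit: int) -> list[complex]:
    """Apply a Hadamard gate to one qubit."""
    s = 1 / math.sqrt(2)
    bit = 1 << qubit
    new = list(state)
    for i in range(len(state)):
        if not i & bit:  # visit each (qubit=0, qubit=1) pair of amplitudes once
            a, b = state[i], state[i | bit]
            new[i] = s * (a + b)
            new[i | bit] = s * (a - b)
    return new


def apply_cnot(state: list[complex], control: int, target: int) -> list[complex]:
    """Flip the target qubit in every basis state where the control qubit is 1."""
    c, t = 1 << control, 1 << target
    new = list(state)
    for i in range(len(state)):
        if i & c and not i & t:
            new[i], new[i | t] = state[i | t], state[i]
    return new


def probabilities(state: list[complex]) -> list[float]:
    """Measurement probabilities (Born rule): |amplitude|^2."""
    return [abs(a) ** 2 for a in state]


if __name__ == "__main__":
    state = [1 + 0j, 0j, 0j, 0j]                    # start in |00>
    state = apply_h(state, 0)                       # (|00> + |01>) / sqrt(2)
    state = apply_cnot(state, control=0, target=1)  # (|00> + |11>) / sqrt(2)
    print([round(p, 3) for p in probabilities(state)])  # [0.5, 0.0, 0.0, 0.5]
```

The result is entangled: measuring gives 00 or 11 with equal probability, never 01 or 10. A simulator like this grows as 2^n amplitudes, which is one reason real quantum hardware is interesting.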

GPU Accelerated Python

Accelerating Python using the GPU is much easier than you might think. We will explore the powerful CUDA-enabled Python ecosystem in this tutorial through hands-on examples using some of the most popular accelerated scientific computing libraries.

Topics include:
- Introduction to general-purpose GPU computing
- GPU vs. CPU: which processor is best for which tasks
- Introduction to CUDA
- How to use CUDA with Python
- Using Numba to write kernel functions
- CuPy
- cuDF

No prior experience with GPUs is necessary, but attendees should be familiar with Python.

Time series data is ubiquitous, from stock market prices and weather patterns to disease outbreaks and sports outcomes. Accurately modeling these data and generating useful predictions requires specialized techniques because of the unique characteristics of time series data, such as autocorrelation, trends, and seasonality. This tutorial provides a practical introduction to Bayesian time series analysis using PyMC, a powerful probabilistic programming library in Python. Participants will learn how to build, evaluate, and interpret various Bayesian time series models, including ARIMA models, dynamic linear models, and stochastic volatility models. We'll emphasize practical application, covering data preprocessing, model selection, diagnostics, and forecasting, empowering attendees to tackle real-world time series problems with confidence.
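The dynamic linear models mentioned above can be illustrated with the simplest case, a local-level model: y_t = mu_t + eps_t and mu_t = mu_{t-1} + eta_t, with Gaussian noise of variances V and W. Instead of PyMC's sampling, this sketch swaps in the closed-form Kalman filter, which gives the exact posterior of the level when V and W are known (an assumption here); `local_level_filter` and `Gaussian` are hypothetical names, and the observations are illustrative values only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float
    var: float


def local_level_filter(
    ys: list[float],
    obs_var: float,    # V: observation noise variance (assumed known)
    level_var: float,  # W: random-walk variance of the level (assumed known)
    m0: float = 0.0,   # prior mean of the initial level
    c0: float = 1e6,   # large prior variance = weakly informative prior
) -> tuple[list[Gaussian], list[Gaussian]]:
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Returns (one-step-ahead forecasts of y_t, filtered posteriors of mu_t).
    """
    m, c = m0, c0
    forecasts: list[Gaussian] = []
    posteriors: list[Gaussian] = []
    for y in ys:
        # Predict: the level is a random walk, so its variance grows by W.
        a, r = m, c + level_var
        # Forecast y_t before seeing it: add the observation noise V.
        f, q = a, r + obs_var
        forecasts.append(Gaussian(f, q))
        # Update with Bayes' rule; k weighs the new observation against the prior.
        k = r / q
        m = a + k * (y - f)
        c = (1 - k) * r
        posteriors.append(Gaussian(m, c))
    return forecasts, posteriors


if __name__ == "__main__":
    observations = [10.2, 9.8, 10.5, 11.9, 12.3, 12.1]  # illustrative values only
    forecasts, posteriors = local_level_filter(observations, obs_var=0.25, level_var=0.5)
    for y, fc, post in zip(observations, forecasts, posteriors):
        print(f"y={y:5.1f}  forecast={fc.mean:6.2f} (sd {fc.var ** 0.5:.2f})  "
              f"level={post.mean:6.2f} (sd {post.var ** 0.5:.2f})")
```

A quick sanity check: with `level_var=0` and a vague prior, the filtered level approaches the running sample mean. In PyMC one would instead place priors on V and W and sample their joint posterior with the levels.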