Agentic Cyber Defense with External Threat Intelligence

2025-06-08 · PyData London 2025 Watch

talk

by Jyoti Yadav

AI/ML Cyber Security

This talk will detail how to integrate external threat intelligence data into an autonomous agentic AI system for proactive cybersecurity. Using real world datasets—including open-source threat feeds, security logs, or OSINT—you will learn how to build a data ingestion pipeline, train models with Python, and deploy agents that autonomously detect and mitigate cyber threats. This case study will provide practical insights into data preprocessing, feature engineering, and the challenges of adversarial conditions.

Debugging Leadership: Six Errors when Moving From Code to Management

2025-06-08 · PyData London 2025

talk

by Matt Upson

Transitioning from a hands-on Pythonista to a leadership role is a journey filled with challenges, and like debugging code, it requires identifying, isolating, and fixing problems. In this talk, I’ll share eight key lessons from my journey from Data Scientist to Co-Founder of a small software company, framed as Python errors.

From battling imposter syndrome (ValueError: self-worth not defined), to learning to delegate (DeadlockError: unable to release control), and avoiding burnout (RuntimeError: system overload), this talk offers actionable advice for anyone navigating the leap from technical contributor to technical leader.

Expect a mix of humour, relatable stories, and hard-won lessons as we explore how debugging leadership challenges is just as rewarding (and occasionally frustrating) as debugging code. Whether you’re considering a leadership role or already on the journey, this session will leave you with practical insights to navigate common pitfalls and approach a leadership transition with a clearer understanding of what to expect.

Diving into Transformer Model Internals

2025-06-08 · PyData London 2025 Watch

talk

by Matt Squire

AI/ML GenAI

While everybody and their dog is building applications on generative AI, the inner workings of transformers - the model architecture behind genAI age - is a mystery for most people. In this talk, I'll walk through how transformers are implemented, using real-life Python code from the HuggingFace transformers library.

Humble Data Workshop

2025-06-08 · PyData London 2025

talk

by Hugh Evans (Imply)

Data Science

Learn Python for Data Science in this Beginners’ Day Workshop Would you like to learn to code but don’t know where to start? Taking your first steps in programming can seem like an impossible task so we’ve decided to put on a workshop to show beginners how it can be done and share our passion for the world of data science!

Apply to be a student https://forms.gle/2cvNyRK8c8pNnpnz5

CUDA in Python: A New Era for GPU Acceleration

2025-06-08 · PyData London 2025 Watch

talk

by Andy Terrel

We discuss bringing Python natively to the CUDA ecosystem. From low level bindings to domain specific applications, CUDA is supporting Python standards and ecosystem. New libraries include nvmath-python for managing optimized mathematics libraries, cccl-python for cooperative threading and device parallelism, cuda-core for managing the complete CUDA toolstack from Python with no need for C++, and finally numba-cuda for generating device side kernels with integration of C++ device libraries and LTO IR.

Git Commit, MedTech Transformed: Python’s Medical Robotics Breakthrough

2025-06-08 · PyData London 2025

talk

by Lilinoe Harbottle

Git

Code changing lives? Absolutely. We're diving into Python's power to deploy cutting-edge solutions for lung cancer diagnosis and treatment in medical and surgical robotics. Expect demos showcasing algorithms, data analysis, and real-world impact—bridging MedTech innovation and life-changing solutions. Ready to see Python revolutionize lung health? Join us. Let's code a healthier future together!

Python Engineering Excellence Birds of a Feather

2025-06-07 · PyData London 2025

talk

by Sam Joseph

A round table discussion on how to excel at Python engineering and architecting systems using Python, what kind of sessions and activities would best help support Python programmers be more effective at Python engineering, and how to achieve Python engineering excellence generally.

Conquering PDFs: document understanding beyond plain text

2025-06-07 · PyData London 2025 Watch

talk

by Ines Montani

Data Science NLP

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

PyScript - Python in the Browser

2025-06-07 · PyData London 2025 Watch

talk

by Chris Laffra

Learn how to write a web app in Python using PyScript, PyOdide, MicroPython, and WASM.

Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python

2025-06-07 · PyData London 2025 Watch

talk

by Theo van Kraay (Microsoft)

API Azure Cosmos GenAI LLM RAG

The use of multiple Large Language Models (LLMs) working together perform complex tasks, known as multi-agent systems, has gained significant traction. While orchestration frameworks like LangGraph and Semantic Kernel can streamline orchestration and coordination among agents, developing large-scale, production-grade systems can bring a host of data challenges. Issues such as supporting multi-tenancy, preserving transactional integrity and state, and managing reliable asynchronous function calls while scaling efficiently can be difficult to navigate.

Leveraging insights from practical experiences in the Azure Cosmos DB engineering team, this talk will guide you through key considerations and best practices for storing, managing, and leveraging data in multi-agent applications at any scale. You’ll learn how to understand core multi-agent concepts and architectures, manage statefulness and conversation histories, personalize agents through retrieval-augmented generation (RAG), and effectively integrate APIs and function calls.

Aimed at developers, architects, and data scientists at all skill levels, this session will show you how to take your multi-agent systems from the lab to full-scale production deployments, ready to solve real-world problems. We’ll also walk through code implementations that can be quickly and easily put into practice, all in Python.

Bringing stories to life with AI, data streaming and generative agents

2025-06-07 · PyData London 2025

talk

by Olena Kutsenko (Confluent)

AI/ML Flink Iceberg Kafka LLM RAG Data Streaming

Explore how AI-powered Generative Agents can evolve in real time using live data streams. Inspired by Stanford's 'Generative Agents' paper, this session dives into building dynamic, AI-driven worlds with Apache Kafka, Flink, and Iceberg - plus LLMs, RAG, and Python. Demos and practical examples included!

Sovereign Data for AI with Python

2025-06-07 · PyData London 2025 Watch

talk

by Lex Avstreikh (Hopsworks)

AI/ML Cloud Computing LLM S3

The only certainty in life is that the pendulum will always swing. Recently, the pendulum has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI - from data processing, to model training and LLM fine-tuning at scale to inference at scale. We will focus on open-source infrastructure including: a Python library server (Pypi, Conda, etc) and avoiding supply chain attacks a container registry that works at scale a S3 storage layer a database server with a vector index

Parallel PyTorch Inference with Python Free-Threading

2025-06-07 · PyData London 2025 Watch

talk

by Michał Szołucha

PyTorch

This talk examines multi-threaded parallel inference on PyTorch models using the new No-GIL, free-threaded version of Python. Using a simple 124M parameter GPT2 model that we train from scratch, we explore the novel new territory unlocked by free-threaded Python: parallel PyTorch model inference, where multiple threads, unimpeded by the Python GIL, attempt to generate text from a transformer-based model in parallel.

Python Meets Quantum: Learn, Code, and Simulate

2025-06-06 · PyData London 2025

talk

by Andrea Melloncelli

This workshop is designed for Python developers eager to explore the exciting world of quantum computing. Through interactive exercises and practical coding examples, participants will learn how to program quantum computers using Python. No advanced background in quantum mechanics is required - just curiosity and a willingness to dive into cutting-edge technology.

Forecasting Weather using Time Series ML

2025-06-06 · PyData London 2025 Watch

talk

by Suyash Joshi

AI/ML LLM

This hands-on workshop covers how to use open source ML models like LSTMs and TimeSeries LLM's, with Python to try to forecast weather patterns, with best practices for data preparation and real time predictions.

Package Your Python Code as a CLI

2025-06-06 · PyData London 2025 Watch

talk

by Thijs Nieuwdorp (VodafoneZiggo) , Dr. Jeroen Janssens (Posit)

Data Science Unix

Learn how to transform your Python code into a command-line tool. Jeroen Janssens, author of Data Science at the Command Line, guides you through the process of turning your scripts into reusable, executable tools, integrating them into your data workflows and harnessing the power of the Unix command line.

GPU Accelerated Python

2025-06-06 · PyData London 2025 Watch

talk

by Dr. Katrina Riehl (NumFOCUS; Snowflake; Georgetown University) , Lawrence Mitchell , Jeremy Tanner , Jacob Tomlinson

Accelerating Python using the GPU is much easier than you might think. We will explore the powerful CUDA-enabled Python ecosystem in this tutorial through hands-on examples using some of the most popular accelerated scientific computing libraries.

Topics include: - Introduction to General Purpose GPU Computing - GPU vs CPU - Which processor is best for which tasks - Introduction to CUDA - How to use CUDA with Python - Using Numba to write kernel functions - CuPy - cuDF

No prior experience with GPU's is necessary, but attendees should be familiar with Python.

Introduction to Bayesian Time Series Analysis with PyMC

2025-06-06 · PyData London 2025

talk

by Chris Fonnesbeck (PyMC Labs)

Time series data is ubiquitous, from stock market prices and weather patterns to disease outbreaks and sports outcomes. Accurately modeling these data and generating useful predictions requires specialized techniques due to the unique characteristics of time series data. This tutorial provides a practical introduction to Bayesian time series analysis using PyMC, a powerful probabilistic programming library in Python. Participants will learn how to build, evaluate, and interpret various Bayesian time series models, including ARIMA models, dynamic linear models, and stochastic volatility models. We'll emphasize practical application, covering data preprocessing, model selection, diagnostics, and forecasting, empowering attendees to tackle real-world time series problems with confidence.

Amazon S3: The Backbone of Modern Data Systems

2025-06-03 · Data Engineering Podcast Listen

podcast_episode

by Mai-Lan Tomsen Bukovec (AWS) , Tobias Macey

AI/ML Analytics AWS Dashboard Data Contracts Data Engineering Data Management Data Quality Datafold Hadoop KPI S3

Summary In this episode of the Data Engineering Podcast Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to leading the development and operations of S3, Mylan shares insights on how S3 has become a foundational element in modern data systems, enabling scalable and cost-effective data lakes since its launch alongside Hadoop in 2006. She discusses the architectural patterns enabled by S3, the importance of metadata in data management, and how S3's evolution has been driven by customer needs, leading to innovations like strong consistency and S3 tables.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again.Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.Your host is Tobias Macey and today I'm interviewing Mai-Lan Tomsen Bukovec about the evolutions of S3 and how it has transformed data architectureInterview IntroductionHow did you get involved in the area of data management?Most everyone listening knows what S3 is, but can you start by giving a quick summary of what roles it plays in the data ecosystem?What are the major generational epochs in S3, with a particular focus on analytical/ML data systems?The first major driver of analytical usage for S3 was the Hadoop ecosystem. What are the other elements of the data ecosystem that helped shape the product direction of S3?Data storage and retrieval have been core primitives in computing since its inception. What are the characteristics of S3 and all of its copycats that led to such a difference in architectural patterns vs. other shared data technologies? (e.g. NFS, Gluster, Ceph, Samba, etc.)How does the unified pool of storage that is exemplified by S3 help to blur the boundaries between application data, analytical data, and ML/AI data?What are some of the default patterns for storage and retrieval across those three buckets that can lead to anti-patterns which add friction when trying to unify those use cases?The age of AI is leading to a massive potential for unlocking unstructured data, for which S3 has been a massive dumping ground over the years. How is that changing the ways that your customers think about the value of the assets that they have been hoarding for so long?What new architectural patterns is that generating?What are the most interesting, innovative, or unexpected ways that you have seen S3 used for analytical/ML/Ai applications?What are the most interesting, unexpected, or challenging lessons that you have learned while working on S3?When is S3 the wrong choice?What do you have planned for the future of S3?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links AWS S3KinesisKafkaSQSEMRDrupalWordpressNetflix Blog on S3 as a Source of TruthHadoopMapReduceNasa JPLFINRA == Financial Industry Regulatory AuthorityS3 Object VersioningS3 Cross RegionS3 TablesIcebergParquetAWS KMSIceberg RESTDuckDBNFS == Network File SystemSambaGlusterFSCephMinIOS3 MetadataPhotoshop Generative FillAdobe FireflyTurbotax AI AssistantAWS Access AnalyzerData ProductsS3 Access PointAWS Nova ModelsLexisNexis ProtegeS3 Intelligent TieringS3 Principal Engineering TenetsThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

How Maverick Data Built a One-Stop Business Management App in Sigma | The Data Apps Conference

2025-06-02 · Sigma Data Apps Conference 2025 Watch

video

by Spencer Baucke (Maverick Data)

AI/ML Cyber Security

As a data consulting firm helping clients solve their data challenges, Maverick Data faced its own operational inefficiencies with fragmented systems for time tracking, client management, project staffing, and financial reporting. Instead of continuing to juggle multiple disconnected applications, they decided to practice what they preach and build a unified solution.

In this session, Spencer Baucke (Co-founder) will demonstrate how Maverick Data built a comprehensive business operations app in Sigma to:

Centralize employee, client, and project management with appropriate role-based security controls Streamline time entry and automated invoice generation to eliminate manual processes Integrate financial data to create real-time projections and business insights Automate reporting with scheduled emails to ensure timely updates for team members By consolidating operations into a single data app, Maverick Data reduced software spend, gained unprecedented visibility into their business performance, and dramatically improved decision-making processes between leadership. The solution has inspired new client solutions based on their internal success.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps

➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial

sigma #sigmacomputing #dataanalytics #dataanalysis #businessintelligence #cloudcomputing #clouddata #datacloud #datastructures #datadriven #datadrivendecisionmaking #datadriveninsights #businessdecisions #datadrivendecisions #embeddedanalytics #cloudcomputing #SigmaAI #AI #AIdataanalytics #AIdataanalysis #GPT #dataprivacy #python #dataintelligence #moderndataarchitecture

talk-data.com

Python

Activity Trend

Top Events

Top Speakers

Agentic Cyber Defense with External Threat Intelligence

Debugging Leadership: Six Errors when Moving From Code to Management

Diving into Transformer Model Internals

Humble Data Workshop

CUDA in Python: A New Era for GPU Acceleration

Git Commit, MedTech Transformed: Python’s Medical Robotics Breakthrough

Python Engineering Excellence Birds of a Feather

Conquering PDFs: document understanding beyond plain text

PyScript - Python in the Browser

Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python

Bringing stories to life with AI, data streaming and generative agents

Sovereign Data for AI with Python

Parallel PyTorch Inference with Python Free-Threading

Python Meets Quantum: Learn, Code, and Simulate

Forecasting Weather using Time Series ML

Package Your Python Code as a CLI

GPU Accelerated Python

Introduction to Bayesian Time Series Analysis with PyMC

Amazon S3: The Backbone of Modern Data Systems

How Maverick Data Built a One-Stop Business Management App in Sigma | The Data Apps Conference