Python

See only what you are allowed to see: Fine-Grained Authorization

2025-09-03 · PyData Berlin 2025 Watch

talk

by Maria Knorps

Managing who can see or do what with your data is a fundamental challenge, especially as applications and data grow in complexity. Traditional role-based systems often lack the granularity needed for modern data platforms. Fine-Grained Authorization (FGA) addresses this by controlling access at the individual resource level. In this 90-minute hands-on tutorial, we will explore implementing FGA using OpenFGA, an open-source authorization engine inspired by Google's Zanzibar. Attendees will learn the core concepts of Relationship-Based Access Control (ReBAC) and get practical experience defining authorization models, writing relationship tuples, and performing authorization checks using the OpenFGA Python SDK. Bring your laptop ready to code to learn how to build secure and flexible permission systems for your data applications.

Docling: Get your documents ready for gen AI

2025-09-03 · PyData Berlin 2025 Watch

talk

by Michele Dolfi , Christoph Auer

AI/ML GenAI GitHub Linux

Docling, an open source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Earning close to 30,000 GitHub in less than one year and now part of the Linux AI & Data Foundation. Docling is redefining document AI with its ease and speed of use. In this session, we’ll introduce Docling and its features, including usages with various generative AI frameworks and protocols (e.g. MCP).

Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

2025-09-02 · PyData Berlin 2025 Watch

talk

by Orell Garten

Cloud Computing DuckDB

Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and DuckDB.

In this talk, you'll learn how to turn raw TCP streams into structured data sets, ready for analysis, all running on-premise. We'll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects.

If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control this talk is for you.

Template-based web app and deployment pipeline at an enterprise-ready level on Azure

2025-09-02 · PyData Berlin 2025

talk

by Johannes Schöck

Azure Azure DevOps Bicep DevOps

A practical deep-dive into Azure DevOps pipelines, the Azure CLI, and how to combine pipeline, bicep, and python templates to build a fully automated web app deployment system. Deploying a new proof of concept app within an actual enterprise environment never was faster.

Navigating healthcare scientific knowledge:building AI agents for accurate biomedical data retrieval

2025-09-02 · PyData Berlin 2025 Watch

talk

by Laura Dumont

AI/ML LLM NLP SQL Vector DB

With a focus on healthcare applications where accuracy is non negotiable, this talk highlights challenges and delivers practical insights on building AI agents which query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, addressing challenges of preventing hallucinations and ensuring proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying python and LLM ecosystem to the healthcare domain.

175: 5 Unique Data Analyst Projects (beginner to intermediate)

2025-09-02 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith

AI/ML Analytics BI Dashboard Data Analytics Power BI SQL Tableau

Here are 5 exciting and unique data analyst projects that will build your skills and impress hiring managers! These range from beginner to advanced and are designed to enhance your data storytelling abilities. ✨ Try Julius today at https://landadatajob.com/Julius-YT Where I Go To Find Datasets (as a data analyst) 👉 https://youtu.be/DHfuvMyBofE?si=ABsdUfzgG7Nsbl89 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator

⌚ TIMESTAMPS 00:00 - Introduction 00:24 - Project 1: Stock Price Analysis 03:46 - Project 2: Real Estate Data Analysis (SQL) 07:52 - Project 3: Personal Finance Dashboard (Tableau or Power BI) 11:20 - Project 4: Pokemon Analysis (Python) 14:16 - Project 5: Football Data Analysis (any tool)

🔗 CONNECT WITH AVERY 🎥 YouTube Channel: https://www.youtube.com/@averysmith 🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/ 📸 Instagram: https://instagram.com/datacareerjumpstart 🎵 TikTok: https://www.tiktok.com/@verydata 💻 Website: https://www.datacareerjumpstart.com/ Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Building Reactive Data Apps with Shinylive and WebAssembly

2025-09-02 · PyData Berlin 2025 Watch

talk

by Christoph Scheuch

Cloud Computing Parquet

WebAssembly is reshaping how Python applications can be delivered - allowing fully interactive apps that run directly in the browser, without a traditional backend server. In this talk, I’ll demonstrate how to build reactive, data-driven web apps using Shinylive for Python, combining efficient local storage with Parquet and extending functionality with optional FastAPI cloud services. We’ll explore the benefits and limitations of this architecture, share practical design patterns, and discuss when browser-based Python is the right choice. Attendees will leave with hands-on techniques for creating modern, lightweight, and highly responsive Python data applications.

Building an A/B Testing Framework with NiceGUI

2025-09-01 · PyData Berlin 2025 Watch

talk

by Wessel van de Goor

JavaScript

NiceGUI is a Python-based web UI framework that enables developers to build interactive web applications without using JavaScript. In this talk, I’ll share how my team used NiceGUI to create an internal A/B testing platform entirely in Python. I’ll discuss the key requirements for the platform, why we chose NiceGUI, and how it helped us design the UI, display results, and integrate with the backend. This session will demonstrate how NiceGUI simplifies development, reduces frontend complexity, and speeds up internal tool creation for Python developers.

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

2025-09-01 · PyData Berlin 2025 Watch

talk

by Christian Geier

AI/ML Kubernetes PyTorch

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

2025-09-01 · PyData Berlin 2025 Watch

talk

by Gergely Daroczi

Cloud Computing Data Collection LLM

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory of VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs, and we will also share our experiences and challenges with the data collection and insights into general patterns.

What’s Really Going On in Your Model? A Python Guide to Explainable AI

2025-09-01 · PyData Berlin 2025 Watch

talk

by Yashasvi Misra (Pure Storage)

AI/ML

As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeholders, regulators, or just debugging unexpected results, the ability to explain your model is no longer optional , it's essential.

In this talk, we'll walk through practical tools in the Python ecosystem that help bring transparency to your models, including SHAP, LIME, and Captum. Through hands-on examples, you'll learn how to apply these libraries to real-world models from decision trees to deep neural networks and make sense of what's happening under the hood.

If you've ever struggled to explain your model’s output or justify its decisions, this session will give you a toolkit to build more trustworthy, interpretable systems without sacrificing performance.

More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB

2025-09-01 · PyData Berlin 2025 Watch

talk

by Mehdi Ouazza (MotherDuck)

Analytics DuckDB ETL/ELT Pandas Polars SQL

Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll learn how to use DuckDB as an in-process analytical database: building data pipelines, caching datasets, and running complex queries with SQL—all without leaving Python. We’ll cover common use cases like ETL, lightweight data orchestration, and interactive analytics workflows. You’ll leave with a solid mental model for using DuckDB effectively as the “SQLite for analytics.”

Democratizing Experimentation: How GetYourGuide Built a Flexible and Scalable A/B Testing Platform

2025-09-01 · PyData Berlin 2025 Watch

talk

by Konrad Richter

API

At GetYourGuide, we transformed experimentation from a centralized, closed system into a democratized, self-service platform accessible to all analysts, engineers, and product teams. In this talk, we'll share our journey to empower individuals across the company to define metrics, create dimensions, and easily extend statistical methods. We'll discuss how we built a Python-based Analyzer toolkit enabling standardized, reusable calculations, and how our experimentation platform provides ad-hoc analytical capabilities through a flexible API. Attendees will gain practical insights into creating scalable, maintainable, and user-friendly experimentation infrastructure, along with access to our open-source sequential testing implementation.

Beyond Linear Funnels: Visualizing Conditional User Journeys with Python

2025-09-01 · PyData Berlin 2025 Watch

talk

by Yaseen Esmaeelpour

Funnel

Optimizing user funnels is a common task for data analysts and data scientists. Funnels are not always linear in the real world. often, the next step depends on earlier responses or actions. This results in complex funnels that can be tricky to analyze. I’ll introduce an open-source Python library I developed that analyzes and visualizes non-linear, conditional funnels by utilizing Graphviz and Streamlit. It calculates conversion rates, drop-offs, time spent on each step, and highlights bottlenecks by color. Attendees will learn about how to quickly explore complex user journeys and generate insightful funnel data.

Emoji Master Challenge: Python Intro for Ages 12-18

2025-08-30 · Python Emoji Master Challenge! [Ages 12-16] [EN/DE]

workshop

emoji

Hands-on Python workshop featuring emoji-based challenges. Learners progress through levels starting with a Python introduction and ending with a reveal of the superheroes created by the students.

Episode 249: AI, Podcasts, Scandinavia Trip and More!

2025-08-29 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA) , Ben Deane

AI/ML GitHub

In this episode, Conor and Bryce chat about some open source projects, podcast recommendations, our upcoming trip to Europe and much more! Link to Episode 249 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)Socials ADSP: The Podcast: TwitterConor Hoekstra: Twitter | BlueSky | MastodonBryce Adelstein Lelbach: TwitterShow Notes Date Recorded: 2025-08-21 Date Released: 2025-08-29 Astro Bot VideoADSP Episode 176: 🇺🇸 prior, deltas & Dinner with PhineasThrust Github Search Vibing ProjectPaddlePaddle/Paddle RepoUber AresDB RepoLatent Space PodcastBig Technology PodcastCheeky Pint PodcastDwarkesh PodcastTraining Data PodcastADSP Episode 39: How Steve Jobs Saved Sean ParentRoku Engineering SymposiumCopenhagen C++ MeetupCasey Muratori – The Big OOPs: Anatomy of a Thirty-five-year Mistake – BSC 2025NDC Tech Town CUDA Python WorkshopNDC Tech Town CUDA C++ WorkshopIntro Song Info Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Python for All: Democratizing Coding Mastery with AI Chatbot Support

2025-08-28 · Aug Event | Python for All: Democratizing Coding Mastery with AI Chatbot Support

workshop

by Kristen Scotti (Carnegie Mellon University)

chatgpt copilot gemini

Hands-on workshop guided by Kristen Scotti that explores using AI chatbots to learn coding in Python, debugging, optimizing, and understanding code, with emphasis on responsible and effective use of AI as an on-demand tutoring aid.

AI Agents in Practice

2025-08-28 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Valentina Alto (Microsoft)

AI/ML LLM ai-ml artificial-intelligence-ai data generative-ai

Discover how to build autonomous AI agents tailored for real-world tasks with 'AI Agents in Practice.' This book guides you through creating and deploying AI systems that go beyond chatbots to solve complex problems, using leading frameworks and practical design patterns. What this Book will help me do Understand and implement core components of AI agents, such as memory, tool integration, and context management. Develop production-ready AI agents for diverse applications using frameworks like LangChain. Design and implement multi-agent systems to enable advanced collaboration and problem-solving. Apply ethical and responsible AI techniques, including monitoring and human oversight, in agent development. Optimize performance and scalability of AI agents for production use cases. Author(s) Valentina Alto is an accomplished AI engineer with years of experience in AI systems design and implementation. Valentina specializes in developing practical solutions utilizing large language models and contemporary frameworks for real-world applications. Through her writing, she conveys complex ideas in an accessible manner, and her goal is to empower AI developers and enthusiasts with the skills to create meaningful solutions. Who is it for? This book is perfect for AI engineers, data scientists, and software developers ready to go beyond foundational knowledge of large language models to implement advanced AI agents. It caters to professionals looking to build scalable solutions and those interested in ethical considerations of AI usage. Readers with a background in machine learning and Python will benefit most from the technical insights provided.

The Model Kitchen: Live Python Performance Lab

2025-08-26 · PyData Helsinki August meetup at Wonna

talk

performance

Bayesian Beasts: Iterative modelling with PyMC to forecast Korkeasaari visits

2025-08-26 · PyData Helsinki August meetup at Wonna

talk

bayesian modelling pymc

talk-data.com

Activity Trend

Top Events

Top Speakers

See only what you are allowed to see: Fine-Grained Authorization

Docling: Get your documents ready for gen AI

Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

Template-based web app and deployment pipeline at an enterprise-ready level on Azure

Navigating healthcare scientific knowledge:building AI agents for accurate biomedical data retrieval

175: 5 Unique Data Analyst Projects (beginner to intermediate)

Building Reactive Data Apps with Shinylive and WebAssembly

Building an A/B Testing Framework with NiceGUI

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

What’s Really Going On in Your Model? A Python Guide to Explainable AI

More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB

Democratizing Experimentation: How GetYourGuide Built a Flexible and Scalable A/B Testing Platform

Beyond Linear Funnels: Visualizing Conditional User Journeys with Python

Emoji Master Challenge: Python Intro for Ages 12-18

Episode 249: AI, Podcasts, Scandinavia Trip and More!

Python for All: Democratizing Coding Mastery with AI Chatbot Support

AI Agents in Practice

The Model Kitchen: Live Python Performance Lab

Bayesian Beasts: Iterative modelling with PyMC to forecast Korkeasaari visits