Topic

PyTorch

deep_learning machine_learning neural_networks



Activities


Data science in containers: the good, the bad, and the ugly

2025-09-02 · PyData Berlin 2025

talk

by Jérôme Petazzoni

Data Science Kubernetes TensorFlow

If we want to run data science workloads (e.g. using TensorFlow, PyTorch, and others) in containers (for local development or for production on Kubernetes), we need to build container images. Doing that with a Dockerfile is fairly straightforward, but is it the best method? In this talk, we'll take a well-known speech-to-text model (Whisper) and show various ways to run it in containers, comparing the outcomes in terms of image size and build time.
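The comparison described above (image size and build time per build method) could be measured with a small script. The following is a minimal sketch, not the speaker's tooling: it assumes the Docker CLI is installed, and the helper names (`time_command`, `image_size_bytes`, `measure_build`) as well as any tags or Dockerfile names passed to them are hypothetical.

```python
import subprocess
import time


def time_command(cmd: list[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start


def image_size_bytes(tag: str) -> int:
    """Ask the Docker CLI for the size of a local image, in bytes."""
    result = subprocess.run(
        ["docker", "image", "inspect", tag, "--format", "{{.Size}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def measure_build(tag: str, dockerfile: str, context: str = ".") -> dict:
    """Build one image variant and report its build time and final size.

    Assumption: `dockerfile` and `tag` are placeholders chosen by the caller;
    nothing here is specific to Whisper or to any particular base image.
    """
    seconds = time_command(
        ["docker", "build", "-f", dockerfile, "-t", tag, context]
    )
    return {
        "tag": tag,
        "build_seconds": seconds,
        "size_bytes": image_size_bytes(tag),
    }
```

Calling `measure_build` once per candidate Dockerfile gives one row per variant to compare. Build caching strongly affects the timing, so a fair comparison would likely pass `--no-cache` or start from a clean builder.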

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

2025-09-01 · PyData Berlin 2025

talk

by Christian Geier

AI/ML Kubernetes Python

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLflow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with the PyTorch Operator, experiment tracking and monitoring with MLflow, and scalable model serving with KServe. All these steps are integrated into a single Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.
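The idea of declaring pipeline steps and their dependencies in plain Python can be illustrated without any cluster. The sketch below is not the Kubeflow Pipelines SDK and not the speaker's pipeline: it uses only the standard library's `graphlib`, the step functions are hypothetical placeholders, and the strictly linear ordering of the five steps is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical placeholder steps; a real pipeline would launch Dask, Katib,
# a distributed PyTorch job, MLflow logging, and a KServe deployment.
def preprocess():
    return "preprocessed telemetry"

def tune():
    return "selected hyperparameters"

def train():
    return "trained model"

def track():
    return "logged metrics"

def serve():
    return "deployed endpoint"

STEPS = {
    "preprocess": preprocess,
    "tune": tune,
    "train": train,
    "track": track,
    "serve": serve,
}

# Each step maps to the set of steps that must finish before it starts.
# Assumption: a simple chain; the real pipeline may run some steps in parallel.
DEPENDENCIES = {
    "preprocess": set(),
    "tune": {"preprocess"},
    "train": {"tune"},
    "track": {"train"},
    "serve": {"track"},
}

def run_pipeline(steps, dependencies):
    """Execute steps in dependency order and collect their results."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {name: steps[name]() for name in order}
    return order, results

if __name__ == "__main__":
    order, results = run_pipeline(STEPS, DEPENDENCIES)
    for name in order:
        print(f"{name}: {results[name]}")
```

An orchestrator such as Kubeflow Pipelines plays the role of `run_pipeline` here, with each step running in its own container rather than as a local function call.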