In this session, learn about performance optimizations for PyTorch on Google Cloud accelerators using OpenXLA. PyTorch models are powerful, but training them can be disrupted by resource failures. This talk also explores strategies for achieving greater resiliency when running PyTorch on GPUs, focusing on fault tolerance, checkpointing, and distributed training. Learn how to use open-source tools to minimize downtime and keep your deep learning workloads running smoothly.
Filtering by:
Deepak Patil