In this session learn about performance optimizations for PyTorch on Google Cloud accelerators using OpenXLA. These models are powerful but can be disrupted by resource failures. This talk also explores strategies for achieving greater resiliency when running PyTorch on GPUs, focusing on fault tolerance, checkpointing, and distributed training. Learn how to leverage open source tools to minimize downtime and ensure your deep learning workloads run smoothly.
talk-data.com
D
Speaker
Deepak Patil
2
talks
Group Product Manager, Machine Learning Infrastructure
Google Cloud
Frequent Collaborators
Filter by Event / Source
Talks & appearances
2 activities · Newest first
In this session learn about performance optimizations for PyTorch on Google Cloud accelerators using OpenXLA. These models are powerful but can be disrupted by resource failures. This talk also explores strategies for achieving greater resiliency when running PyTorch on GPUs, focusing on fault tolerance, checkpointing, and distributed training. Learn how to leverage open source tools to minimize downtime and ensure your deep learning workloads run smoothly.