talk-data.com

Speaker

Eric Steinberger

1 talk

Event: Google Cloud Next '24

Talks & appearances

If left unmanaged, failures and infrastructure inefficiencies can account for as much as 45% of your compute resources and precious engineering time (according to a Stanford University study). In this session, we discuss how to measure and maximize machine learning (ML) productivity for large-scale training jobs, spanning tens of thousands of accelerators. We’ll demonstrate a canonical view of large-scale training infrastructure and patterns our customers are applying that are available to you today.
