talk-data.com


Speaker: Vaibhav Singh
Group Product Manager, Google Cloud (3 talks)
Filtering by: Google Cloud Next '24


Talks & appearances

Showing 3 of 4 activities


This session is a deep dive into strategies for maximizing the performance and efficiency of generative AI model training using Vertex AI and Cloud TPUs (Tensor Processing Units) and GPUs. You'll learn how to harness the power of Cloud TPUs and GPUs for accelerated training. Join our experts to learn more about best practices for configuring compute resources, selecting the ideal hardware for your use cases, and streamlining the overall model development process with Ray, Persistent Cluster, and shared reservations.
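The abstract above mentions using Ray to streamline model development by fanning training trials out across a cluster. As a single-machine stand-in for that pattern (using the standard library's `concurrent.futures` rather than Ray's API, and with made-up configs and scores purely for illustration), the idea looks roughly like this:

```python
from concurrent.futures import ThreadPoolExecutor

def train_trial(config):
    """Stand-in for one training trial; a real job would run on Cloud TPUs or GPUs."""
    lr, batch_size = config
    # Pretend score; a real trial would return validation metrics.
    return {"lr": lr, "batch_size": batch_size, "score": lr * batch_size}

# Hypothetical hyperparameter candidates, not from the session.
configs = [(0.1, 64), (0.01, 128), (0.001, 256)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_trial, configs))

best = max(results, key=lambda r: r["score"])
print(best["lr"], best["batch_size"])  # the trial with the highest score
```

With Ray on Vertex AI, the same fan-out would instead dispatch each trial to remote workers on a persistent cluster, but the map-then-reduce shape of the workflow is the same.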


If left unmanaged, failures and infrastructure inefficiencies can account for as much as 45% of your compute resources and precious engineering time (according to a Stanford University study). In this session, we discuss how to measure and maximize machine learning (ML) productivity for large-scale training jobs, spanning tens of thousands of accelerators. We’ll demonstrate a canonical view of large-scale training infrastructure and patterns our customers are applying that are available to you today.
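One common way to put a number on the ML productivity the session describes is "goodput": the fraction of wall-clock time that produced useful training progress, with failures, checkpoint restores, and idle accelerators counted as overhead. A minimal sketch (the 70/100-hour figures are illustrative, not from the talk):

```python
def training_goodput(useful_step_seconds, wall_clock_seconds):
    """Fraction of wall-clock time that produced useful training progress."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return useful_step_seconds / wall_clock_seconds

# Example: a 100-hour job that lost 30 hours to failures,
# restarts from checkpoints, and idle accelerators.
goodput = training_goodput(useful_step_seconds=70 * 3600,
                           wall_clock_seconds=100 * 3600)
print(f"{goodput:.0%}")  # 70%
```

At tens of thousands of accelerators, the 30% gap in this toy example is exactly the kind of waste the cited study attributes to unmanaged failures and infrastructure inefficiency.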


Session with Vaibhav Singh (Google Cloud), Erik Nijkamp (Salesforce), Amanpreet Singh (contextual.ai), Rob Martin (Rehrig Pacific)

Training large AI models at scale requires high-performance, purpose-built infrastructure. This session will guide you through the key considerations for choosing tensor processing units (TPUs) and graphics processing units (GPUs) for your training needs. Explore the strengths of each accelerator for various workloads, like large language models and generative AI models. Discover best practices for running and optimizing your training workflow on Google Cloud using TPUs and GPUs. Understand the performance and cost implications, along with cost-optimization strategies at scale.
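A back-of-the-envelope version of the performance-versus-cost comparison the abstract describes reduces to cost per unit of training work: hourly price divided by throughput. The accelerator names, prices, and throughputs below are hypothetical, not real quotes:

```python
def cost_per_million_tokens(hourly_price_usd, tokens_per_second):
    """USD to train one million tokens on a given accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical list prices and measured throughputs -- purely illustrative.
candidates = {
    "accelerator-a": cost_per_million_tokens(2.00, 5_000),
    "accelerator-b": cost_per_million_tokens(3.50, 12_000),
}
best = min(candidates, key=candidates.get)
print(best, round(candidates[best], 4))
```

Note that the pricier accelerator can still win on cost per token when its throughput advantage is large enough, which is why the comparison has to be made per unit of work rather than per hour.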
