Scale your AI training and achieve peak performance with AI Hypercomputer. Gain actionable insights into optimizing your AI workloads for maximum goodput. Learn how to leverage our robust infrastructure for diverse models, including dense, Mixture of Experts, and diffusion architectures. Discover how to customize your workflows with custom kernels and developer tools that enable seamless interactive development. You'll learn firsthand how Pathways, developed by Google DeepMind, enables large-scale training resiliency and the flexibility to express novel model architectures.
Speaker
Vaibhav Singh
4 talks
Talks & appearances
4 activities · Newest first
This session is a deep dive into strategies for maximizing the performance and efficiency of generative AI model training using Vertex AI with Cloud TPUs (Tensor Processing Units) and GPUs. You'll learn how to harness the power of Cloud TPUs and GPUs for accelerated training. Join our experts to learn more about best practices for configuring compute resources, selecting the ideal hardware for your use cases, and streamlining the overall model development process with Ray, Persistent Cluster, and shared reservations.
Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.
If left unmanaged, failures and infrastructure inefficiencies can account for as much as 45% of your compute resources and precious engineering time (according to a Stanford University study). In this session, we discuss how to measure and maximize machine learning (ML) productivity for large-scale training jobs, spanning tens of thousands of accelerators. We’ll demonstrate a canonical view of large-scale training infrastructure and patterns our customers are applying that are available to you today.
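The productivity measure discussed above is often called goodput: the fraction of scheduled accelerator time that produces useful training progress. As a minimal sketch (the function name and figures are illustrative, not from the session or the cited study), the 45% loss figure translates to goodput like this:

```python
# Hedged illustration: "goodput" here is taken as the fraction of wall-clock
# accelerator time spent making useful training progress, after subtracting
# time lost to failures, restarts, and checkpoint recovery. The helper name
# and the example numbers are hypothetical.

def training_goodput(total_hours: float, lost_hours: float) -> float:
    """Fraction of scheduled accelerator time that produced useful work."""
    return (total_hours - lost_hours) / total_hours

# Example: a week-long job where 45% of time is lost, per the figure above.
total = 168.0          # one week of wall-clock hours
lost = 0.45 * total    # failures + restarts + recovery overhead
print(f"goodput = {training_goodput(total, lost):.2f}")  # prints goodput = 0.55
```

At tens of thousands of accelerators, even a few points of goodput correspond to a large absolute amount of compute, which is why measuring it is the first step the session recommends.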
Training large AI models at scale requires high-performance, purpose-built infrastructure. This session will guide you through the key considerations for choosing tensor processing units (TPUs) and graphics processing units (GPUs) for your training needs. Explore the strengths of each accelerator for various workloads, like large language models and generative AI models. Discover best practices for optimizing your training workflow on Google Cloud using TPUs and GPUs. Understand the performance and cost implications, along with cost-optimization strategies at scale.