Kirat Pandya

Activities

4

talks

CEO Osmos

Frequent Collaborators

Aditya Bindal Contextual AI 2 Juan Acevedo Google Cloud 2 Reena Lee Google Cloud 2

Filter by Event / Source

Google Cloud Next '25 3 Google Cloud Next '24 1

Talks & appearances

4 activities · Newest first

Search activities →

AI Hypercomputer: Performance, scale, and the power of Pathways

2025-04-11 · Google Cloud Next '25

session

with Vaibhav Singh (Google Cloud) , Shaurya Gupta (Google Cloud) , Kirat Pandya (Osmos)

AI/ML

Scale your AI training and achieve peak performance with AI Hypercomputer. Gain actionable insights into optimizing your AI workloads for maximum goodput. Learn how to leverage our robust infrastructure for diverse models, including dense, Mixture of Experts, and diffusion. Discover how to customize your workflows with custom kernels and developer tools, facilitating seamless interactive development. You'll learn firsthand how Pathways, developed by Google Deepmind, enables large scale training resiliency, flexibility to express architecture.

Inference at scale with Google Cloud’s AI Hypercomputer

2025-04-09 · Google Cloud Next '25

session

with Aditya Bindal (Contextual AI) , Reena Lee (Google Cloud) , Kirat Pandya (Osmos) , Juan Acevedo (Google Cloud)

AI/ML Cloud Computing

Learn how to run high-throughput and low-latency inference on Google Cloud to maximize price-performance on TPUs and GPUs, leveraging JetStream and vLLM.

Accelerate AI inference workloads with Google Cloud TPUs and GPUs

2024-04-10 · Google Cloud Next '24

session

with Omer Hasan (AppLovin) , Kirat Pandya (Osmos) , Alexander Spiridonov (Google Cloud) , Uğur Arpacı (Codeway)

AI/ML

Deploying AI models at scale demands high-performance inference capabilities. Google Cloud offers a range of cloud tensor processing units (TPUs) and NVIDIA-powered graphics processing unit (GPU) VMs. This session will guide you through the key considerations for choosing TPUs and GPUs for your inference needs. Explore the strengths of each accelerator for various workloads like large language models and generative AI models. Discover how to deploy and optimize your inference pipeline on Google Cloud using TPUs or GPUs. Understand the cost implications and explore cost-optimization strategies.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Inference at scale with Google Cloud’s AI Hypercomputer (Recap)

· Google Cloud Next '25

session

with Aditya Bindal (Contextual AI) , Reena Lee (Google Cloud) , Kirat Pandya (Osmos) , Juan Acevedo (Google Cloud)

AI/ML

Learn how to run high-throughput and low-latency inference on Google Cloud to maximize price-performance on TPUs and GPUs, leveraging JetStream and vLLM.