2025-04-10 at 17:45
Cluster Director with GKE: Optimal performance at max scale
Event: Google Cloud Next '25
Description
Managing massive deployments of accelerators for AI and high-performance computing (HPC) workloads can be complex. This talk dives into running AI-optimized Google Kubernetes Engine (GKE) clusters that streamline infrastructure provisioning, workload orchestration, and ongoing operations for tens of thousands of accelerators. Learn how topology-aware scheduling, maintenance controls, and advanced networking capabilities enable ultra-low latency and maximum performance by default for demanding workloads such as AI pretraining, fine-tuning, inference, and HPC.