Google Cloud Next
session
2025-04-10
Rapidly Deploying ML GPU GKE or Slurm Cluster: A Comprehensive Guide Using Cluster Toolkit and Terraform, From Absolute Beginner to Expert
Event: Google Cloud Next '25
Description
This talk offers demonstrations and live discussions on how to rapidly deploy production-ready GKE or Slurm clusters using Cluster Toolkit and Terraform. Leverage the latest GPUs to accelerate machine learning workloads and optimize resource utilization with GKE's Kueue, autoscaling Slurm, and Dynamic Workload Scheduler (DWS). Explore storage solutions like Google Cloud Storage (GCS), GCSFuse, Filestore Zonal, and Parallelstore. Leave this session with the tools and knowledge you need to deploy a high-performance ML cluster in minutes.