2025-04-11 at 19:30
The need for speed: How our customers are slashing AI model startup latency
Event:
Google Cloud Next '25
Description
Are you grappling with scaling your AI and machine learning (ML) platforms to meet demand while ensuring rapid recovery from failures? This session dives into strategies for optimizing end-to-end startup latency for AI and ML workloads on Google Kubernetes Engine (GKE). We’ll explore how image and pod preloading techniques can significantly reduce startup times, enabling faster scaling and improved reliability. Real-world examples will show how these techniques have led to dramatic improvements in application performance, including a 95% reduction in pod startup time and a 1.2x–2x speedup.