talk-data.com

Google Cloud Next session 2025-04-10 at 23:00

Monitor performance for LLM training and inference workloads on GKE

Description

Struggling to monitor the performance and health of your large language model (LLM) deployments on Google Kubernetes Engine (GKE)? This session unveils how the Google Cloud Observability suite provides a comprehensive solution for monitoring leading AI model servers such as Ray, NVIDIA Triton, vLLM, Hugging Face Text Generation Inference (TGI), and others. Learn how our one-click setup automatically configures dashboards, alerts, and critical metrics – including GPU and TPU utilization, latency, throughput, and error analysis – to enable faster troubleshooting and optimized performance. Discover how to gain complete visibility into your LLM infrastructure.
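As background for the metrics-collection side of this setup, the sketch below shows how GKE's Google Cloud Managed Service for Prometheus can scrape a model server's metrics endpoint using a `PodMonitoring` resource. The namespace, pod label, and port here are assumptions for illustration (vLLM, for example, exposes Prometheus metrics at `/metrics` on its serving port); adjust them to match your actual deployment.

```yaml
# Sketch: collecting model-server metrics on GKE with Google Cloud
# Managed Service for Prometheus. Namespace, labels, and port are
# assumed values -- adapt to your deployment.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: vllm-monitoring
  namespace: llm-serving        # assumed namespace
spec:
  selector:
    matchLabels:
      app: vllm                 # assumed pod label on the server pods
  endpoints:
    - port: 8000                # assumed serving port exposing /metrics
      path: /metrics
      interval: 30s
```

Once these metrics flow into Cloud Monitoring, they can back the dashboards and alerts (latency, throughput, accelerator utilization) that the session describes.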