talk-data.com talk-data.com

J

Speaker

Jie Wu

1

talks

Software Engineer Google Cloud

Filter by Event / Source

Talks & appearances

1 activities · Newest first

Search activities →

Struggling to monitor the performance and health of your large language model (LLM) deployments on Google Kubernetes Engine (GKE)? This session unveils how the Google Cloud Observability suite provides a comprehensive solution for monitoring leading AI model servers like Ray, NVIDIA Triton, vLLM, TGI, and others. Learn how our one-click setup automatically configures dashboards, alerts, and critical metrics – including GPU and TPU utilization, latency, throughput, and error analysis – to enable faster troubleshooting and optimized performance. Discover how to gain complete visibility into your LLM infrastructure.