Facing challenges with the cost and performance of your AI inference workloads? This talk presents TPUs and Google Kubernetes Engine (GKE) as a solution for achieving both high throughput and low latency while optimizing costs with open source models and libraries. Learn how to leverage TPUs to scale massive inference workloads efficiently.
talk-data.com
Speaker
Kavitha Gowda
1 talk
Senior Product Manager
Google Cloud
Filtering by: Google Cloud Next '25
Talks & appearances