Juan Acevedo

Activities

3

talks

Software Engineer Google Cloud

Frequent Collaborators

Aditya Bindal Contextual AI 2 Kirat Pandya Osmos 2 Reena Lee Google Cloud 2

Filter by Event / Source

Google Cloud Next '25 2 Google Cloud Next '24 1

Talks & appearances

3 activities · Newest first

Search activities →

Inference at scale with Google Cloud’s AI Hypercomputer

2025-04-09 · Google Cloud Next '25

session

with Aditya Bindal (Contextual AI) , Reena Lee (Google Cloud) , Kirat Pandya (Osmos) , Juan Acevedo (Google Cloud)

AI/ML Cloud Computing

Learn how to run high-throughput and low-latency inference on Google Cloud to maximize price-performance on TPUs and GPUs, leveraging JetStream and vLLM.

Cost efficient serving of stable diffusion models using Cloud TPUs

2024-04-10 · Google Cloud Next '24

session

with Juan Acevedo (Google Cloud) , Nathan Beach (Google Cloud)

AI/ML GenAI Kubernetes

Text-to-image generative AI models such as the Stable Diffusion family of models are rapidly growing in popularity. In this session, we explain how to optimize every layer of your serving architecture – including TPU accelerators, orchestration, model server, and ML framework – to gain significant improvements in performance and cost effectiveness. We introduce many new innovations in Google Kubernetes Engine that improve the cost effectiveness of AI inference, and we provide a deep dive into MaxDiffusion, a brand new library for deploying scalable stable diffusion workloads on TPUs.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Inference at scale with Google Cloud’s AI Hypercomputer (Recap)

· Google Cloud Next '25

session

with Aditya Bindal (Contextual AI) , Reena Lee (Google Cloud) , Kirat Pandya (Osmos) , Juan Acevedo (Google Cloud)

AI/ML

Learn how to run high-throughput and low-latency inference on Google Cloud to maximize price-performance on TPUs and GPUs, leveraging JetStream and vLLM.