talk-data.com

Google Cloud Next session 2025-04-11 at 16:45

Scaling AI/ML Workloads with Ray on TPUs

Description

Tensor Processing Units (TPUs) are hardware accelerators designed by Google specifically for large-scale AI/ML computations. Google's new Trillium TPUs are our most performant and energy-efficient TPUs to date, and offer unprecedented scalability. Ray is a unified framework for orchestrating AI/ML workloads on large compute clusters, with Python-native APIs for training, inference, tuning, reinforcement learning, and more. In this lightning talk, we will demonstrate how to use Ray to manage workloads on TPUs through an easy-to-use API. We will cover: 1) training models with MaxText, 2) tuning models with Hugging Face, and 3) serving models with vLLM. Attendees will gain an understanding of how to build a complete, end-to-end AI/ML infrastructure with Ray and TPUs.