with Aditya Bindal (Contextual AI), Reena Lee (Google Cloud), Kirat Pandya (Osmos), Juan Acevedo (Google Cloud)
Learn how to run high-throughput, low-latency inference on Google Cloud with JetStream and vLLM to maximize price-performance on TPUs and GPUs.