Grappling with scaling your AI and machine learning (ML) platforms to meet demand and ensuring rapid recovery from failures? This session dives into strategies for optimizing end-to-end startup latency for AI and ML workloads on Google Kubernetes Engine (GKE). We’ll explore how image and pod preloading techniques can significantly reduce startup times, enabling faster scaling and improved reliability. Real-world examples will show how this has led to dramatic improvements in application performance, including a 95% reduction in pod startup time and a 1.2x–2x speedup.
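To make the preloading idea concrete, here is a minimal sketch of one generic technique in this space: a DaemonSet whose only job is to pull the large serving image onto every node ahead of time, so real pods skip the image pull at startup. The image name, namespace, and resource requests are illustrative assumptions, not values from the session (which may also cover GKE-native options such as secondary boot disks).

```python
# Sketch: pre-pull a large serving image onto every node via a DaemonSet.
# All names below are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside a cluster

preloader = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "model-image-preloader", "namespace": "default"},
    "spec": {
        "selector": {"matchLabels": {"app": "model-image-preloader"}},
        "template": {
            "metadata": {"labels": {"app": "model-image-preloader"}},
            "spec": {
                "containers": [{
                    "name": "preload",
                    # Hypothetical serving image; substitute your own.
                    "image": "us-docker.pkg.dev/my-project/serving/llm-server:v1",
                    # Sleep forever so the image stays cached on the node.
                    "command": ["sleep", "infinity"],
                    "resources": {"requests": {"cpu": "1m", "memory": "8Mi"}},
                }],
            },
        },
    },
}

client.AppsV1Api().create_namespaced_daemon_set(namespace="default", body=preloader)
```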
Speaker: Brandon Royal
Talks & appearances: 5 activities · Newest first
Deploy and scale containerized AI models with NVIDIA NIMs on Google Kubernetes Engine (GKE). In this interactive session, you’ll gain hands-on experience deploying pre-built NIMs, managing deployments with kubectl, and autoscaling inference workloads. Ideal for startup developers, technical founders, and tech leads.
**Please bring your laptop to get the most out of this hands-on session**
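As a taste of the autoscaling portion, here is a hedged sketch of attaching a HorizontalPodAutoscaler to an inference Deployment with the Kubernetes Python client. The Deployment name `nim-llm` and the CPU target are assumptions, not workshop values; real GPU inference workloads often scale on custom metrics instead.

```python
# Sketch: autoscale an inference Deployment between 1 and 4 replicas
# based on average CPU utilization. Names are illustrative.
from kubernetes import client, config

config.load_kube_config()

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "nim-llm-hpa", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "nim-llm",  # hypothetical NIM Deployment name
        },
        "minReplicas": 1,
        "maxReplicas": 4,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```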
Is your platform ready for the scale of rapidly evolving models and agents? In this session, we’ll explore strategies for scaling your cloud-native AI platform, empowering teams to leverage an increasing variety of AI models and agent frameworks. We’ll dive into tools and practices for maintaining control and cost efficiency while enabling AI engineering teams to iterate quickly on Google Kubernetes Engine (GKE). We’ll also explore how NVIDIA NIM microservices deliver optimized inference with minimal tuning.
This session is hosted by a Google Cloud Next sponsor.
Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.
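Since NIM microservices expose an OpenAI-compatible API, a deployed model can be queried with a standard client once it is running on GKE. The in-cluster Service URL and model id below are placeholders, not details from the session:

```python
# Sketch: call a NIM endpoint through its OpenAI-compatible API.
from openai import OpenAI

nim = OpenAI(
    base_url="http://nim-llm.default.svc.cluster.local:8000/v1",  # assumed Service DNS
    api_key="not-needed-in-cluster",  # placeholder; auth depends on your setup
)

resp = nim.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example NIM model id
    messages=[{"role": "user", "content": "Summarize Kubernetes in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```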
In this session, you’ll learn how to deploy a fully functional Retrieval-Augmented Generation (RAG) application to Google Cloud using open-source tools and models from Ray, HuggingFace, and LangChain. You’ll augment it with your own data using Ray on Google Kubernetes Engine (GKE) and Cloud SQL’s pgvector extension, deploy any model from HuggingFace to GKE, and rapidly develop your LangChain application on Cloud Run. After the session, you’ll be able to deploy your own RAG application and customize it to your needs.
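A minimal sketch of the wiring this session describes, using LangChain’s pgvector integration with HuggingFace embeddings; the connection string, collection name, and model ids are assumptions chosen to illustrate the shape of the pipeline:

```python
# Sketch: store HuggingFace embeddings in Cloud SQL for PostgreSQL
# (pgvector extension) and retrieve context through LangChain.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_postgres import PGVector

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

store = PGVector(
    embeddings=embeddings,
    collection_name="my_docs",  # hypothetical collection
    # Hypothetical Cloud SQL connection; pgvector must be enabled.
    connection="postgresql+psycopg://user:pass@127.0.0.1:5432/ragdb",
)

store.add_texts(["GKE runs Ray clusters for distributed data processing."])

# Retrieve context for a question; a model served on GKE or Cloud Run
# would then generate the answer from these documents.
docs = store.similarity_search("What runs Ray clusters?", k=2)
print([d.page_content for d in docs])
```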
In this talk, we delve into the complexities of building enterprise AI applications, including customization, evaluation, and inference of large language models (LLMs). We start by outlining the solution design space and presenting a comprehensive LLM evaluation methodology. We then review state-of-the-art LLM customization techniques and introduce NVIDIA Inference Microservices (NIM) and a suite of cloud-native NVIDIA NeMo microservices that ease LLM deployment and operation on Google Kubernetes Engine (GKE). We conclude with a live demo, followed by practical recommendations for enterprises.
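The talk’s actual evaluation methodology isn’t reproduced here, but a common building block such a methodology rests on is scoring model outputs against references over an eval set. A self-contained sketch, with a stub standing in for a call to a deployed NIM endpoint:

```python
# Sketch: exact-match scoring over a tiny eval set. `generate` is any
# prompt-to-text callable, e.g. a wrapper around a NIM inference call.
from typing import Callable

def exact_match_eval(generate: Callable[[str], str],
                     eval_set: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose output matches the reference exactly."""
    hits = 0
    for prompt, reference in eval_set:
        if generate(prompt).strip().lower() == reference.strip().lower():
            hits += 1
    return hits / len(eval_set)

# Usage with a stub generator; swap in a real model client.
score = exact_match_eval(
    generate=lambda p: "Paris" if "capital of France" in p else "unknown",
    eval_set=[("What is the capital of France?", "Paris")],
)
print(f"exact-match accuracy: {score:.0%}")
```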