talk-data.com

Speaker: Brandon Royal
Product Manager, AI on Google Kubernetes Engine, Google Cloud
2 talks
Event: Google Cloud Next '24

Talks & appearances

Showing 2 of 5 activities

In this session, you’ll learn how to deploy a fully functional Retrieval-Augmented Generation (RAG) application to Google Cloud using open-source tools and models from Ray, HuggingFace, and LangChain. You’ll also learn how to augment it with your own data using Ray on Google Kubernetes Engine (GKE) and Cloud SQL’s pgvector extension, deploy any model from HuggingFace to GKE, and rapidly develop your LangChain application on Cloud Run. After the session, you’ll be able to deploy your own RAG application and customize it to your needs.
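A minimal, runnable sketch of the RAG flow the session describes: embed document chunks, retrieve the chunk closest to the question, and prepend it to the LLM prompt. The session’s stack uses Hugging Face embeddings with Ray on GKE and pgvector on Cloud SQL; toy bag-of-words vectors and cosine similarity stand in here so the pattern is self-contained.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts (a real app would call an
    # embedding model and store the vectors in pgvector).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, corpus: list[str]) -> str:
    # Nearest-neighbor lookup; in the deployed app, pgvector's similarity
    # search against Cloud SQL plays this role.
    q = embed(question)
    return max(corpus, key=lambda doc: cosine(q, embed(doc)))

def build_prompt(question: str, corpus: list[str]) -> str:
    # Augmentation step: ground the LLM call in the retrieved context.
    return f"Context: {retrieve(question, corpus)}\nQuestion: {question}"

corpus = [
    "GKE runs containerized workloads on Google Cloud.",
    "pgvector adds vector similarity search to PostgreSQL.",
]
prompt = build_prompt("What does pgvector add to PostgreSQL?", corpus)
```

In the full application, `embed` and `retrieve` would be backed by a served embedding model and a vector index rather than in-memory counters; only the augmentation pattern carries over.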

In this talk, we delve into the complexities of building enterprise AI applications, including customization, evaluation, and inference of large language models (LLMs). We start by outlining the solution design space and presenting a comprehensive LLM evaluation methodology. We then review state-of-the-art LLM customization techniques and introduce NVIDIA Inference Microservice (NIM) and a suite of cloud-native NVIDIA NeMo microservices that simplify LLM deployment and operation on Google Kubernetes Engine (GKE). We conclude with a live demo, followed by practical recommendations for enterprises.
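An illustrative sketch of the evaluation loop such a methodology implies: run every prompt in an eval set through the model and report the fraction scored correct. The normalized exact-match metric and the stub model are assumptions for illustration; in practice the model would be an LLM served via NIM on GKE and the metric task-specific.

```python
import re

def normalize(text: str) -> str:
    # Case-fold and strip punctuation so "Paris." matches "paris".
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)

def evaluate(model, eval_set: list[tuple[str, str]]) -> float:
    # Accuracy over (prompt, reference) pairs.
    hits = sum(exact_match(model(prompt), ref) for prompt, ref in eval_set)
    return hits / len(eval_set)

def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM endpoint; answers one prompt,
    # misses the rest.
    return {"What is the capital of France?": "Paris."}.get(prompt, "unknown")

score = evaluate(stub_model, [
    ("What is the capital of France?", "paris"),
    ("What is 2 + 2?", "4"),
])
# score is 0.5: one exact match out of two prompts
```

Swapping `stub_model` for a call to a served endpoint and `exact_match` for a task-appropriate scorer turns this into a basic harness for comparing customization techniques.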
