talk-data.com

Google Cloud Next session 2025-04-10 at 23:30

Audio and visual interactions with Gemini 2.0 and Multimodal Live API

Description

Experience a new way to interact with LLM-powered agents! With Gemini 2.0 and Multimodal Live API, users can give spoken instructions and show visual content from a camera or screen while receiving spoken responses from the model. This enables more natural, timely communication and unlocks multimodal agent workflows. This session showcases how existing agent experiences can be adapted for voice and visual cues, and explores new possibilities with this technology.