with Dan Saadati (Google Cloud), Allan Mendes (Google), Mark Ryan (Google), Benedict Noero (Google Cloud)
AI-enabled browser agents are in the news now, but it’s not always clear how they solve real-world problems. In this session, we’ll share our experience building a web browser agent by integrating Gemini into an end-to-end service that follows text instructions to take actions in a web application. We’ll take you through our journey of creating the agent, share the research that inspired us, and show how we’ve used the system to tackle practical problems like validating user flows in the UI and semantically checking web links.