Discussion on Insights into Training Video Models.
talk-data.com
Company: Google DeepMind
Speakers: 21 · Activities: 34
Talks & appearances
Training models on large-scale data has given us powerful generative capabilities for text, images, and video. However, this success has not yet extended to training generalist embodied agents. This talk tackles that gap by focusing on a potential solution: scalable world models. We'll trace the idea of planning in predictive models, from its origins to modern efforts to build world models directly from pixels. I'll discuss the primary challenge of scaling these models and present our work, Genie, which enables us to learn world models at scale without explicit action labels, demonstrating a new path forward for training the generalist agents of the future.
In the last couple of years we've seen the rapid evolution of frontier, massive-scale models. Yet at the same time, small models have been going through an evolution of their own, using technologies developed for those frontier-scale models. In this talk we'll show how tensor frameworks and autograd made their way into Bayesian models, how massive-model development is yielding smaller models, and how both of these are useful for small-data and small-model developers and the organizations they support.
Session led by Philipp Schmid, AI Developer Experience at Google DeepMind
Learn the building blocks of autonomous agents, including core architectures, planning methods, memory systems, and leading development frameworks.
Dive into advanced reasoning, multi-agent coordination, tool chaining, self-healing workflows, and emerging security challenges.
Focus on real-world applications with sessions on agent evaluation, reliability, deployment strategies, and culminating demo showcases.
Abstract: Our innate ability to reconstruct the 3D world around us from our eyes alone is a fundamental part of human perception. For computers, however, this task remained a significant challenge — until the advent of Neural Radiance Fields (NeRFs). Upon their introduction, NeRFs marked a paradigm shift in the field of novel view synthesis, demonstrating huge improvements in visual realism and geometric accuracy over prior works. The subsequent proliferation of NeRF variants has only expanded their capabilities, unlocking larger scenes, achieving even higher visual fidelity, and accelerating both training and inference. Nevertheless, NeRF is no longer the tool of choice for 3D reconstruction. Why? Join a researcher from the front lines as we explore NeRF’s foundations, dissect its strengths and weaknesses, see how the field has evolved, and explore the future of novel view synthesis.
Live demonstration of connecting agents built by Google DeepMind, DataStax, and Arize AI in real time.
A look at how Google Gemini enables multimodal agents and supports workflows across the Google ecosystem—laying the groundwork for interoperability through A2A.
Gemini 2.0 and 2.5 workshop exploring multimodal app development with Google's Gen AI Python SDK, including real-time interactions, text-to-image generation, and prototyping-to-deployment pipelines.
Gemini 2.0 was built for the agentic era – from native tool use to function calling to robust support for multimodal understanding, the new frontier of applications are agentic. Join this session to explore the frontier of agents, where the best opportunities are for developers to build, open research areas to scale to billions of agents, and how to best leverage Gemini.
Language models have already evolved to do much more than language tasks, principally in the domains of image, audio, and soon video. Join Mostafa Dehghani to explore the emergent frontier of multimodal generation, what Gemini’s world knowledge unlocks that domain-specific models cannot create, and how developers should be thinking about AI as a next-generation creative partner.
Join Woosuk Kwon, Founder of vLLM, Robert Shaw, Director of Engineering at Red Hat, and Brittany Rockwell, Product Manager for vLLM on TPU, to learn about how vLLM is helping Google Cloud customers serve state-of-the-art models with high performance and ease of use across TPUs and GPUs.
World models represent a paradigm shift in artificial intelligence, moving beyond passive data consumption to active, predictive understanding of environments. These models enable AI agents to simulate potential futures, plan strategically, and learn more efficiently in complex, dynamic scenarios. In this session, Tim Brooks, Research Scientist at Google DeepMind, will explore the current state of world model research and illuminate the exciting frontiers that lie ahead.
Gemini 2.0, the latest foundational model released by Google DeepMind, offers improved performance, support for real-time interactions, text-to-image and text-to-audio generation, Google Search grounding, and reasoning – all under a unified SDK that lets you move seamlessly from the Gemini API to Vertex AI. In this talk, you’ll learn about the newest Gemini 2.0 capabilities, how to accelerate your prototyping, and guidelines for deploying your solutions from a single API call to more complex pipelines.
Software engineering has become increasingly complex, with an ever-expanding set of patterns, frameworks, and runtimes. But help is here! AI is revolutionizing the developer workflow, and Google Cloud is reimagining the journey from idea to production. This keynote features demos that showcase how AI can streamline software engineering, empowering you to build apps, services, and agents faster than ever.