talk-data.com

Event

Building AI Agents with Multimodal Models: NVIDIA DLI Workshop for Academia

2025-12-20 – 2025-12-20 Meetup

Activities tracked

5

Ready to build cutting-edge AI that understands the world through more than just text? Join our hands-on workshop and learn how to build neural network agents that can see, read, and reason across multiple data types! We'll explore advanced techniques like data fusion, OCR, and NVIDIA's powerful AI Blueprints to tackle real-world challenges in robotics, healthcare, and beyond.

We'll start with a robotics use case, apply those principles to supercharge Large Language Models (LLMs), and finish by orchestrating a team of models to work together seamlessly. You can find the full workshop description here: https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+C-FX-17+V1

Who is this for

This certification workshop is completely free for academic staff and students. A valid academic email address is required to access the NVIDIA DLI compute environment. If you are in industry, please contact [email protected] to request a quote for you or your team.

Register

Please remember to fill in the form with your current institutional email: https://forms.gle/YEETAidJqUzEkNS56

The access code to the NVIDIA DLI Platform will be shared through your academic email.

What You Will Learn

  • Data Fusion Mastery: Discover the difference between early, late, and intermediate fusion to combine camera, LiDAR, and other data types.
  • PDF & Document AI: Learn to extract and process text from PDFs using Optical Character Recognition (OCR).
  • Agent Orchestration: Understand how to make multiple AI models collaborate to solve complex problems.
  • NVIDIA AI Blueprints: Get hands-on with the Video Search and Summarization (VSS) blueprint to build powerful applications.
  • Vision-Language Models: Turn a standard Language Model into a Vision Language Model (VLM) that can process images and documents.

Agenda

Part 1: Early & Late Fusion (1.0 hr)

  • Fuse camera and LiDAR data to predict object positions.
  • Prep various data types for your neural networks.
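
As a rough illustration of the two strategies named in Part 1 (not the workshop's own code), the sketch below contrasts early fusion, which concatenates camera and LiDAR features before a single model, with late fusion, which runs one model per modality and averages their predictions. The feature values, weights, and function names are all hypothetical, and each "model" is just a linear map standing in for a neural network.

```python
# Toy contrast of early vs. late fusion. All numbers and names are made up
# for illustration; a real pipeline would use trained neural networks.

def linear(weights, features):
    """Apply a linear 'model': one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def early_fusion(camera, lidar, weights):
    """Concatenate both modalities, then run a single model on the result."""
    return linear(weights, camera + lidar)

def late_fusion(camera, lidar, cam_weights, lidar_weights):
    """Run one model per modality, then average the two predictions."""
    cam_pred = linear(cam_weights, camera)
    lidar_pred = linear(lidar_weights, lidar)
    return [(c + l) / 2 for c, l in zip(cam_pred, lidar_pred)]

camera = [0.5, 1.0]  # hypothetical camera features
lidar = [2.0]        # hypothetical LiDAR depth feature

# Single model over the 3 concatenated inputs, predicting (x, y).
early_w = [[1.0, 0.0, 0.5],
           [0.0, 1.0, 0.5]]
# Separate per-modality models, each predicting (x, y).
cam_w = [[1.0, 0.0],
         [0.0, 1.0]]
lidar_w = [[1.0],
           [1.0]]

early_xy = early_fusion(camera, lidar, early_w)
late_xy = late_fusion(camera, lidar, cam_w, lidar_w)
print("early fusion:", early_xy)
print("late fusion:", late_xy)
```

The practical difference is where the modalities meet: early fusion lets one model learn cross-modal interactions from the start, while late fusion keeps the per-modality models independent and only combines their outputs.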

Part 2: Intermediate Fusion (1.0 hr)

  • Dive into the theory of multimodal model architecture.
  • Train a Contrastive Pretraining model and create a vector database.
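
To make the contrastive pretraining and vector database ideas concrete, here is a minimal sketch assuming a CLIP-style objective (an assumption; the listing does not say which formulation the workshop uses). It computes an image-to-text contrastive loss over paired embeddings and then runs a nearest-neighbor lookup over a tiny in-memory store. The embeddings, labels, temperature, and helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Mean cross-entropy where image i should match text i (one direction only)."""
    total = 0.0
    for i, img in enumerate(image_embs):
        logits = [cosine(img, txt) / temperature for txt in text_embs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += log_denom - logits[i]
    return total / len(image_embs)

def search(store, query, k=1):
    """Return the labels of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query), reverse=True)
    return [label for label, _ in ranked[:k]]

# Hypothetical paired embeddings: image i belongs with text i.
image_embs = [[1.0, 0.1], [0.1, 1.0]]
text_embs = [[0.9, 0.2], [0.2, 0.9]]

aligned = contrastive_loss(image_embs, text_embs)
shuffled = contrastive_loss(image_embs, text_embs[::-1])
print("aligned pairs loss:", aligned)
print("shuffled pairs loss:", shuffled)

# A toy "vector database": a list of (label, embedding) pairs.
store = [("cat photo", image_embs[0]), ("dog photo", image_embs[1])]
hits = search(store, [0.8, 0.3])
print("nearest neighbor:", hits)
```

Correctly paired embeddings give a lower loss than shuffled ones, which is the signal training uses to pull matching pairs together. A production vector database would replace the linear scan with an approximate nearest-neighbor index.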

Part 3: Cross-modal Projection (2.0 hrs)

  • Transform an LLM into a Vision Language Model (VLM).
  • Process PDFs like a pro with OCR tools.
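
One common way to give an LLM vision input, and an assumption about what "cross-modal projection" covers here, is to map image features into the LLM's embedding space with a learned projection and feed them in alongside the text tokens. The sketch below shows only that data flow with hand-picked weights; the dimensions, values, and function names are hypothetical, and no real vision encoder or LLM is involved.

```python
def project(features, weight, bias):
    """Map one vision feature vector into the LLM embedding space."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weight, bias)]

def build_input_sequence(image_patches, text_embeddings, weight, bias):
    """Prepend projected image patches to the text token embeddings."""
    image_tokens = [project(p, weight, bias) for p in image_patches]
    return image_tokens + text_embeddings

# Hypothetical sizes: vision features have 2 dims, LLM embeddings have 3.
# In a real model these weights would be learned, not hand-picked.
weight = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]

image_patches = [[1.0, 2.0], [0.0, 1.0]]  # stand-in vision encoder output
text_embeddings = [[0.1, 0.2, 0.3]]       # stand-in embedding of one text token

sequence = build_input_sequence(image_patches, text_embeddings, weight, bias)
for token in sequence:
    print(token)
```

After projection every element of the sequence has the LLM's embedding width, so the language model can attend over image and text tokens in the same way.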

Part 4: Model Orchestration (2.0 hrs)

  • Analyze video with Cosmos Nemotron.
  • Use the VSS Blueprint to find answers in video content.

Part 5: Final Assessment (1.0 hr)

  • Put your new skills to the test by converting a pre-trained model to accept a new data type.

Sessions & talks


Part 5: Final Assessment

2025-12-20
workshop

Put your new skills to the test by converting a pre-trained model to accept a new data type.