talk-data.com
Format: breakout
Date: 2025-11-19 at 18:15
Interactive Session: Serving LLMs on GPU systems at scale with NVIDIA Dynamo
Event:
Microsoft Ignite 2025
Description
As LLMs grow, efficient inference increasingly requires multi-node execution, which introduces challenges in orchestration, scheduling, and low-latency GPU-to-GPU data transfers. Hardware like the GB200 NVL72 delivers massive scale-up compute, but truly scalable inference also depends on advanced software. Explore how open-source frameworks like NVIDIA Dynamo, combined with Azure Kubernetes Service (AKS), unlock new levels of performance and cost-efficiency.