talk-data.com

Microsoft Ignite breakout 2025-11-19 at 18:15

Interactive Session: Serving LLMs on GPU systems at scale with NVIDIA Dynamo

Description

As LLMs grow, efficient inference increasingly requires multi-node execution, which introduces challenges in orchestration, scheduling, and low-latency GPU-to-GPU data transfers. Hardware like the GB200 NVL72 delivers massive scale-up compute, but truly scalable inference also depends on advanced software. Explore how open-source frameworks like NVIDIA Dynamo, combined with Azure Kubernetes Service (AKS), unlock new levels of performance and cost-efficiency.