Microsoft Ignite
breakout
2025-11-20
Inference at record speed with Azure ND Virtual Machines
Event: Microsoft Ignite 2025
Description
Azure sets new inference records of 865K and 1.1M tokens/sec on ND GB200 v6 and ND GB300 v6 VMs, respectively. These results stem from optimization across the full stack, from GPU kernels such as GEMM and attention to multi-node scaling. Using Llama benchmarks, we’ll show how model architecture and hardware co-design drive throughput and efficiency. Customers benefit from faster time-to-value, lower cost per token, and production-ready infrastructure. Attendees can connect with Azure engineers to discuss best practices.