Inference at record speed with Azure ND Virtual Machines

Azure sets new inference records: 865K tokens/sec on ND GB200 v6 VMs and 1.1M tokens/sec on ND GB300 v6 VMs. These results stem from deep stack optimization, from GPU kernels such as GEMM and attention up to multi-node scaling. Using Llama benchmarks, we’ll show how model architecture and hardware co-design drive throughput and efficiency. Customers benefit from faster time-to-value, lower cost per token, and production-ready infrastructure. Attendees can connect with Azure engineers to discuss best practices.
