talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
| Talk #0: Introductions and Meetup Updates (Chris Fregly, Antje Barth) · Talk #1: NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving (Chris Alexiuk, NVIDIA) · Talk #2: High Performance CUDA Optimizations (Chris Fregly and others) | NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving + CUDA Optimizations |

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #1 abstract: NVIDIA Dynamo splits LLM serving into disaggregated prefill and decode stages, letting each scale independently for better throughput under latency constraints. This session dives deep into how Dynamo does disaggregated serving (a conceptual sketch of the pattern follows below).

Talk #2 abstract: CUDA optimizations for high-performance AI.

Related Links
- GitHub repo: http://github.com/cfregly/ai-performance-engineering/
- O'Reilly book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
- YouTube: https://www.youtube.com/@AIPerformanceEngineering
- Generative AI free course on DeepLearning.ai: https://bit.ly/gllm
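For readers who want a feel for the disaggregated prefill-decode pattern that Talk #1 covers, here is a minimal, self-contained Python sketch. It is not NVIDIA Dynamo code and does not use Dynamo's API; every class, queue, and function name below is hypothetical. It only illustrates the core idea the abstract describes: prefill (compute-bound prompt processing that builds the KV cache) and decode (memory-bound token generation against that cache) run in separate worker pools whose sizes can be tuned independently, with a KV-cache handoff between them.

```python
# Minimal conceptual sketch of disaggregated prefill/decode serving.
# NOT NVIDIA Dynamo's API; all names here are hypothetical. The point is
# that prefill and decode run in separate, independently sized worker
# pools connected by a KV-cache handoff.

from dataclasses import dataclass, field
from queue import Queue
from threading import Thread


@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # stand-in for real KV blocks
    output: list = field(default_factory=list)


def prefill_worker(prefill_q: Queue, decode_q: Queue) -> None:
    """Compute-bound stage: process the full prompt once, build the KV cache."""
    while True:
        req = prefill_q.get()
        if req is None:
            break
        # Pretend each prompt token produces one KV entry.
        req.kv_cache = [f"kv({tok})" for tok in req.prompt.split()]
        decode_q.put(req)  # hand the KV cache off to the decode pool


def decode_worker(decode_q: Queue, done_q: Queue) -> None:
    """Memory-bound stage: generate tokens one at a time against the KV cache."""
    while True:
        req = decode_q.get()
        if req is None:
            break
        for i in range(req.max_new_tokens):
            req.output.append(f"tok{i}")        # placeholder for a real decode step
            req.kv_cache.append(f"kv(tok{i})")  # cache grows as decoding proceeds
        done_q.put(req)


if __name__ == "__main__":
    prefill_q, decode_q, done_q = Queue(), Queue(), Queue()

    # The point of disaggregation: pool sizes are tuned independently,
    # e.g. a few prefill workers and many decode workers.
    prefill_pool = [Thread(target=prefill_worker, args=(prefill_q, decode_q))
                    for _ in range(1)]
    decode_pool = [Thread(target=decode_worker, args=(decode_q, done_q))
                   for _ in range(4)]
    for t in prefill_pool + decode_pool:
        t.start()

    for prompt in ["hello world", "explain disaggregated serving"]:
        prefill_q.put(Request(prompt=prompt, max_new_tokens=3))

    for _ in range(2):
        print(done_q.get().output)

    # Shut the pools down.
    for _ in prefill_pool:
        prefill_q.put(None)
    for _ in decode_pool:
        decode_q.put(None)
```

In a real disaggregated deployment the handoff moves KV-cache blocks between GPUs or nodes (e.g. over NVLink or RDMA), and the two pools are sized from measured prefill and decode throughput under latency targets rather than the fixed thread counts used in this sketch.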