talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2026-01-19 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2025-12-15 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2025-12-15 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2025-12-15 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
GPU, CUDA, and PyTorch Performance Optimizations
2025-12-15 · 17:00
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
GPU, CUDA, and PyTorch Performance Optimizations
|
|
AI Systems Performance Engineering
2025-11-12
Chris Fregly
– author
Elevate your AI system performance capabilities with this definitive guide to maximizing efficiency across every layer of your AI infrastructure. In today's era of ever-growing generative models, AI Systems Performance Engineering provides engineers, researchers, and developers with a hands-on set of actionable optimization strategies. Learn to co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems that excel in both training and inference. Authored by Chris Fregly, a performance-focused engineering and product leader, this resource transforms complex AI systems into streamlined, high-impact AI solutions. Inside, you'll discover step-by-step methodologies for fine-tuning GPU CUDA kernels, PyTorch-based algorithms, and multinode training and inference systems. You'll also master the art of scaling GPU clusters for high performance, distributed model training jobs, and inference servers. The book ends with a 175+-item checklist of proven, ready-to-use optimizations. Codesign and optimize hardware, software, and algorithms to achieve maximum throughput and cost savings Implement cutting-edge inference strategies that reduce latency and boost throughput in real-world settings Utilize industry-leading scalability tools and frameworks Profile, diagnose, and eliminate performance bottlenecks across complex AI pipelines Integrate full stack optimization techniques for robust, reliable AI system performance |
O'Reilly Data Engineering Books
|
|
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure. https://modal.com/llm-almanac/advisor Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch. https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning AI: https://bit.ly/gllm |
High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT
|
|
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure. https://modal.com/llm-almanac/advisor Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch. https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT
|
|
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure. https://modal.com/llm-almanac/advisor Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch. https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning AI: https://bit.ly/gllm |
High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT
|
|
Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: Solving Bottlenecks with Data Input Pipeline with PyTorch Profiler and TensorBoard by Chaim Rand, et al. Based on this Medium post: https://medium.com/data-science/solving-bottlenecks-on-the-data-input-pipeline-with-pytorch-profiler-and-tensorboard-5dced134dbe9 Talk #2: How to Write Cross-Architecture Kernels: NVIDIA CUDA and AMD ROCm (a.k.a "CUDA for AMD") by Quentin Anthony, Cross-Platform Kernel Engineer @ Zyphra New models such as DeepSeek-R1 and Llama-4 are being deployed across AMD and NVIDIA GPUs, but how are cross-hardware kernels written? In my talk, we'll discuss considerations such as kernel sizing and cross-architecture optimization when writing kernels across different SIMD hardware. Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: GPU, PyTorch, and CUDA Performance Optimizations Talk #2: GPU, PyTorch, and CUDA Performance Optimizations Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm |
PyTorch Data Loader Tuning + GPU Cross-Architecture Optimizations: CUDA and AMD
|