talk-data.com
Event
PyTorch Data Loader Tuning + GPU Cross-Architecture Optimizations: CUDA and AMD
Zoom link: https://us02web.zoom.us/j/82308186562
Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth
Talk #1: Solving Bottlenecks on the Data Input Pipeline with PyTorch Profiler and TensorBoard by Chaim Rand, et al.
Based on this Medium post: https://medium.com/data-science/solving-bottlenecks-on-the-data-input-pipeline-with-pytorch-profiler-and-tensorboard-5dced134dbe9
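As a rough first check before opening the profiler, you can time how long each training step blocks while waiting for the next batch. The sketch below is not the method from the post, which uses PyTorch Profiler and TensorBoard. `measure_input_stall` is a hypothetical helper that works with any iterable, including a `torch.utils.data.DataLoader`. A fake slow loader stands in for real data.

```python
import time
from typing import Any, Callable, Iterable


def measure_input_stall(loader: Iterable[Any], train_step: Callable[[Any], None],
                        max_steps: int = 50) -> dict:
    """Hypothetical helper: split wall-clock time into 'waiting for data' vs 'running the step'."""
    data_wait = 0.0
    compute = 0.0
    steps = 0
    it = iter(loader)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent blocked here is input-pipeline stall
        except StopIteration:
            break
        t1 = time.perf_counter()
        # CUDA kernels launch asynchronously; call torch.cuda.synchronize() inside
        # train_step if you want the compute time to reflect actual GPU work.
        train_step(batch)
        t2 = time.perf_counter()
        data_wait += t1 - t0
        compute += t2 - t1
        steps += 1
    total = data_wait + compute
    return {
        "steps": steps,
        "data_wait_s": data_wait,
        "compute_s": compute,
        "data_wait_fraction": data_wait / total if total > 0 else 0.0,
    }


def slow_loader(n: int):
    """Simulated loader: each batch takes 20 ms to 'decode/augment'."""
    for i in range(n):
        time.sleep(0.02)
        yield i


# Simulated 5 ms training step; the loader is ~4x slower, so data wait should dominate.
stats = measure_input_stall(slow_loader(10), lambda batch: time.sleep(0.005))
print(stats["steps"], round(stats["data_wait_fraction"], 2))
```

If the data-wait share dominates, the profiler and TensorBoard workflow from the post can show where that time goes. Commonly tuned `DataLoader` settings include `num_workers`, `pin_memory` and `prefetch_factor`.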
Talk #2: How to Write Cross-Architecture Kernels: NVIDIA CUDA and AMD ROCm (a.k.a "CUDA for AMD") by Quentin Anthony, Cross-Platform Kernel Engineer @ Zyphra
New models such as DeepSeek-R1 and Llama-4 are being deployed across AMD and NVIDIA GPUs, but how are cross-hardware kernels written? In my talk, we'll discuss considerations such as kernel sizing and cross-architecture optimization when writing kernels across different SIMD hardware.
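One concrete instance of kernel sizing across SIMD hardware involves the hardware thread group. NVIDIA GPUs schedule threads in 32-wide warps, while AMD CDNA datacenter GPUs (e.g. the MI200/MI300 series) use 64-wide wavefronts. Because of this difference, a block size tuned for one vendor can leave lanes idle on the other. The sketch below shows that arithmetic and assumes those two widths. `launch_config` and `idle_lanes` are hypothetical helpers, not code from the talk.

```python
from typing import Tuple

# Assumed SIMD widths: 32-thread warps (NVIDIA) vs 64-thread wavefronts (AMD CDNA).
# Some AMD GPUs (RDNA) can also run 32-wide waves, so treat these as illustrative defaults.
SIMD_WIDTH = {"cuda": 32, "rocm": 64}
# Usual per-block thread limit on current NVIDIA and AMD GPUs; query the device in real code.
MAX_THREADS_PER_BLOCK = 1024


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def idle_lanes(block: int, simd_width: int) -> int:
    """Lanes masked off because the block does not fill its last warp/wavefront."""
    return ceil_div(block, simd_width) * simd_width - block


def launch_config(n_elements: int, backend: str, target_block: int = 256) -> Tuple[int, int]:
    """Round the requested block size up to whole warps/wavefronts, then size the grid."""
    width = SIMD_WIDTH[backend]
    block = ceil_div(target_block, width) * width
    block = min(block, MAX_THREADS_PER_BLOCK)  # 1024 is a multiple of both 32 and 64
    grid = ceil_div(n_elements, block)  # enough blocks to cover every element
    return block, grid


print(idle_lanes(32, 64))                        # 32: half of a 64-wide wavefront sits idle
print(launch_config(1_000_000, "cuda", 96))      # (96, 10417)
print(launch_config(1_000_000, "rocm", 96))      # (128, 7813)
```

A 32-thread block is a full warp on NVIDIA but only half-fills a 64-lane wavefront on CDNA. This is one reason kernels ported between vendors are often re-tuned instead of launched with identical block sizes.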
Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm