Search – talk-data.com

Title & Speakers	Event
High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT 2025-06-16 · 16:00 Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure. https://modal.com/llm-almanac/advisor Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch. https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning AI: https://bit.ly/gllm	High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT
High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT 2025-06-16 · 16:00 Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure. https://modal.com/llm-almanac/advisor Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch. https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm	High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT
High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT 2025-06-16 · 16:00 Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure. https://modal.com/llm-almanac/advisor Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch. https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ Zoom link: https://us02web.zoom.us/j/82308186562 Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/ O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/ YouTube: https://www.youtube.com/@AIPerformanceEngineering Generative AI Free Course on DeepLearning AI: https://bit.ly/gllm	High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT
Speaker Office Hours – Charles Frye 2025-04-22 · 18:45 Charles Frye	AI Council 2025
Lessons From A Year Building With LLMs 2024-07-19 · 17:41 Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025 About Eugene Yan I build ML systems to serve customers at scale, and write to learn and teach. About Shreya Shankar I'm Shreya Shankar. I am a machine learning (ML) engineer and computer scientist in the Bay Area. I am completing my PhD in data management systems for ML, with a human-centered focus. I am fortunate to be advised by Dr. Aditya Parameswaran at UC Berkeley. Go Bears! 🐻 I also consult on ML engineering and production AI strategy for enterprises. Prior to my PhD, I was the first ML engineer at a startup, did research engineering at Google Brain, and engineering at Facebook. Before all of that, I did my BS and MS in computer science at Stanford. Go Trees! 🌲 About Hamel Husain Hamel Husain started working with language models five years ago when he led the team that created CodeSearchNet, a precursor to GitHub CoPilot. Since then, he has seen many successful and unsuccessful approaches to building LLM products. Hamel is also an active open source maintainer and contributor of a wide range of ML/AI projects. Hamel is currently an independent consultant. About Jason Liu Jason is an independent AI consultant, advisor, writer, and educator. His main interests are structured outputs, search and retrieval for RAG as well as understanding how to leverage AI to build scalable and valuable businesses. About Bryan Bischof Bryan Bischof is the Head of AI at Hex, where he leads the team of engineers building Magic—the data science and analytics copilot. Bryan has worked all over the data stack leading teams in analytics, machine learning engineering, data platform engineering, and AI engineering. He started the data team at Blue Bottle Coffee, led several projects at Stitch Fix, and built the data teams at Weights and Biases. Bryan previously co-authored the book Building Production Recommendation Systems with O’Reilly, and teaches Data Science and Analytics in the graduate school at Rutgers. His Ph.D. is in pure mathematics. About Charles Frye AI Engineer at Modal Labs. Building useful technology with large neural networks. 00:00 Introduction 03:22 Strategic: Bryan Bischof & Charles Frye 14:47 Operational: Hamel Husain & Jason Liu 23:51 Tactical: Eugene Yan & Shreya Shankar	AI Engineer World's Fair 2024 YouTube
Streamlit NYC Meetup 2024-03-28 · 22:00 At our meetup on March 28th at Snowflake HQ in New York, hear from fellow Streamlit enthusiasts and AI thought leaders on some of the greatest challenges faced by organizations looking to integrate AI into their workflows. 📣 Talks and Speakers "Scalable Serverless Streamlit Services," with Charles Frye, AI Engineer, Modal Albert Zhao, AI Developer Relations, AWS Brandon Duderstadt, Co-Founder and CEO, Nomic "Streamlit in Snowflake for Enterprise Applications" with Abhi Saini, Sr. Product Manager, Streamlit in Snowflake A Streamlit app demo with Aish Shirahatti, BI Engineer, Snowflake 🗓️ Agenda 6:00 - 6:30 \\| Check-in and catering opens 6:30 \\| Programming begins\, featuring remarks by our speakers 7:30 - 7:45 \\| Share your Streamlit demos and raise your hand if you're hiring 7:45 - 9:00 \\| Mingling --- Streamlit Meetups are community events open to anyone interested in turning intuitive Python into polished data apps in only a few lines of code. Connect with other Streamlit users on everything from best practices to trends in AI! Join the Streamlit community: https://discuss.streamlit.io -- About our speakers Charles Frye, AI Engineer, Modal Charles teaches people to build data-dependent applications, from BI to AI. He got his PhD in 2020 from the University of California, Berkeley, where he taught Data Science for Research Psychologists. Charles has since worked at the intersection of education and engineering at Weights & Biases, Full Stack Deep Learning, and as an independent consultant. He currently works as an AI Engineer at Modal Labs. Abhi Saini, Senior Product Manager for Streamlit in Snowflake Abhi is a Senior Product Manager at Snowflake where he focuses on building a managed Streamlit platform for Enterprise use cases	Streamlit NYC Meetup

High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT 2025-06-16 · 16:00

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth

Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure.

https://modal.com/llm-almanac/advisor

Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch.

https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

Zoom link: https://us02web.zoom.us/j/82308186562

Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/

O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

YouTube: https://www.youtube.com/@AIPerformanceEngineering

Generative AI Free Course on DeepLearning AI: https://bit.ly/gllm

High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT

High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT 2025-06-16 · 16:00

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth

Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure.

https://modal.com/llm-almanac/advisor

Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch.

https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

Zoom link: https://us02web.zoom.us/j/82308186562

Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/

O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

YouTube: https://www.youtube.com/@AIPerformanceEngineering

Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm

High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT

High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT 2025-06-16 · 16:00

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth

Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines highlights rapid gains in usability and performance, demonstrating that the software stack now meets the baseline quality for cost-effective, self-hosted LLM inference arxiv.org. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations, offering engineers ready-to-use code snippets for deployment on serverless cloud infrastructure.

https://modal.com/llm-almanac/advisor

Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly-tuned inference compute like vLLM and NVIDIA Dynamo for ultra-scale autonomous AI agents. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the ultra-scale inference environments required by AI agents. This talk is from Chris' upcoming book called AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch.

https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

Zoom link: https://us02web.zoom.us/j/82308186562

Related Links Github Repo: http://github.com/cfregly/ai-performance-engineering/

O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

YouTube: https://www.youtube.com/@AIPerformanceEngineering

Generative AI Free Course on DeepLearning AI: https://bit.ly/gllm

High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT

Speaker Office Hours – Charles Frye 2025-04-22 · 18:45

Charles Frye

AI Council 2025

Lessons From A Year Building With LLMs 2024-07-19 · 17:41

Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Eugene Yan I build ML systems to serve customers at scale, and write to learn and teach.

About Shreya Shankar I'm Shreya Shankar. I am a machine learning (ML) engineer and computer scientist in the Bay Area. I am completing my PhD in data management systems for ML, with a human-centered focus. I am fortunate to be advised by Dr. Aditya Parameswaran at UC Berkeley. Go Bears! 🐻 I also consult on ML engineering and production AI strategy for enterprises. Prior to my PhD, I was the first ML engineer at a startup, did research engineering at Google Brain, and engineering at Facebook. Before all of that, I did my BS and MS in computer science at Stanford. Go Trees! 🌲

About Hamel Husain Hamel Husain started working with language models five years ago when he led the team that created CodeSearchNet, a precursor to GitHub CoPilot. Since then, he has seen many successful and unsuccessful approaches to building LLM products. Hamel is also an active open source maintainer and contributor of a wide range of ML/AI projects. Hamel is currently an independent consultant.

About Jason Liu Jason is an independent AI consultant, advisor, writer, and educator. His main interests are structured outputs, search and retrieval for RAG as well as understanding how to leverage AI to build scalable and valuable businesses.

About Bryan Bischof Bryan Bischof is the Head of AI at Hex, where he leads the team of engineers building Magic—the data science and analytics copilot. Bryan has worked all over the data stack leading teams in analytics, machine learning engineering, data platform engineering, and AI engineering. He started the data team at Blue Bottle Coffee, led several projects at Stitch Fix, and built the data teams at Weights and Biases. Bryan previously co-authored the book Building Production Recommendation Systems with O’Reilly, and teaches Data Science and Analytics in the graduate school at Rutgers. His Ph.D. is in pure mathematics.

About Charles Frye AI Engineer at Modal Labs. Building useful technology with large neural networks.

00:00 Introduction 03:22 Strategic: Bryan Bischof & Charles Frye 14:47 Operational: Hamel Husain & Jason Liu 23:51 Tactical: Eugene Yan & Shreya Shankar

AI Engineer World's Fair 2024

YouTube

Streamlit NYC Meetup 2024-03-28 · 22:00

At our meetup on March 28th at Snowflake HQ in New York, hear from fellow Streamlit enthusiasts and AI thought leaders on some of the greatest challenges faced by organizations looking to integrate AI into their workflows.

📣 Talks and Speakers

"Scalable Serverless Streamlit Services," with Charles Frye, AI Engineer, Modal
Albert Zhao, AI Developer Relations, AWS
Brandon Duderstadt, Co-Founder and CEO, Nomic
"Streamlit in Snowflake for Enterprise Applications" with Abhi Saini, Sr. Product Manager, Streamlit in Snowflake
A Streamlit app demo with Aish Shirahatti, BI Engineer, Snowflake

🗓️ Agenda

6:00 - 6:30 \| Check-in and catering opens
6:30 \| Programming begins\, featuring remarks by our speakers
7:30 - 7:45 \| Share your Streamlit demos and raise your hand if you're hiring
7:45 - 9:00 \| Mingling

---

Streamlit Meetups are community events open to anyone interested in turning intuitive Python into polished data apps in only a few lines of code. Connect with other Streamlit users on everything from best practices to trends in AI!

Join the Streamlit community: https://discuss.streamlit.io

-- About our speakers

Charles Frye, AI Engineer, Modal Charles teaches people to build data-dependent applications, from BI to AI. He got his PhD in 2020 from the University of California, Berkeley, where he taught Data Science for Research Psychologists. Charles has since worked at the intersection of education and engineering at Weights & Biases, Full Stack Deep Learning, and as an independent consultant. He currently works as an AI Engineer at Modal Labs.

Abhi Saini, Senior Product Manager for Streamlit in Snowflake Abhi is a Senior Product Manager at Snowflake where he focuses on building a managed Streamlit platform for Enterprise use cases

Streamlit NYC Meetup

talk-data.com

People (73 results)

Companies (1 result)

Activities & events