

Activities & events

MoE inference economics from first principles
Piotr Mazurek – Senior AI Inference Engineer @ Aleph Alpha
Tags: AI/ML, LLM, Cyber Security

The release of Kimi K2 has firmly established mixture-of-experts (MoE) models as the leading architecture for large language models (LLMs) at the intelligence frontier. Due to their massive size (over 1 trillion parameters) and sparse computation pattern, activating only a subset of parameters rather than the entire model for each token, MoE-style LLMs present significant challenges for inference workloads and fundamentally alter the underlying inference economics. With ever-growing consumer demand for AI models, as well as the internal need of AGI companies to generate trillions of tokens of synthetic data, the "cost per token" is becoming an ever more important factor: it determines both profit margins and the capex required for internal reinforcement learning (RL) training rollouts. In this talk we will go through the details of the cost structure of generating a "DeepSeek token," discuss the tradeoffs between latency/throughput and cost, and try to estimate the optimal setup for running it.

If you want to join this event, please sign up on our Luma page: https://lu.ma/2ae8czbn
⚠️ Registration is free, but required due to building security.

Speakers:

* Piotr Mazurek (https://x.com/tugot17), Senior AI Inference Engineer
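As a companion to the abstract's framing, here is a minimal back-of-the-envelope sketch of the "cost per token" figure and the latency/throughput tradeoff it mentions. Every constant below (GPU hourly price, replica size, decode throughput) is an illustrative assumption, not a figure from the talk.

# Back-of-the-envelope cost-per-token estimate for serving a large MoE model.
# All constants are illustrative assumptions, not figures from the talk.

GPU_HOUR_USD = 2.50        # assumed rental price per GPU-hour
GPUS_PER_REPLICA = 16      # assumed GPUs needed to host a ~1T-parameter MoE
SECONDS_PER_HOUR = 3600.0

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    """USD cost to generate one million output tokens on one replica."""
    replica_usd_per_sec = GPU_HOUR_USD * GPUS_PER_REPLICA / SECONDS_PER_HOUR
    return replica_usd_per_sec / tokens_per_sec * 1_000_000

# Throughput-optimized serving: large batches keep the GPUs busy, tokens are cheap.
print(f"high-batch : ${cost_per_million_tokens(4_000):.2f} per 1M tokens")

# Latency-optimized serving: small batches slash aggregate tokens/sec,
# so the same hardware emits far more expensive tokens.
print(f"low-latency: ${cost_per_million_tokens(400):.2f} per 1M tokens")

With these assumed numbers, the same 16-GPU replica yields roughly $2.78 per million tokens at high batch sizes versus roughly $27.78 per million when aggregate throughput drops tenfold for latency, which is exactly the kind of tradeoff the talk sets out to quantify.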
