Ever wondered what actually happens when you call an LLM API? This talk breaks down the inference pipeline from tokenization to text generation, explaining what's really going on under the hood. We'll walk through the key sampling strategies and their parameters: temperature, top-p, top-k, and beam search. We'll also cover performance tricks like quantization, KV caching, and prompt caching that can speed things up significantly. If time allows, we'll touch on some use-case-specific techniques like pass@k and majority voting.
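To make the sampling parameters concrete, here is a minimal sketch of how temperature, top-k, and top-p (nucleus) filtering are typically applied to a model's raw logits before the next token is drawn. This is not code from the talk: the function name, the NumPy implementation, and the tiny five-token vocabulary are illustrative assumptions.

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick the next token id from raw logits using temperature, top-k, and top-p."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    logits = logits / max(temperature, 1e-8)

    # Softmax over the scaled logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative mass reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        kept = np.zeros_like(probs)
        kept[keep] = probs[keep]
        probs = kept

    # Renormalize what survived the filters and sample one token id.
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Example with a made-up 5-token "vocabulary".
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9))

Greedy decoding and beam search replace the final random draw with an argmax (or a search over the k highest-scoring partial sequences); the filtering steps above only apply to stochastic sampling.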
Topic: tokenization (1 tagged)
Activity Trend: peak of 6 talks/quarter, 2020-Q1 to 2026-Q1
Top Events
Stablecoins as a Payment Rail (1)
PyData Trójmiasto #37 (1)
Fabric-X – Programming Model and Application Development Deep Dive (1)
Démystifier les LLM avec NanoChat de Andrej Karpathy (1)
Filtering by: Dariusz Piotrowski