talk-data.com

Topic: inference


Speaker: Dariusz Piotrowski

Ever wondered what actually happens when you call an LLM API? This talk breaks down the inference pipeline from tokenization to text generation, explaining what is really going on under the hood. It walks through the key sampling strategies and their parameters (temperature, top-p, top-k, beam search) and covers performance tricks such as quantization, KV caching, and prompt caching that can speed things up significantly. Time permitting, it also touches on use-case-specific techniques like pass@k and majority voting.
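The sampling knobs mentioned in the abstract compose naturally into one decoding step. The sketch below is not from the talk; it is a minimal, illustrative implementation of temperature scaling, top-k, and top-p (nucleus) filtering applied to a vector of raw logits, with all names chosen here for clarity:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick the next token id from raw logits using common sampling knobs.

    Illustrative sketch only: real inference stacks apply these filters
    on the GPU over vocabularies of ~100k tokens.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: <1 sharpens the distribution, >1 flattens it.
    logits = logits / temperature

    # Top-k: keep only the k highest-scoring tokens, mask the rest out.
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    # Softmax over the surviving logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus): keep the smallest set of tokens whose total
    # probability mass reaches p, then renormalize.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        keep = np.cumsum(probs[order]) <= top_p
        keep[0] = True  # always keep the single most likely token
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[keep]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()

    return int(rng.choice(len(probs), p=probs))
```

With `top_k=1` this degenerates to greedy decoding, which makes the knobs easy to sanity-check in isolation.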
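The use-case-specific techniques at the end also have compact definitions. As a hedged sketch (not taken from the talk): majority voting picks the most frequent answer among several sampled completions, and pass@k is commonly estimated with the unbiased combinatorial formula 1 − C(n−c, k)/C(n, k), where n samples were drawn and c of them were correct:

```python
from collections import Counter
from math import comb

def majority_vote(answers):
    """Return the most frequent answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n total (c correct), passes."""
    if n - c < k:
        return 1.0  # not enough incorrect samples to fill k slots
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=4 samples of which c=2 are correct, pass@2 is 1 − C(2,2)/C(4,2) = 5/6.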