talk-data.com

Topic: kv caches (1 tagged activity)

[Activity trend chart: peak of 1 activity/quarter, 2020-Q1 to 2026-Q1]

Activities

Filtering by: Tokenizer-free language model inference

A team from Aleph Alpha presents an approach to language model inference that eliminates the need for conventional large-vocabulary tokenizers, instead using a core vocabulary of 256 byte values and a three-part architecture: a byte-level encoder/decoder, a latent transformer, and patch embeddings. The talk covers the architecture and the engineering challenges of building an efficient inference pipeline, including coordinating the component models, CUDA graphs, and KV caches.
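
The description mentions patch embeddings feeding a latent transformer, with KV caches reused across decode steps. Below is a minimal NumPy sketch of those two pieces. It is not Aleph Alpha's implementation: the patch size (PATCH = 4), the mean-pooling of byte embeddings into patches, the identity K/V projections, and all names (patch_embed, KVCache, attend) are illustrative assumptions.

```python
import numpy as np

VOCAB = 256   # core vocabulary: one entry per byte value
D = 64        # embedding width (illustrative)
PATCH = 4     # bytes per patch (assumed fixed-size patching)

rng = np.random.default_rng(0)
byte_emb = rng.normal(size=(VOCAB, D)).astype(np.float32)

def patch_embed(data: bytes) -> np.ndarray:
    """Stand-in for the byte-level encoder: embed each byte, then
    pool fixed-size groups of bytes into one patch embedding each."""
    e = byte_emb[np.frombuffer(data, dtype=np.uint8)]
    n = (len(data) // PATCH) * PATCH          # drop a ragged tail
    return e[:n].reshape(-1, PATCH, D).mean(axis=1)

class KVCache:
    """Preallocated key/value buffers for one attention layer.
    Fixed-size storage keeps buffer addresses stable across steps,
    which is what a replayed CUDA graph requires."""
    def __init__(self, max_len: int, d_head: int):
        self.k = np.zeros((max_len, d_head), dtype=np.float32)
        self.v = np.zeros((max_len, d_head), dtype=np.float32)
        self.len = 0  # positions filled so far

    def append(self, k_new: np.ndarray, v_new: np.ndarray) -> None:
        n = k_new.shape[0]
        self.k[self.len:self.len + n] = k_new
        self.v[self.len:self.len + n] = v_new
        self.len += n

def attend(q: np.ndarray, cache: KVCache) -> np.ndarray:
    """One query vector attends over everything cached so far."""
    k, v = cache.k[:cache.len], cache.v[:cache.len]
    scores = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ v

# Prefill: embed the prompt's bytes into patches and cache them
# (identity K/V projections here; real models use learned ones).
cache = KVCache(max_len=512, d_head=D)
patches = patch_embed(b"tokenizer-free inference")
cache.append(patches, patches)

# Decode step: the newest latent attends over all cached positions,
# then its K/V are appended so the next step can reuse them.
out = attend(patches[-1], cache)
cache.append(out[None, :], out[None, :])
```

The preallocated cache also hints at why CUDA graphs are named alongside KV caches: a recorded graph replays fixed device pointers, so the decode step's cache buffers must be allocated once up front rather than grown per step.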