Speaker: Federico Barbero – PhD student in Computer Science @ DeepMind / University of Oxford

Abstract: There is great interest in scaling the number of tokens that LLMs can efficiently and effectively ingest, a problem that is notoriously difficult. Training LLMs on a shorter context and hoping that they generalize to much longer contexts has largely proven ineffective. In this talk, I will go over our work on understanding the failure points in modern LLM architectures. In particular, I will discuss dispersion in the softmax layers, generalization issues related to positional encodings, and smoothing effects that occur in the representations. Understanding these issues has proven fruitful, with related ideas already appearing in frontier models such as LLaMa 4. The talk is intended to be broadly accessible, but a basic understanding of the Transformer architecture used in modern LLMs will be helpful.
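Of the three failure points mentioned, softmax dispersion is the easiest to see numerically. The sketch below is illustrative only and not taken from the talk: it assumes a single "relevant" key whose logit exceeds i.i.d. Gaussian background logits by a fixed gap, and shows that the softmax attention weight on that key decays roughly like 1/n as the context length n grows, so attention cannot stay sharp at long contexts.

import numpy as np

rng = np.random.default_rng(0)

def max_attention_weight(n, gap=2.0):
    # n-1 background logits plus one logit boosted by `gap` (the token the
    # query "wants" to attend to); the Gaussian background and the gap value
    # are illustrative assumptions, not details from the talk.
    logits = rng.standard_normal(n)
    logits[0] += gap
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights[0]

for n in [16, 256, 4096, 65536]:
    w = np.mean([max_attention_weight(n) for _ in range(20)])
    print(f"n={n:>5}: attention weight on the relevant token ~ {w:.4f}")
# The weight shrinks toward zero roughly like O(1/n): with enough competing
# tokens, no single token keeps a dominant share of the softmax mass.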

Tags: llms, transformer, geometric deep learning, machine learning
Event: #23 AI Series: DeepMind - F. Barbero