Abstract: There is great interest in scaling the number of tokens that LLMs can efficiently and effectively ingest, a problem that is notoriously difficult. Training LLMs on a shorter context and hoping that they generalize to much longer contexts has largely proven ineffective. In this talk, I will go over our work aimed at understanding the failure points of modern LLM architectures. In particular, I will discuss dispersion in the softmax layers, generalization issues related to positional encodings, and smoothing effects that occur in the representations. Understanding these issues has proven fruitful, with related ideas already appearing in frontier models such as Llama 4. The talk is intended to be broadly accessible, but a basic understanding of the Transformer architecture used in modern LLMs will be helpful.
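To give a rough sense of what "dispersion in the softmax layers" refers to, here is a minimal NumPy sketch (not from the talk, and with entirely made-up toy logits): with bounded attention logits, the softmax distribution over keys flattens as the number of attended tokens grows, so the largest attention weight shrinks and the entropy drifts toward that of a uniform distribution.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy illustration (assumed setup, not the talk's experiments):
# a single query attends over n keys whose logits stay in a bounded
# range. As n grows, no single key can keep a large share of the
# softmax mass, so the attention distribution "disperses".
for n in [16, 256, 4096, 65536]:
    logits = rng.normal(loc=0.0, scale=1.0, size=n)  # bounded toy logits
    attn = softmax(logits)
    entropy = -(attn * np.log(attn)).sum()
    print(f"n={n:6d}  max weight={attn.max():.4f}  "
          f"entropy={entropy:.2f}  (uniform entropy={np.log(n):.2f})")
```

Running this shows the maximum attention weight decaying and the entropy tracking log(n), one intuition for why attention can struggle to stay sharp at much longer context lengths than seen in training.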
Speaker: Federico Barbero
Topic: transformer