talk-data.com


Speaker

Federico Barbero

1 talk

PhD student in Computer Science · Google DeepMind / University of Oxford

Federico Barbero is a PhD student in Computer Science at the University of Oxford (Trinity College), supervised by Michael Bronstein, and is currently at Google DeepMind in London, where he collaborates with Petar Veličković on Algorithmic Reasoning. He works on Geometric Deep Learning, with broader interests in ML security, robustness, and privacy. In 2023, he interned at Microsoft Research Amsterdam with the AI4Science team, working on protein sampling. He previously completed an MPhil in Machine Learning and Machine Intelligence at the University of Cambridge (King's College), supervised by Pietro Liò and Cristian Bodnar, where he worked on Topological Deep Learning.

Bio from: #23 AI Series: DeepMind - F. Barbero

Talks & appearances

1 activity

Abstract: There is great interest in scaling the number of tokens that LLMs can efficiently and effectively ingest, a problem that is notoriously difficult. Training LLMs on a smaller context and hoping that they generalize well to much longer contexts has largely proven ineffective. In this talk, I will go over our work that aims to understand the failure points in modern LLM architectures. In particular, I will discuss dispersion in the softmax layers, generalization issues related to positional encodings, and smoothing effects that occur in the representations. Understanding these issues has proven fruitful, with related ideas already appearing in frontier models such as LLaMa 4. The talk is intended to be broadly accessible, but a basic understanding of the Transformer architectures used in modern LLMs will be helpful.
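
To make the "dispersion in the softmax layers" concrete, here is a minimal sketch (not material from the talk; the bound of 5.0 and the uniform logit distribution are illustrative assumptions): if attention logits stay bounded as the context grows, the largest softmax weight can be at most e^(2B)/n, so it is forced toward the uniform value 1/n and attention cannot remain sharp over very long inputs.

```python
import numpy as np

# Hypothetical demo (not from the talk): softmax over n bounded logits.
# If attention logits lie in [-bound, bound] regardless of context length,
# the largest softmax weight is at most e^(2*bound)/n, decaying toward uniform.
rng = np.random.default_rng(0)

def max_attention_weight(n: int, bound: float = 5.0) -> float:
    """Max softmax weight over n logits drawn uniformly from [-bound, bound]."""
    logits = rng.uniform(-bound, bound, size=n)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs.max()

for n in [64, 1024, 16384, 262144]:
    print(f"n={n:>6}: max softmax weight ~ {max_attention_weight(n):.5f}")
```

Running this shows the maximum weight shrinking roughly like 1/n as the number of attended tokens grows, which is the kind of out-of-distribution sharpness loss the abstract alludes to.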