talk-data.com

Topic: transformer (2 tagged activities)

Activity trend: 2020-Q1 – 2026-Q1, 1 peak/qtr

Activities (2, newest first)

Abstract: There is great interest in scaling the number of tokens that LLMs can efficiently and effectively ingest, a problem that is notoriously difficult. Training LLMs on a smaller context and hoping that they generalize well to much longer contexts has largely proven ineffective. In this talk, I will go over our work that aims to understand the failure points in modern LLM architectures. In particular, I will discuss dispersion in the softmax layers, generalization issues related to positional encodings, and smoothing effects that occur in the representations. Understanding these issues has proven fruitful, with related ideas already being part of frontier models such as Llama 4. The talk is intended to be broadly accessible, but a basic understanding of the Transformer architecture used in modern LLMs will be helpful.
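The "dispersion in the softmax layers" mentioned above can be illustrated with a minimal NumPy sketch (my own toy example, not the speaker's analysis): as the context length grows, softmax attention weights over randomly distributed scores spread ever thinner, so the entropy of the attention distribution rises roughly like log(n) and no single token retains much attention mass.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_entropy(n, seed=0):
    """Entropy of softmax attention weights over n random i.i.d. scores."""
    scores = np.random.default_rng(seed).normal(size=n)
    w = softmax(scores)
    return float(-(w * np.log(w)).sum())

# As context length n grows, entropy grows: attention disperses.
for n in (128, 4096, 131072):
    print(n, round(attention_entropy(n), 2))
```

This is only a caricature of the phenomenon (real attention scores are not i.i.d. Gaussian), but it shows why naively extending the context can dilute attention unless the architecture compensates.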

Abstract: In this talk, Dr. John E. Ortega will cover the task of machine translation (MT): the automated translation of documents by machine. Specifically, John will provide a history of the major paradigms of MT, including rule-based, statistical, neural, and transformer systems. Additionally, John will provide several examples of how MT works, along with research-focused experimentation that would help a human translator determine what types of systems should be used for different purposes, especially the use of MT systems for translating low-resource languages. John will specifically dive into translations from Quechua (an indigenous language spoken by millions in Peru) to Finnish (a high-resource language spoken in Finland, northern Europe).