Learning Theory of Transformers: Generalization and Optimization of In-Context Learning
Taiji Suzuki (University of Tokyo)
https://simons.berkeley.edu/talks/taiji-suzuki-university-tokyo-2024-12-04
Unknown Futures of Generalization
We introduce recent theoretical developments that elucidate the learning capabilities of Transformers, focusing on in-context learning as the main subject. First, regarding statistical efficiency and approximation ability, we show that Transformers can achieve minimax optimality for in-context learning and demonstrate their superiority over non-pretrained methods. Next, in terms of optimization theory, we show that nonlinear feature learning for in-context learning can be performed with optimization guarantees. More concretely, the objective satisfies a strict-saddle property in a mean-field setting, and if the target is a single-index model, the computational efficiency can be characterized by the information exponent of the true function.
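
For reference (a standard definition, not stated in the abstract itself, with notation chosen here for illustration): a single-index target takes the form f_*(x) = g(\langle w_*, x \rangle) for an unknown direction w_* and link function g, and the information exponent of g is the index of its lowest-order nonzero Hermite coefficient,

    k^* = \min\{\, k \ge 1 : \mathbb{E}_{z \sim \mathcal{N}(0,1)}[\, g(z)\, \mathrm{He}_k(z) \,] \neq 0 \,\},

which is the quantity that governs how hard it is for gradient-based methods to recover the direction w_*.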