A Theory for Emergence of Complex Skills in Language Models

Channel: Simons Institute for the Theory of Computing

Published on August 16, 2023 7:50:52 AM ● Video Link: https://www.youtube.com/watch?v=0D23NeBjCeQ

Duration: 1:04:45

Sanjeev Arora (Princeton University)
A Theory for Emergence of Complex Skills in Language Models
Large Language Models and Transformers

A major driver of AI today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates the cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this *slingshot generalization* since, naively viewed, it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization: competence at executing tasks involving k-tuples of skills emerges at essentially the same scaling and the same rate as competence on the elementary skills themselves.
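
For readers unfamiliar with the Scaling Laws mentioned above, here is a minimal sketch of one commonly cited empirical form, in which cross-entropy loss is an irreducible term plus power-law terms in parameter count and training tokens. This is an assumption for illustration: the abstract does not say which scaling law or constants the talk uses. The constants below are the published fit from Hoffmann et al. (2022), and the function names are hypothetical.

```python
# Sketch of a Chinchilla-style scaling law. The functional form and constants
# are ASSUMPTIONS for illustration (fit reported by Hoffmann et al., 2022);
# the abstract above does not specify which law the analysis relies on.

E = 1.69                  # irreducible loss term (assumed constant)
A, ALPHA = 406.4, 0.34    # parameter-count term (assumed constants)
B, BETA = 410.7, 0.28     # training-token term (assumed constants)


def scaling_law_loss(n_params: float, n_tokens: float) -> float:
    """Predicted cross-entropy loss for a given model size and token count."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def excess_loss(n_params: float, n_tokens: float) -> float:
    """Loss above the irreducible term: the part that shrinks with scale."""
    return scaling_law_loss(n_params, n_tokens) - E


if __name__ == "__main__":
    # Hypothetical starting point: 1B parameters, 20B tokens, scaled 10x per step.
    n_params, n_tokens = 1e9, 20e9
    for _ in range(4):
        print(
            f"params={n_params:.0e} tokens={n_tokens:.0e} "
            f"loss={scaling_law_loss(n_params, n_tokens):.3f} "
            f"excess={excess_loss(n_params, n_tokens):.3f}"
        )
        n_params *= 10
        n_tokens *= 10
```

Under this assumed form, only the excess term responds to scaling, which is why a framework tying loss to skill competence would naturally be stated in terms of how that reducible part decreases.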

Tags:

Simons Institute

theoretical computer science

UC Berkeley

Computer Science

Theory of Computation

Theory of Computing

Large Language Models and Transformers

Sanjeev Arora
