Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression

Video Link: https://www.youtube.com/watch?v=Gag7H4M-GdQ



Duration: 1:05:52


Surya Ganguli (Stanford University)

https://simons.berkeley.edu/talks/surya-ganguli-stanford-university-2023-08-18

Large Language Models and Transformers

Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally new tasks that are very different from those seen during pretraining? To probe this question, we examine ICL's performance on linear regression while varying the diversity of tasks in the pretraining dataset. We empirically demonstrate a task diversity threshold for the emergence of ICL. Below this threshold, the pretrained transformer cannot solve unseen regression tasks, as it behaves like a Bayesian estimator with the non-diverse pretraining task distribution as the prior. Beyond this threshold, the transformer significantly outperforms this estimator; its behavior aligns with that of ridge regression, corresponding to a Gaussian prior over all tasks, including those not seen during pretraining. These results highlight that, when pretrained on data with task diversity greater than the threshold, transformers can solve fundamentally new tasks in-context. Importantly, this capability hinges on the transformer deviating from the Bayes optimal estimator with the pretraining distribution as the prior. This study underscores, in a concrete example, the critical role of task diversity, alongside data and model scale, in the emergence of ICL. Code is available at this https URL.
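The two estimators contrasted in the abstract can be made concrete with a small numerical sketch (this is an illustration of the estimators themselves, not the paper's transformer experiments; the dimensions, noise level, and number of pretraining tasks below are arbitrary choices): the Bayes posterior mean under a discrete prior supported on a few pretraining tasks, versus ridge regression, which is the posterior mean under an isotropic Gaussian prior over all tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 8, 16, 0.1  # task dimension, context length, noise std (illustrative)

# Hypothetical low-diversity "pretraining" prior: a small discrete set of tasks.
M = 4
pretrain_tasks = rng.standard_normal((M, d))

# A fundamentally new task, not in the pretraining set, with its in-context examples.
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + sigma * rng.standard_normal(n)

# Ridge regression: posterior mean under a Gaussian N(0, I) prior over task vectors,
# with regularization lambda = sigma^2.
lam = sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Bayes posterior mean under the discrete pretraining prior (uniform over the M tasks):
# weight each pretraining task by its likelihood on the context, then average.
log_lik = -np.sum((y - pretrain_tasks @ X.T) ** 2, axis=1) / (2 * sigma**2)
p = np.exp(log_lik - log_lik.max())
p /= p.sum()
w_bayes = p @ pretrain_tasks

err_ridge = np.linalg.norm(w_ridge - w_true)
err_bayes = np.linalg.norm(w_bayes - w_true)
print(f"ridge error: {err_ridge:.3f}, discrete-prior Bayes error: {err_bayes:.3f}")
```

On an unseen task, the discrete-prior estimator can only snap to (a mixture of) the pretraining tasks, so its error stays large, while ridge regression recovers the new task from the context — the gap the paper's above-threshold transformers close.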

Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Large Language Models and Transformers
Surya Ganguli