Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression

Video Link: https://www.youtube.com/watch?v=Gag7H4M-GdQ



Duration: 1:05:52


Surya Ganguli (Stanford University)

https://simons.berkeley.edu/talks/surya-ganguli-stanford-university-2023-08-18

Large Language Models and Transformers

Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally new tasks that are very different from those seen during pretraining? To probe this question, we examine ICL's performance on linear regression while varying the diversity of tasks in the pretraining dataset. We empirically demonstrate a task diversity threshold for the emergence of ICL. Below this threshold, the pretrained transformer cannot solve unseen regression tasks, as it behaves like a Bayesian estimator with the non-diverse pretraining task distribution as the prior. Beyond this threshold, the transformer significantly outperforms this estimator; its behavior aligns with that of ridge regression, corresponding to a Gaussian prior over all tasks, including those not seen during pretraining. These results highlight that, when pretrained on data with task diversity greater than the threshold, transformers can solve fundamentally new tasks in-context. Importantly, this capability hinges on the transformer deviating from the Bayes optimal estimator with the pretraining distribution as the prior. This study underscores, in a concrete example, the critical role of task diversity, alongside data and model scale, in the emergence of ICL. Code is available at this https URL.
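The two estimators contrasted in the abstract can be made concrete with a small numerical sketch (this is an illustration of the estimators themselves, not the paper's transformer experiments; the dimensions, noise level, and number of pretraining tasks below are arbitrary choices): the Bayes posterior mean under a discrete prior supported on a few pretraining tasks, versus ridge regression, which is the posterior mean under an isotropic Gaussian prior over all tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 8, 16, 0.1  # task dimension, context length, noise std (illustrative)

# Hypothetical low-diversity "pretraining" prior: a small discrete set of tasks.
M = 4
pretrain_tasks = rng.standard_normal((M, d))

# A fundamentally new task, not in the pretraining set, with its in-context examples.
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + sigma * rng.standard_normal(n)

# Ridge regression: posterior mean under a Gaussian N(0, I) prior over task vectors,
# with regularization lambda = sigma^2.
lam = sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Bayes posterior mean under the discrete pretraining prior (uniform over the M tasks):
# weight each pretraining task by its likelihood on the context, then average.
log_lik = -np.sum((y - pretrain_tasks @ X.T) ** 2, axis=1) / (2 * sigma**2)
p = np.exp(log_lik - log_lik.max())
p /= p.sum()
w_bayes = p @ pretrain_tasks

err_ridge = np.linalg.norm(w_ridge - w_true)
err_bayes = np.linalg.norm(w_bayes - w_true)
print(f"ridge error: {err_ridge:.3f}, discrete-prior Bayes error: {err_bayes:.3f}")
```

On an unseen task, the discrete-prior estimator can only snap to (a mixture of) the pretraining tasks, so its error stays large, while ridge regression recovers the new task from the context — the gap the paper's above-threshold transformers close.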

Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Large Language Models and Transformers
Surya Ganguli