What Functions Do Transformers Prefer to Represent?

Published on 2022-10-07 ● Video Link: https://www.youtube.com/watch?v=HWLF7FO0HW4



Duration: 31:15
868 views


Surbhi Goel (Microsoft Research and University of Pennsylvania)
https://simons.berkeley.edu/talks/stochastic-optimization-under-distributional-drift
Joint IFML/Data-Driven Decision Processes Workshop

Over the past few years, Transformers have revolutionized deep learning, leading to advances in natural language processing and beyond. These models discard recurrence and convolutions in favor of "self-attention," which directly and globally models interactions within the input context. Despite their success, there is currently limited understanding of why they work. In this talk, I will present our recent results on rigorously quantifying the statistical and representational properties of Transformers, which shed light on their ability to capture long-range dependencies efficiently. First, I will show how bounded-norm self-attention layers can represent arbitrary sparse functions of the input sequence, with sample complexity scaling only logarithmically with the context length, akin to sparse regression. Subsequently, I will briefly show how this ability of self-attention to compute sparse functions, along with its ability to compute averages, can be used to construct Transformers that exactly replicate the dynamics of a recurrent model of computation of depth $T$ using only $o(T)$ depth. I will conclude the talk with experimental results on synthetic tasks based on learning Boolean functions and automata. Based on joint works with Jordan T. Ash, Ben L. Edelman, Sham M. Kakade, Akshay Krishnamurthy, Bingbin Liu, and Cyril Zhang.
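To make the sparse-function setting concrete, here is a minimal illustrative sketch in PyTorch (not the talk's actual construction or experimental setup): a single self-attention layer with a small MLP readout is trained on a synthetic target that depends on only k of the T positions in the context (a k-sparse parity). All names and hyperparameters below (T, k, d, OneLayerAttention, the training loop) are assumptions made for illustration, not details taken from the talk or the referenced papers.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not the talk's settings):
T, k, d = 64, 3, 32              # context length, sparsity of the target, embedding width
S = torch.randperm(T)[:k]        # hidden support: the k positions the label depends on

def sample_batch(n):
    x = torch.randint(0, 2, (n, T)).float() * 2 - 1   # i.i.d. +/-1 tokens
    y = x[:, S].prod(dim=1)                           # k-sparse parity label in {-1, +1}
    return x, y

class OneLayerAttention(nn.Module):
    # One self-attention layer pools the context into a single vector,
    # and a small MLP readout computes the Boolean target from it.
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(1, d)                          # token value -> embedding
        self.pos = nn.Parameter(0.02 * torch.randn(T, d))     # learned positional embeddings
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.query = nn.Parameter(0.02 * torch.randn(1, 1, d))
        self.readout = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, x):
        h = self.embed(x.unsqueeze(-1)) + self.pos            # (n, T, d)
        q = self.query.expand(x.shape[0], -1, -1)             # one global query per sequence
        pooled, _ = self.attn(q, h, h)                        # attention selects/averages positions
        return self.readout(pooled).squeeze(-1).squeeze(-1)   # scalar prediction per sequence

model = OneLayerAttention()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x, y = sample_batch(256)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

x_test, y_test = sample_batch(4096)
acc = (model(x_test).sign() == y_test).float().mean().item()
print(f"test accuracy on {k}-sparse parity with context length {T}: {acc:.3f}")

Varying T while holding k fixed in a setup like this is one way to probe whether the sample complexity grows only logarithmically with the context length, as the abstract suggests; the actual constructions and experiments are in the joint works cited above.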




Other Videos By Simons Institute for the Theory of Computing


2022-10-11 The Statistical Complexity of Interactive Decision Making
2022-10-11 A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11 A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With...
2022-10-11 Stochastic Bin Packing with Time-Varying Item Sizes
2022-10-10 Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
2022-10-08 On The Exploration In Load-Balancing Under Unknown Service Rates
2022-10-08 Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And ...
2022-10-08 The Compensated Coupling (or Why the Future is the Best Guide for the Present)
2022-10-08 Higher-Dimensional Expansion of Random Geometric Complexes
2022-10-08 On the Power of Preconditioning in Sparse Linear Regression
2022-10-07 What Functions Do Transformers Prefer to Represent?
2022-10-01 Optimality of Variational Inference for Stochastic Block Model
2022-10-01 Machine Learning on Large-Scale Graphs
2022-10-01 Survey on Sparse Graph Limits + A Toy Example
2022-10-01 Long Range Dependence in Evolving Networks
2022-09-30 Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations
2022-09-30 Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of...
2022-09-30 Longitudinal Network Models, Log-Linear Multigraph Models, and Implications to Estimation and...
2022-09-30 Graphon Games
2022-09-30 Vertexons and Embedded Graphon Mean Field Games
2022-09-30 Motif Counting via Subgraph Sampling



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Joint IFML/Data-Driven Decision Processes Workshop
Surbhi Goel