Theoretical Limitations of Multi-Layer Transformers

Video Link: https://www.youtube.com/watch?v=Pfn-wbyamcU

A Google TechTalk, presented by Binghui Peng, 2025-02-06, as part of the Algorithms Seminar.

ABSTRACT: Transformers, especially the decoder-only variants, are the backbone of most modern large language models; yet we do not have much understanding of their expressive power except for the simple 1-layer case. Due to the difficulty of analyzing multi-layer models, all previous work relies on unproven complexity conjectures to show limitations for multi-layer transformers. In this work, we prove the first unconditional lower bound against multi-layer decoder-only transformers: for any constant $L$, any $L$-layer decoder-only transformer needs a polynomial model dimension ($n^{\Omega(1)}$) to perform the sequential composition of $L$ functions over an input of $n$ tokens.
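For intuition, here is one hedged way an $L$-step sequential composition instance could be laid out (the encoding below is an illustrative assumption, not necessarily the paper's exact construction): the input lists $L$ functions over a small domain together with a start element, and the answer is the result of applying the functions one after another.

```python
# Illustrative (assumed) instance of L-step sequential function composition.
# The input encodes L functions f_1, ..., f_L over a domain of size m, plus a
# start element x; the target output is f_L(...f_2(f_1(x))...).
# This encoding is a sketch for intuition only.

import random

def make_instance(L, m, seed=0):
    rng = random.Random(seed)
    funcs = [[rng.randrange(m) for _ in range(m)] for _ in range(L)]  # each f_i: [m] -> [m]
    start = rng.randrange(m)
    # Flatten into a token sequence of length n = L*m + 1 (one token per table entry, plus the start).
    tokens = [v for table in funcs for v in table] + [start]
    return tokens, funcs, start

def compose(funcs, start):
    x = start
    for f in funcs:          # apply f_1, then f_2, ..., then f_L
        x = f[x]
    return x

tokens, funcs, start = make_instance(L=3, m=8)
print(len(tokens), compose(funcs, start))
```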

As a consequence, our results give: (1) the first (unconditional) depth-size trade-off for multi-layer transformers, exhibiting that the $L$-step composition task is exponentially harder for $L$-layer models compared to $(L+1)$-layer ones; (2) an unconditional separation between encoder and decoder, exhibiting a hard task for decoders that can be solved by an exponentially shallower and smaller encoder; (3) a provable advantage of chain-of-thought, exhibiting a task that becomes exponentially easier with chain-of-thought.
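As a rough intuition for point (3), chain-of-thought lets the model emit intermediate values as extra output tokens, so the $L$-fold composition unrolls into $L$ single function applications, each of which is much easier than producing the final answer in one shot. The sketch below is an assumption-level illustration of that unrolling, not the paper's construction.

```python
# Sketch of why chain-of-thought helps on the composition task (illustrative assumption):
# instead of producing f_L(...f_1(x)...) directly, each intermediate value
# x_i = f_i(x_{i-1}) is written out as an extra "thought" token, so every
# generation step only needs a single table lookup rather than a deep composition.

def compose_with_cot(funcs, start):
    chain = [start]
    for f in funcs:
        chain.append(f[chain[-1]])   # one easy lookup per emitted chain-of-thought token
    return chain                     # chain[-1] is the final answer; chain[1:-1] are the CoT tokens

funcs = [[1, 2, 0], [2, 0, 1]]       # two toy functions over the domain {0, 1, 2}
print(compose_with_cot(funcs, 0))    # -> [0, 1, 0]
```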

On the technical side, we propose the multi-party autoregressive communication model, which captures the computation of a decoder-only transformer. We also introduce a new proof technique that iteratively finds a certain indistinguishable decomposition of all possible inputs, which we use to prove lower bounds in this model. We believe our new communication model and proof technique will be helpful for further understanding the computational power of transformers.
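The communication model is only described at a high level here; the toy simulator below is an assumption-based sketch of the kind of causally-masked, round-based message passing that an "autoregressive" communication model suggests, where party $i$ holds the $i$-th token and its messages may depend only on messages already produced by parties $1, \dots, i$. It is not the paper's formal definition.

```python
# Assumed toy sketch of a causally-masked, round-based message-passing setup.
# This is NOT the paper's formal model; it only illustrates the constraint that
# party i (holding token i) never sees information from parties j > i, mirroring
# the causal attention mask of a decoder-only transformer, with one round per layer.

def run_protocol(tokens, num_rounds, message_fn):
    n = len(tokens)
    transcript = [[] for _ in range(n)]        # transcript[i] = messages sent by party i so far
    for _ in range(num_rounds):                # one synchronous round per "layer"
        new_msgs = []
        for i in range(n):
            visible = [m for j in range(i + 1) for m in transcript[j]]  # parties 1..i only
            new_msgs.append(message_fn(tokens[i], visible))
        for i, m in enumerate(new_msgs):
            transcript[i].append(m)
    return transcript[-1][-1]                  # last party's final message plays the role of the output

# Example: each message is just a running sum of everything visible (toy behavior).
out = run_protocol([3, 1, 4, 1, 5], num_rounds=2, message_fn=lambda tok, seen: tok + sum(seen))
print(out)
```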

Based on joint work with Lijie Chen and Hongxun Wu.

ABOUT THE SPEAKER: Binghui Peng is a Motwani Postdoctoral Fellow at Stanford University, working with Aviad Rubinstein and Amin Saberi. Previously, he was a research fellow at the Simons Institute during the "Large Language Models and Transformers" program. He obtained his Ph.D. from Columbia University, advised by Xi Chen and Christos Papadimitriou. He has worked on learning theory and game theory, and, most recently, large language models.