Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 300,000

Published on May 24, 2021 12:24:30 PM ● Video Link: https://www.youtube.com/watch?v=2PYLNHqxd5A

Duration: 41:45

10,453 views

373

#expirespan #nlp #facebookai

Facebook AI (FAIR) researchers present Expire-Span, a variant of Transformer XL that dynamically assigns expiration dates to previously encountered signals. Because of this, Expire-Span can handle sequences of many thousands of tokens while keeping memory and compute requirements at a manageable level. It matches or outperforms baseline systems while consuming far fewer resources. We discuss its architecture, advantages, and shortcomings.

OUTLINE:
0:00 - Intro & Overview
2:30 - Remembering the past in sequence models
5:45 - Learning to expire past memories
8:30 - Difference to local attention
10:00 - Architecture overview
13:45 - Comparison to Transformer XL
18:50 - Predicting expiration masks
32:30 - Experimental Results
40:00 - Conclusion & Comments

Paper: https://arxiv.org/abs/2105.06548
Code: https://github.com/facebookresearch/transformer-sequential

ADDENDUM: I mention several times that the gradient signal of the e quantity only occurs inside the R ramp. By that, I mean the gradient stemming from the model loss. The regularization loss acts also outside the R ramp.
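
To make the addendum concrete, here is a minimal sketch in plain Python. It assumes the soft mask has the form m = clamp(1 + (e - distance) / R, 0, 1), which is my reading of the paper's definition and is not stated in this description. It also assumes a simplified per-memory penalty of alpha * e in place of the paper's regularization term. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def expire_mask(e, distance, ramp):
    """Soft expiration mask (assumed form): clamp(1 + (e - distance) / ramp, 0, 1)."""
    r = e - distance  # remaining span; negative once the memory is past its expiry
    return max(0.0, min(1.0, 1.0 + r / ramp))


def mask_grad_wrt_e(e, distance, ramp):
    """d(mask)/d(e): 1/ramp strictly inside the ramp, 0 where the clamp is flat."""
    r = e - distance
    return 1.0 / ramp if -ramp < r < 0.0 else 0.0


def reg_grad_wrt_e(alpha):
    """Gradient of the simplified penalty alpha * e: constant, independent of the ramp."""
    return alpha


R = 16.0      # ramp length (hypothetical value)
e = 40.0      # predicted expire-span of one memory (hypothetical value)
alpha = 1e-3  # regularization strength (hypothetical value)

for distance in (10, 48, 100):
    print(
        f"distance={distance:3d}",
        f"mask={expire_mask(e, distance, R):.4f}",
        f"dmask/de={mask_grad_wrt_e(e, distance, R):.4f}",
        f"dreg/de={reg_grad_wrt_e(alpha):.4f}",
    )
```

Under these assumptions, the model loss reaches e only through the mask, so its gradient vanishes wherever the mask is saturated at 0 or 1 (distances 10 and 100 here) and is nonzero only on the ramp (distance 48). The penalty term pushes on e at every distance.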

Abstract:
Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant information. This forgetting of memories enables Transformers to scale to attend over tens of thousands of previous timesteps efficiently, as not all states from previous timesteps are preserved. We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality. Next, we show that Expire-Span can scale to memories that are tens of thousands in size, setting a new state of the art on incredibly long context tasks such as character-level language modeling and a frame-by-frame moving objects task. Finally, we analyze the efficiency of Expire-Span compared to existing approaches and demonstrate that it trains faster and uses less memory.

Authors: Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2021-06-16	[ML News] De-Biasing GPT-3 | RL cracks chip design | NetHack challenge | Open-Source GPT-J
2021-06-11	Efficient and Modular Implicit Differentiation (Machine Learning Research Paper Explained)
2021-06-09	[ML News] EU regulates AI, China trains 1.75T model, Google's oopsie, Everybody cheers for fraud.
2021-06-08	My GitHub (Trash code I wrote during PhD)
2021-06-05	Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained)
2021-06-02	[ML News] Anthropic raises $124M, ML execs clueless, collusion rings, ELIZA source discovered & more
2021-05-31	Reward Is Enough (Machine Learning Research Paper Explained)
2021-05-30	[Rant] Can AI read your emotions? (No, but ...)
2021-05-29	Fast and Slow Learning of Recurrent Independent Mechanisms (Machine Learning Paper Explained)
2021-05-26	[ML News] DeepMind fails to get independence from Google
2021-05-24	Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)
2021-05-21	FNet: Mixing Tokens with Fourier Transforms (Machine Learning Research Paper Explained)
2021-05-18	AI made this music video | What happens when OpenAI's CLIP meets BigGAN?
2021-05-15	DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)
2021-05-11	Research Conference ICML drops their acceptance rate \| Area Chairs instructed to be more picky
2021-05-08	Involution: Inverting the Inherence of Convolution for Visual Recognition (Research Paper Explained)
2021-05-06	MLP-Mixer: An all-MLP Architecture for Vision (Machine Learning Research Paper Explained)
2021-05-04	I'm out of Academia
2021-05-01	DINO: Emerging Properties in Self-Supervised Vision Transformers (Facebook AI Research Explained)
2021-04-30	Why AI is Harder Than We Think (Machine Learning Research Paper Explained)
2021-04-27	I COOKED A RECIPE MADE BY A.I. | Cooking with GPT-3 (Don't try this at home)

Tags:
deep learning
machine learning
arxiv
explained
neural networks
artificial intelligence
paper
expire span
facebook ai
transformers
long sequence models
transformers long sequence
large context language models
language model sequence length
transformer xl
learning to forget
lstm
schmidhuber
learning to remember
not all memories are created equal
linear attention
attention mechanism
linear attention mechanism
transformer memory
deep learning tutorial
