∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)

Subscribers: 284,000
Published on: 2021-09-06
Video Link: https://www.youtube.com/watch?v=0JlB9gufTw8
Duration: 36:37
Views: 30,338
Likes: 836

#inftyformer #infinityformer #transformer

Vanilla Transformers are excellent sequence models, but suffer from very harsh constraints on the length of the sequences they can process. Several attempts have been made to extend the Transformer's sequence length, but few have successfully gone beyond a constant-factor improvement. This paper presents a method, based on continuous attention mechanisms, to attend to an unbounded past by representing it as a continuous signal rather than a sequence. This enables the Infty-Former to effectively enrich the current context with global information, which improves performance on long-range dependencies in sequence tasks. Further, the paper presents the concept of sticky memories, which highlight past events of particular importance and elevate their representation in the long-term memory.
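
The core idea in the description above is compact enough to sketch in code. The following is a minimal illustration (not the authors' implementation; the basis count N, the RBF width, and the ridge penalty lam are arbitrary assumptions) of how a discrete sequence of hidden states can be compressed into a continuous signal via radial basis functions fitted with ridge regression, so the memory footprint stays fixed no matter how long the past is:

import numpy as np

def fit_continuous_memory(X, N=32, lam=0.5):
    # X: (L, d) sequence of hidden states -> B: (N, d) RBF coefficients.
    L, d = X.shape
    t = np.linspace(0.0, 1.0, L)                    # token positions mapped to [0, 1]
    centres = np.linspace(0.0, 1.0, N)              # RBF centres (assumed evenly spaced)
    width = 1.0 / N                                 # RBF width (assumed)
    Psi = np.exp(-0.5 * ((t[:, None] - centres[None, :]) / width) ** 2)   # (L, N)
    # Ridge regression: B = (Psi^T Psi + lam * I)^-1 Psi^T X
    B = np.linalg.solve(Psi.T @ Psi + lam * np.eye(N), Psi.T @ X)         # (N, d)
    return B, centres, width

def read_signal(B, centres, width, t_query):
    # Evaluate the continuous memory signal at arbitrary positions in [0, 1].
    t_query = np.asarray(t_query)
    psi = np.exp(-0.5 * ((t_query[:, None] - centres[None, :]) / width) ** 2)
    return psi @ B                                  # (len(t_query), d)

# Example: 1000 tokens of dimension 64 are summarized by only 32 coefficient rows.
B, centres, width = fit_continuous_memory(np.random.randn(1000, 64))
approx = read_signal(B, centres, width, np.linspace(0.0, 1.0, 1000))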

OUTLINE:
0:00 - Intro & Overview
1:10 - Sponsor Spot: Weights & Biases
3:35 - Problem Statement
8:00 - Continuous Attention Mechanism
16:25 - Unbounded Memory via concatenation & contraction
18:05 - Does this make sense?
20:25 - How the Long-Term Memory is used in an attention layer
27:40 - Entire Architecture Recap
29:30 - Sticky Memories by Importance Sampling
31:25 - Commentary: Pros and cons of using heuristics
32:30 - Experiments & Results

Paper: https://arxiv.org/abs/2109.00301

Sponsor: Weights & Biases
https://wandb.me/start

Abstract:
Transformers struggle when attending to long contexts, since the amount of computation grows with the context length, and therefore they cannot model long-term memories effectively. Several variations have been proposed to alleviate this problem, but they all have a finite memory capacity, being forced to drop old information. In this paper, we propose the ∞-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the ∞-former's attention complexity becomes independent of the context length. Thus, it is able to model arbitrarily long contexts and maintain "sticky memories" while keeping a fixed computation budget. Experiments on a synthetic sorting task demonstrate the ability of the ∞-former to retain information from long sequences. We also perform experiments on language modeling, by training a model from scratch and by fine-tuning a pre-trained language model, which show benefits of unbounded long-term memories.
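
As a companion to the abstract, here is a hedged sketch of the continuous-space attention read-out it mentions: a query is mapped to a Gaussian density over the signal's domain [0, 1], and the context vector is the expectation of the continuous memory under that density, so the cost scales with the number of basis functions rather than the context length. The projections W_mu and W_sigma and the numerical integration grid are illustrative assumptions, not the paper's exact parametrization:

import numpy as np

def continuous_attention(q, B, centres, width, W_mu, W_sigma, grid=512):
    # q: (d,) query; B: (N, d) RBF coefficients of the long-term memory.
    mu_q = 1.0 / (1.0 + np.exp(-q @ W_mu))          # centre of attention in (0, 1)
    sigma_q = np.log1p(np.exp(q @ W_sigma)) + 1e-3  # positive width via softplus
    t = np.linspace(0.0, 1.0, grid)
    p = np.exp(-0.5 * ((t - mu_q) / sigma_q) ** 2)
    p /= p.sum()                                    # discrete Gaussian density on the grid
    psi = np.exp(-0.5 * ((t[:, None] - centres[None, :]) / width) ** 2)   # (grid, N)
    return (p @ psi) @ B                            # (d,) context vector: E_p[ signal(t) ]

# Example call with random placeholders for the learned quantities.
d, N = 64, 32
ctx = continuous_attention(np.random.randn(d), np.random.randn(N, d),
                           np.linspace(0.0, 1.0, N), 1.0 / N,
                           np.random.randn(d), np.random.randn(d))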

Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2021-10-02: How far can we scale up? Deep Learning's Diminishing Returns (Article Review)
2021-09-29: [ML News] Plagiarism Case w/ Plot Twist | CLIP for video surveillance | OpenAI summarizes books
2021-09-27: Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment (Paper Explained)
2021-09-26: 100K Subs AMA (Ask Me Anything)
2021-09-24: [ML News] New ImageNet SOTA | Uber's H3 hexagonal coordinate system | New text-image-pair dataset
2021-09-21: Does GPT-3 lie? - Misinformation and fear-mongering around the TruthfulQA dataset
2021-09-20: Topographic VAEs learn Equivariant Capsules (Machine Learning Research Paper Explained)
2021-09-16: [ML News] Roomba Avoids Poop | Textless NLP | TikTok Algorithm Secrets | New Schmidhuber Blog
2021-09-14: Celebrating 100k Subscribers! (w/ Channel Statistics)
2021-09-10: [ML News] AI predicts race from X-Ray | Google kills HealthStreams | Boosting Search with MuZero
2021-09-06: ∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)
2021-09-03: [ML News] Blind Chess AI Competition | Graph NNs for traffic | AI gift suggestions
2021-09-02: ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation
2021-08-27: [ML News] Stanford HAI coins Foundation Models & High-profile case of plagiarism uncovered
2021-08-26: Fastformer: Additive Attention Can Be All You Need (Machine Learning Research Paper Explained)
2021-08-23: PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)
2021-08-19: NeuralHash is BROKEN - How to evade Apple's detection & craft hash collisions (w/ Open Source Code)
2021-08-18: [ML News] Nvidia renders CEO | Jurassic-1 larger than GPT-3 | Tortured Phrases reveal Plagiarism
2021-08-16: How Apple scans your phone (and how to evade it) - NeuralHash CSAM Detection Algorithm Explained
2021-08-13: [ML NEWS] Apple scans your phone | Master Faces beat face recognition | WALL-E is real
2021-08-06: [ML News] AI-generated patent approved | Germany gets an analog to OpenAI | ML cheats video games



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
inftyformer
infinityformer
infty former
infinity former
transformer
transformers
transformer linear
linear attention
unbounded memory transformer
continuous attention
attention mechanism
continuous attention mechanism
radial basis function
radial basis functions
ridge regression
long term memory
long term memory explained