TransformerFAM: Feedback attention is working memory

Subscribers: 284,000
Published on: 2024-04-28 ● Video Link: https://www.youtube.com/watch?v=3a0_hAiFKag
Duration: 37:00
Views: 38,518
Likes: 1,053

Paper: https://arxiv.org/abs/2404.09173

Abstract:
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.
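
The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of the core idea: the sequence is processed block by block, every block attends to a small set of feedback-memory ("FAM") activations, and those activations are then updated by attending to the block, so a compressed summary of everything seen so far is carried forward. The module name BlockWithFAM, the sizes (d_model, n_heads, fam_len), and the learned FAM initialization are illustrative assumptions; the sketch omits the paper's block-sliding-window cache, causal masking, and initialization details and is not the authors' implementation.

import torch
import torch.nn as nn

class BlockWithFAM(nn.Module):
    """Illustrative sketch of feedback attention over block-wise input (not the paper's code)."""
    def __init__(self, d_model=256, n_heads=4, fam_len=8):
        super().__init__()
        # One shared attention layer, mirroring the "no additional weights" idea:
        # the same attention is reused for the block output and the memory update.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learned initial working-memory tokens (an assumed, illustrative choice).
        self.fam_init = nn.Parameter(0.02 * torch.randn(1, fam_len, d_model))

    def forward(self, blocks):
        # blocks: list of tensors, each of shape (batch, block_len, d_model)
        fam = self.fam_init.expand(blocks[0].size(0), -1, -1)
        outputs = []
        for x in blocks:
            # Block tokens attend to [feedback memory, current block], so information
            # from earlier blocks is available only through the compressed FAM.
            ctx = torch.cat([fam, x], dim=1)
            y, _ = self.attn(x, ctx, ctx)
            # The memory tokens attend to the same context, producing the updated FAM
            # that is fed back when the next block is processed (the feedback loop).
            fam, _ = self.attn(fam, ctx, ctx)
            outputs.append(y)
        return torch.cat(outputs, dim=1), fam

# Example: a length-64 sequence processed as four blocks of 16 tokens.
model = BlockWithFAM()
sequence = torch.randn(2, 64, 256)
out, fam = model(list(sequence.split(16, dim=1)))  # out: (2, 64, 256), fam: (2, 8, 256)

Because each block attends only to its own tokens plus a fixed number of memory tokens, the per-block cost stays constant no matter how long the full sequence is, which is what lets the feedback memory stand in for arbitrarily long context.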

Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2024-10-19  GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
2024-10-12  Were RNNs All We Needed? (Paper Explained)
2024-10-05  Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
2024-08-04  Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)
2024-07-08  Scalable MatMul-free Language Modeling (Paper Explained)
2024-06-26  Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)
2024-06-01  xLSTM: Extended Long Short-Term Memory
2024-05-21  [ML News] OpenAI is in hot waters (GPT-4o, Ilya Leaving, Scarlett Johansson legal action)
2024-05-01  ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
2024-04-30  [ML News] Chips, Robots, and Models
2024-04-28  TransformerFAM: Feedback attention is working memory
2024-04-27  [ML News] Devin exposed | NeurIPS track for high school students
2024-04-24  Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
2024-04-23  [ML News] Llama 3 changes the game
2024-04-17  Hugging Face got hacked
2024-04-15  [ML News] Microsoft to spend 100 BILLION DOLLARS on supercomputer (& more industry news)
2024-04-13  [ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃)
2024-04-08  Flow Matching for Generative Modeling (Paper Explained)
2024-04-06  Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)
2024-03-26  [ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act
2024-03-17  [ML News] Devin AI Software Engineer | GPT-4.5-Turbo LEAKED | US Gov't Report: Total Extinction



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper