Reformer: The Efficient Transformer

Video Link: https://www.youtube.com/watch?v=i4H0kjxrias
Duration: 29:12

The Transformer for the masses! Reformer solves the biggest problem with the famous Transformer model: its huge resource requirements. By cleverly combining Locality Sensitive Hashing with ideas from Reversible Networks, the classically huge memory footprint of the Transformer is drastically reduced. Not only does the model use less memory, it can also process much longer input sequences, up to 16K tokens with just 16 GB of memory!
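As a rough illustration of the LSH idea (a minimal NumPy sketch of angular hashing, not the paper's actual Trax implementation): each query/key vector is multiplied by a random matrix, and the index of its largest signed component becomes the bucket, so similar vectors tend to share a bucket and attention only needs to be computed within buckets.

import numpy as np

def lsh_buckets(vectors, n_buckets, rng):
    # Angular LSH sketch: project onto random directions and take the
    # argmax over [xR ; -xR]. Nearby vectors usually land in the same
    # bucket, so attention can be restricted to within-bucket pairs
    # instead of the full O(L^2) comparison.
    d = vectors.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))
    projected = vectors @ R                               # (L, n_buckets // 2)
    scores = np.concatenate([projected, -projected], -1)  # (L, n_buckets)
    return np.argmax(scores, axis=-1)                     # bucket id per position

# Hypothetical toy usage: bucket 16 key vectors of dimension 8 into 4 buckets.
rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 8))
print(lsh_buckets(keys, n_buckets=4, rng=rng))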

https://arxiv.org/abs/2001.04451
https://ai.googleblog.com/2020/01/reformer-efficient-transformer.html

Abstract:
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L²) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
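The reversible-residual part can be sketched in a few lines (illustrative NumPy only; F and G here are arbitrary stand-ins for the attention and feed-forward sublayers): because the block's inputs can be recomputed exactly from its outputs, intermediate activations do not have to be stored for every layer during backpropagation.

import numpy as np

def rev_block_forward(x1, x2, F, G):
    # Reversible residual block: (x1, x2) -> (y1, y2).
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    # Recover the inputs from the outputs, so activations can be
    # recomputed during the backward pass instead of stored N times.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Hypothetical stand-ins for the attention and feed-forward sublayers.
F = lambda x: np.tanh(x)
G = lambda x: 0.5 * x

x1, x2 = np.ones(4), np.arange(4.0)
y1, y2 = rev_block_forward(x1, x2, F, G)
x1_rec, x2_rec = rev_block_inverse(y1, y2, F, G)
assert np.allclose(x1, x1_rec) and np.allclose(x2, x2_rec)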

Authors: Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Tags:
deep learning
machine learning
nlp
natural language processing
machine translation
arxiv
google
attention mechanism
attention
transformer
seq2seq
bert
memory
lsh
locality sensitive hashing
reversible
revertible
flow
long sequence