Reformer: The Efficient Transformer

Video Link: https://www.youtube.com/watch?v=i4H0kjxrias
Duration: 29:12

The Transformer for the masses! Reformer solves the biggest problem with the famous Transformer model: its huge resource requirements. By cleverly combining Locality Sensitive Hashing with ideas from Reversible Networks, the classically huge memory footprint of the Transformer is drastically reduced. Not only does the model use less memory, it can also process much longer input sequences, up to 16K tokens with just 16 GB of memory!
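As a rough illustration of the LSH idea (a minimal NumPy sketch of angular hashing, not the paper's actual Trax implementation): each query/key vector is multiplied by a random matrix, and the index of its largest signed component becomes the bucket, so similar vectors tend to share a bucket and attention only needs to be computed within buckets.

import numpy as np

def lsh_buckets(vectors, n_buckets, rng):
    # Angular LSH sketch: project onto random directions and take the
    # argmax over [xR ; -xR]. Nearby vectors usually land in the same
    # bucket, so attention can be restricted to within-bucket pairs
    # instead of the full O(L^2) comparison.
    d = vectors.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))
    projected = vectors @ R                               # (L, n_buckets // 2)
    scores = np.concatenate([projected, -projected], -1)  # (L, n_buckets)
    return np.argmax(scores, axis=-1)                     # bucket id per position

# Hypothetical toy usage: bucket 16 key vectors of dimension 8 into 4 buckets.
rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 8))
print(lsh_buckets(keys, n_buckets=4, rng=rng))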

https://arxiv.org/abs/2001.04451
https://ai.googleblog.com/2020/01/reformer-efficient-transformer.html

Abstract:
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L²) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
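The reversible-residual part can be sketched in a few lines (illustrative NumPy only; F and G here are arbitrary stand-ins for the attention and feed-forward sublayers): because the block's inputs can be recomputed exactly from its outputs, intermediate activations do not have to be stored for every layer during backpropagation.

import numpy as np

def rev_block_forward(x1, x2, F, G):
    # Reversible residual block: (x1, x2) -> (y1, y2).
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    # Recover the inputs from the outputs, so activations can be
    # recomputed during the backward pass instead of stored N times.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Hypothetical stand-ins for the attention and feed-forward sublayers.
F = lambda x: np.tanh(x)
G = lambda x: 0.5 * x

x1, x2 = np.ones(4), np.arange(4.0)
y1, y2 = rev_block_forward(x1, x2, F, G)
x1_rec, x2_rec = rev_block_inverse(y1, y2, F, G)
assert np.allclose(x1, x1_rec) and np.allclose(x2, x2_rec)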

Authors: Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Tags:
deep learning
machine learning
nlp
natural language processing
machine translation
arxiv
google
attention mechanism
attention
transformer
seq2seq
bert
memory
lsh
locality sensitive hashing
reversible
revertible
flow
long sequence