FNet: Mixing Tokens with Fourier Transforms (Machine Learning Research Paper Explained)

Published on: 2021-05-21
Video Link: https://www.youtube.com/watch?v=JJR3pBl78zw
Duration: 34:23


#fnet #attention #fourier

Do we even need Attention? FNets completely drop the Attention mechanism in favor of a simple Fourier transform. They perform almost as well as Transformers, while drastically reducing parameter count, as well as compute and memory requirements. This highlights that a good token mixing heuristic could be as valuable as a learned attention matrix.
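To make this concrete, here is a minimal NumPy sketch of one FNet-style encoder block (my own illustration, not the authors' Flax code): the self-attention sublayer is replaced by a 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part, while the residuals, layer norms, and feed-forward sublayer stay as in a standard Transformer (ReLU here instead of the paper's GELU, for brevity).

import numpy as np

def layer_norm(x, eps=1e-6):
    # plain layer normalization over the hidden dimension
    mean = x.mean(-1, keepdims=True)
    std = x.std(-1, keepdims=True)
    return (x - mean) / (std + eps)

def fourier_mixing(x):
    # FNet token mixing: 2D DFT over sequence and hidden dims, keep only the real part
    return np.fft.fft2(x).real

def fnet_block(x, w1, b1, w2, b2):
    # one encoder block: the Fourier sublayer replaces self-attention,
    # followed by the usual residual + layer norm + feed-forward sublayer
    x = layer_norm(x + fourier_mixing(x))
    ff = np.maximum(x @ w1 + b1, 0.0) @ w2 + b2  # ReLU feed-forward (GELU in the paper)
    return layer_norm(x + ff)

# toy example: 8 tokens, model width 16, feed-forward width 64
seq_len, d_model, d_ff = 8, 16, 64
x = np.random.randn(seq_len, d_model)
w1, b1 = 0.1 * np.random.randn(d_model, d_ff), np.zeros(d_ff)
w2, b2 = 0.1 * np.random.randn(d_ff, d_model), np.zeros(d_model)
print(fnet_block(x, w1, b1, w2, b2).shape)  # (8, 16); every output position mixes all tokens

Note that the Fourier sublayer has no parameters at all, which is where the parameter, compute, and memory savings come from.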

OUTLINE:
0:00 - Intro & Overview
0:45 - Giving up on Attention
5:00 - FNet Architecture
9:00 - Going deeper into the Fourier Transform
11:20 - The Importance of Mixing
22:20 - Experimental Results
33:00 - Conclusions & Comments

Paper: https://arxiv.org/abs/2105.03824

ADDENDUM:
Of course, I completely forgot to discuss the connection between Fourier transforms and convolutions, and that FNet's Fourier mixing might be interpreted as a convolution with a very large kernel.
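As a quick toy check of that connection (my own illustration, nothing from the paper): by the convolution theorem, a pointwise product in the Fourier domain equals a circular convolution over the sequence, so mixing in Fourier space can be read as convolving with a kernel as long as the sequence itself.

import numpy as np

# Convolution theorem sanity check: FFT -> pointwise product -> inverse FFT
# reproduces a circular convolution whose kernel spans the whole sequence.
rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)   # a toy "sequence"
k = rng.standard_normal(n)   # a kernel as long as the sequence

# circular convolution computed directly
direct = np.array([sum(x[j] * k[(i - j) % n] for j in range(n)) for i in range(n)])

# the same thing computed via the Fourier domain
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

print(np.allclose(direct, via_fft))  # True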

Abstract:
We show that Transformer encoder architectures can be massively sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens. These linear transformations, along with simple nonlinearities in feed-forward layers, are sufficient to model semantic relationships in several text classification tasks. Perhaps most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92% of the accuracy of BERT on the GLUE benchmark, but pre-trains and runs up to seven times faster on GPUs and twice as fast on TPUs. The resulting model, which we name FNet, scales very efficiently to long inputs, matching the accuracy of the most accurate "efficient" Transformers on the Long Range Arena benchmark, but training and running faster across all sequence lengths on GPUs and relatively shorter sequence lengths on TPUs. Finally, FNet has a light memory footprint and is particularly efficient at smaller model sizes: for a fixed speed and accuracy budget, small FNet models outperform Transformer counterparts.

Authors: James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2021-06-11 Efficient and Modular Implicit Differentiation (Machine Learning Research Paper Explained)
2021-06-09 [ML News] EU regulates AI, China trains 1.75T model, Google's oopsie, Everybody cheers for fraud.
2021-06-08 My GitHub (Trash code I wrote during PhD)
2021-06-05 Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained)
2021-06-02 [ML News] Anthropic raises $124M, ML execs clueless, collusion rings, ELIZA source discovered & more
2021-05-31 Reward Is Enough (Machine Learning Research Paper Explained)
2021-05-30 [Rant] Can AI read your emotions? (No, but ...)
2021-05-29 Fast and Slow Learning of Recurrent Independent Mechanisms (Machine Learning Paper Explained)
2021-05-26 [ML News] DeepMind fails to get independence from Google
2021-05-24 Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)
2021-05-21 FNet: Mixing Tokens with Fourier Transforms (Machine Learning Research Paper Explained)
2021-05-18 AI made this music video | What happens when OpenAI's CLIP meets BigGAN?
2021-05-15 DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)
2021-05-11 Research Conference ICML drops their acceptance rate | Area Chairs instructed to be more picky
2021-05-08 Involution: Inverting the Inherence of Convolution for Visual Recognition (Research Paper Explained)
2021-05-06 MLP-Mixer: An all-MLP Architecture for Vision (Machine Learning Research Paper Explained)
2021-05-04 I'm out of Academia
2021-05-01 DINO: Emerging Properties in Self-Supervised Vision Transformers (Facebook AI Research Explained)
2021-04-30 Why AI is Harder Than We Think (Machine Learning Research Paper Explained)
2021-04-27 I COOKED A RECIPE MADE BY A.I. | Cooking with GPT-3 (Don't try this at home)
2021-04-19 NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ML Research Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
what is deep learning
deep learning tutorial
introduction to deep learning
fnet
fnets
fourier nets
fourier neural networks
attention fourier
fourier attention
deep learning fft
machine learning fft
deep learning fourier transform
attention mechanism fourier transform
fourier transform in deep learning
attention networks
do we need attention in deep learning