Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)

Subscribers: 286,000
Published on: 2021-03-22
Video Link: https://www.youtube.com/watch?v=P_xeshTnPZg
Duration: 29:36
Views: 49,911
Likes: 1,799

#perceiver #deepmind #transformer

Inspired by the fact that biological creatures attend to multiple modalities at the same time, DeepMind releases its new Perceiver model. Based on the Transformer architecture, the Perceiver makes no assumptions about the modality of the input data and also sidesteps the long-standing quadratic bottleneck of self-attention. It achieves this by running a Transformer in a low-dimensional latent space and feeding the input data into it multiple times via cross-attention. The Perceiver's weights can also be shared across layers, making it very similar to an RNN. Perceivers achieve competitive performance on ImageNet and state-of-the-art results on other modalities, all without modality-specific architectural adjustments.
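
To make the core trick concrete, here is a minimal PyTorch sketch (an illustration of the idea, not the official implementation; the class name PerceiverSketch and all layer counts and dimensions are my own choices). A small learned latent array of M vectors cross-attends to the full input of N elements, so attention over the input costs O(N*M) instead of the O(N^2) of full self-attention, and the latent Transformer itself only ever pays O(M^2):

import copy
import torch
import torch.nn as nn

class PerceiverSketch(nn.Module):
    """Latent array cross-attends to a large input byte array; a small
    latent Transformer does the heavy lifting. Optional weight sharing
    across blocks makes the model resemble an RNN unrolled over depth."""

    def __init__(self, input_dim=3, num_latents=512, latent_dim=256,
                 num_blocks=8, num_classes=1000, share_weights=True):
        super().__init__()
        # Learned latent array: the queries for every cross-attention.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.input_proj = nn.Linear(input_dim, latent_dim)
        cross = nn.MultiheadAttention(latent_dim, num_heads=1, batch_first=True)
        latent_block = nn.TransformerEncoderLayer(latent_dim, nhead=8,
                                                  batch_first=True)
        if share_weights:
            # The same module objects appear at every depth: shared weights.
            self.cross_attns = nn.ModuleList([cross] * num_blocks)
            self.latent_blocks = nn.ModuleList([latent_block] * num_blocks)
        else:
            self.cross_attns = nn.ModuleList(
                copy.deepcopy(cross) for _ in range(num_blocks))
            self.latent_blocks = nn.ModuleList(
                copy.deepcopy(latent_block) for _ in range(num_blocks))
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, x):
        # x: (batch, num_inputs, input_dim), e.g. 50,176 flattened RGB pixels.
        kv = self.input_proj(x)                      # keys/values from input
        z = self.latents.expand(x.shape[0], -1, -1)  # queries: latent array
        for cross, block in zip(self.cross_attns, self.latent_blocks):
            # The raw input is re-injected at every block: iterative attention.
            z = z + cross(z, kv, kv, need_weights=False)[0]
            z = block(z)                             # cheap: O(num_latents^2)
        return self.classifier(z.mean(dim=1))        # pool latents, classify

Usage: model = PerceiverSketch(); logits = model(torch.randn(2, 224 * 224, 3)) gives a (2, 1000) output, attending directly to ~50k pixels without ever forming a 50k-by-50k attention matrix.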

OUTLINE:
0:00 - Intro & Overview
2:20 - Built-In Assumptions of Computer Vision Models
5:10 - The Quadratic Bottleneck of Transformers
8:00 - Cross-Attention in Transformers
10:45 - The Perceiver Model Architecture & Learned Queries
20:05 - Positional Encodings via Fourier Features
23:25 - Experimental Results & Attention Maps
29:05 - Comments & Conclusion

Paper: https://arxiv.org/abs/2103.03206

My Video on Transformers (Attention is All You Need): https://youtu.be/iDulhoQ2pro

Abstract:
Biological systems understand the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver - a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture performs competitively or beyond strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video and video+audio. The Perceiver obtains performance comparable to ResNet-50 on ImageNet without convolutions and by directly attending to 50,000 pixels. It also surpasses state-of-the-art results for all modalities in AudioSet.
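
Since the model makes no grid assumptions, position has to be supplied explicitly; the paper attaches Fourier features of the input coordinates (see the outline item at 20:05). A minimal sketch, assuming linearly spaced frequency bands up to a maximum frequency (the function name fourier_features, the band count, and the max frequency are illustrative choices of mine):

import math
import torch

def fourier_features(coords, num_bands=64, max_freq=224.0):
    """coords: (..., d) positions scaled to [-1, 1]. Returns sin/cos
    features at num_bands frequencies plus the raw coordinates."""
    freqs = torch.linspace(1.0, max_freq / 2.0, num_bands)   # (num_bands,)
    angles = math.pi * coords.unsqueeze(-1) * freqs          # (..., d, num_bands)
    return torch.cat([angles.sin().flatten(-2),
                      angles.cos().flatten(-2),
                      coords], dim=-1)

# For a 224x224 image: build the coordinate grid, then the encodings.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 224),
                        torch.linspace(-1, 1, 224), indexing="ij")
coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)   # (50176, 2)
pos = fourier_features(coords)                          # (50176, 258)

These position features are concatenated with the raw input features before the cross-attention, which is how the model recovers spatial structure without any built-in notion of a grid.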

Authors: Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2021-05-04 I'm out of Academia
2021-05-01 DINO: Emerging Properties in Self-Supervised Vision Transformers (Facebook AI Research Explained)
2021-04-30 Why AI is Harder Than We Think (Machine Learning Research Paper Explained)
2021-04-27 I COOKED A RECIPE MADE BY A.I. | Cooking with GPT-3 (Don't try this at home)
2021-04-19 NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ML Research Paper Explained)
2021-04-14 I BUILT A NEURAL NETWORK IN MINECRAFT | Analog Redstone Network w/ Backprop & Optimizer (NO MODS)
2021-04-11 DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
2021-04-07 PAIR AI Explorables | Is the problem in the data? Examples on Fairness, Diversity, and Bias.
2021-03-30 Machine Learning PhD Survival Guide 2021 | Advice on Topic Selection, Papers, Conferences & more!
2021-03-23 Is Google Translate Sexist? Gender Stereotypes in Statistical Machine Translation
2021-03-22 Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)
2021-03-16 Pretrained Transformers as Universal Computation Engines (Machine Learning Research Paper Explained)
2021-03-11 Yann LeCun - Self-Supervised Learning: The Dark Matter of Intelligence (FAIR Blog Post Explained)
2021-03-06 Apple or iPod??? Easy Fix for Adversarial Textual Attacks on OpenAI's CLIP Model! #Shorts
2021-03-05 Multimodal Neurons in Artificial Neural Networks (w/ OpenAI Microscope, Research Paper Explained)
2021-02-27 GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton's Paper Explained)
2021-02-26 Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained)
2021-02-25 DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)
2021-02-19 Dreamer v2: Mastering Atari with Discrete World Models (Machine Learning Research Paper Explained)
2021-02-17 TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)
2021-02-14 NFNets: High-Performance Large-Scale Image Recognition Without Normalization (ML Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
deep learning tutorial
what is deep learning
introduction to deep learning
deepmind
perceiver
cross attention
attention mechanism
attention is all you need
google deepmind
deepmind perceiver
perceiver model
self attention
rnn
recurrent neural network
weight sharing
computer vision
natural language processing
fourier features