Object-Centric Learning with Slot Attention (Paper Explained)

Published on: 2020-06-30
Video Link: https://www.youtube.com/watch?v=DYBmD88vpiA



Duration: 42:39


Visual scenes are often composed of sets of independent objects. Yet, current vision models make no assumptions about the nature of the pictures they look at. By imposing an objectness prior, this paper introduces a module that can recognize permutation-invariant sets of objects from pixels in both supervised and unsupervised settings. It does so via a slot attention module that combines an attention mechanism with dynamic routing.
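
To make the mechanism concrete, here is a minimal PyTorch sketch of the Slot Attention module, following the paper's Algorithm 1. The class name, dimension defaults, and hyperparameters here are illustrative assumptions, not the authors' reference implementation:

import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots=7, dim=64, iters=3, hidden_dim=128, eps=1e-8):
        super().__init__()
        self.num_slots = num_slots
        self.iters = iters
        self.eps = eps
        self.scale = dim ** -0.5

        # All slots share one learned Gaussian initializer, which makes them
        # exchangeable: any slot can bind to any object.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_log_sigma = nn.Parameter(torch.zeros(1, 1, dim))

        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim))

        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs, num_slots=None):
        # inputs: (batch, num_inputs, dim), e.g. a flattened CNN feature map
        # with positional embeddings already added.
        b, n, d = inputs.shape
        k_slots = num_slots if num_slots is not None else self.num_slots

        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)

        # Sample the initial slots from the learned Gaussian.
        slots = self.slots_mu + self.slots_log_sigma.exp() * torch.randn(
            b, k_slots, d, device=inputs.device)

        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))

            # Softmax over the *slot* axis: slots compete for input features.
            attn = torch.einsum('bnd,bkd->bnk', k, q) * self.scale
            attn = attn.softmax(dim=-1) + self.eps
            # Normalize over inputs so each slot takes a weighted mean.
            attn = attn / attn.sum(dim=1, keepdim=True)
            updates = torch.einsum('bnk,bnd->bkd', attn, v)

            # Recurrent GRU update plus a residual MLP, as in the paper.
            slots = self.gru(updates.reshape(-1, d),
                             slots_prev.reshape(-1, d)).reshape(b, k_slots, d)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots

The softmax over the slot axis (rather than the input axis, as in standard attention) is what creates the competition between slots: the attention weights for each input feature must sum to one across slots.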

OUTLINE:
0:00 - Intro & Overview
1:40 - Problem Formulation
4:30 - Slot Attention Architecture
13:30 - Slot Attention Algorithm
21:30 - Iterative Routing Visualization
29:15 - Experiments
36:20 - Inference Time Flexibility
38:35 - Broader Impact Statement
42:05 - Conclusion & Comments

Paper: https://arxiv.org/abs/2006.15055

My Video on Facebook's DETR: https://youtu.be/T35ba_VXkMY
My Video on Attention: https://youtu.be/iDulhoQ2pro
My Video on Capsules: https://youtu.be/nXGHJTtFYRU

Abstract:
Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network and produces a set of task-dependent abstract representations which we call slots. These slots are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention. We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised property prediction tasks.
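
For context, a hypothetical usage of the sketch above with made-up shapes: since every slot is sampled from the same learned distribution, a trained module can be queried with a different number of slots at test time, which is the inference-time flexibility discussed at 36:20.

# Imagine a CNN encoder that produced a 32x32 feature map with 64 channels.
feats = torch.randn(2, 32 * 32, 64)      # (batch, H*W, dim)
module = SlotAttention(num_slots=7, dim=64)

slots = module(feats)                     # (2, 7, 64): one vector per slot
# Slots share a single learned initializer, so the slot count can be
# changed at test time without retraining.
slots11 = module(feats, num_slots=11)     # (2, 11, 64)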

Authors: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-07-10 Gradient Origin Networks (Paper Explained w/ Live Coding)
2020-07-09 NVAE: A Deep Hierarchical Variational Autoencoder (Paper Explained)
2020-07-08 Addendum for Supermasks in Superposition: A Closer Look (Paper Explained)
2020-07-07 SupSup: Supermasks in Superposition (Paper Explained)
2020-07-06 [Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1
2020-07-05 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (Paper Explained)
2020-07-04 Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (Paper Explained)
2020-07-03 On the Measure of Intelligence by François Chollet - Part 4: The ARC Challenge (Paper Explained)
2020-07-02 BERTology Meets Biology: Interpreting Attention in Protein Language Models (Paper Explained)
2020-07-01 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Paper Explained)
2020-06-30 Object-Centric Learning with Slot Attention (Paper Explained)
2020-06-29 Set Distribution Networks: a Generative Model for Sets of Images (Paper Explained)
2020-06-28 Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection (Paper Explained)
2020-06-27 Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures (Paper Explained)
2020-06-26 On the Measure of Intelligence by François Chollet - Part 3: The Math (Paper Explained)
2020-06-25 Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)
2020-06-24 How I Read a Paper: Facebook's DETR (Video Tutorial)
2020-06-23 RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)
2020-06-22 [Drama] Yann LeCun against Twitter on Dataset Bias
2020-06-21 SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)
2020-06-20 Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
google
ethz
vision
objects
slots
attention mechanism
gru
lstm
routing
capsules
permutation invariant
encoder
set
detr
embeddings
transformer
weight sharing
disentanglement
render
tetris
clevr
cnn
convolutional neural network
attention