SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (Paper Explained)

Published on: 2020-07-05
Video Link: https://www.youtube.com/watch?v=qFRfnIRMNlk
Duration: 35:52

#machinelearning #ai #google

The high-level architecture of CNNs has not really changed over the years: we build high-resolution, low-dimensional layers first, followed by increasingly coarse but deeper layers. This paper challenges that decades-old heuristic and uses neural architecture search to find an alternative, called SpineNet, that employs multiple rounds of re-scaling and long-range skip connections.
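To make the contrast concrete, here is a minimal PyTorch sketch of a conventional scale-decreased stack next to a toy scale-permuted ordering with one long-range, cross-scale skip connection. All widths, strides, and the block order below are made up for illustration; the real SpineNet connectivity is found by architecture search, not hand-designed like this.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn(c_in, c_out, stride=1):
    # 3x3 conv -> batch norm -> ReLU, the generic block used in both toys
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class ScaleDecreased(nn.Module):
    # Classic backbone shape: resolution only ever goes down,
    # feature dimension only ever goes up.
    def __init__(self):
        super().__init__()
        self.b1 = conv_bn(3, 32, stride=2)    # 1/2 resolution
        self.b2 = conv_bn(32, 64, stride=2)   # 1/4 resolution
        self.b3 = conv_bn(64, 128, stride=2)  # 1/8 resolution

    def forward(self, x):
        return self.b3(self.b2(self.b1(x)))

class ScalePermuted(nn.Module):
    # Toy scale-permuted ordering: go down in resolution, come back up,
    # and fuse with the early high-resolution feature via a long skip.
    def __init__(self):
        super().__init__()
        self.b1 = conv_bn(3, 32, stride=2)   # 1/2 resolution
        self.b2 = conv_bn(32, 64, stride=2)  # 1/4 resolution
        self.merge = conv_bn(32 + 64, 64)    # fuses the cross-scale inputs

    def forward(self, x):
        f1 = self.b1(x)                       # high-res feature (1/2)
        f2 = self.b2(f1)                      # low-res feature (1/4)
        up = F.interpolate(f2, size=f1.shape[-2:], mode="nearest")
        return self.merge(torch.cat([f1, up], dim=1))  # back at 1/2

x = torch.randn(1, 3, 64, 64)
print(ScaleDecreased()(x).shape)  # torch.Size([1, 128, 8, 8])
print(ScalePermuted()(x).shape)   # torch.Size([1, 64, 32, 32])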

OUTLINE:
0:00 - Intro & Overview
1:00 - Problem Statement
2:30 - The Problem with Current Architectures
8:20 - Scale-Permuted Networks
11:40 - Neural Architecture Search
14:00 - Up- and Downsampling
19:10 - From ResNet to SpineNet
24:20 - Ablations
27:00 - My Idea: Attention Routing for CNNs
29:55 - More Experiments
34:45 - Conclusion & Comments

Paper: https://arxiv.org/abs/1912.05027
Code: https://github.com/tensorflow/tpu/tree/master/models/official/detection

Abstract:
Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). Encoder-decoder architectures have been proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, SpineNet-190 achieves 52.5% AP with a Mask R-CNN detector and 52.1% AP with a RetinaNet detector on COCO for a single model without test-time augmentation, significantly outperforming prior state-of-the-art detectors. SpineNet can also transfer to classification tasks, achieving a 5% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset. Code is at: https://github.com/tensorflow/tpu/tree/master/models/official/detection
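The cross-scale connections mentioned in the abstract require bringing parent feature maps to a common scale and width before they can be merged. Below is a hedged PyTorch sketch of such a resampling-and-merge step; the specific choices here (1x1 projection for channels, nearest-neighbor upsampling, adaptive max pooling for downsampling) and all shapes are illustrative assumptions, not the paper's exact recipe.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Resample(nn.Module):
    # Match a parent feature map to a target scale and width before merging.
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.proj = nn.Conv2d(c_in, c_out, kernel_size=1)  # match widths

    def forward(self, x, target_hw):
        x = self.proj(x)
        if tuple(x.shape[-2:]) == tuple(target_hw):
            return x
        if x.shape[-2] < target_hw[0]:  # parent is coarser: upsample
            return F.interpolate(x, size=tuple(target_hw), mode="nearest")
        return F.adaptive_max_pool2d(x, tuple(target_hw))  # finer: downsample

# Two made-up parents at different scales feeding a child block at 1/8 scale.
p_fine = torch.randn(1, 64, 64, 64)     # e.g. a 1/4-scale feature map
p_coarse = torch.randn(1, 256, 16, 16)  # e.g. a 1/16-scale feature map
r_fine, r_coarse = Resample(64, 128), Resample(256, 128)
child_in = r_fine(p_fine, (32, 32)) + r_coarse(p_coarse, (32, 32))
print(child_in.shape)  # torch.Size([1, 128, 32, 32])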

Authors: Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

Thumbnail art by Lucas Ferreira

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-07-19 [Classic] Generative Adversarial Networks (Paper Explained)
2020-07-16 [Classic] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality
2020-07-14 [Classic] Deep Residual Learning for Image Recognition (Paper Explained)
2020-07-12 I'M TAKING A BREAK... (Channel Update July 2020)
2020-07-11 Deep Ensembles: A Loss Landscape Perspective (Paper Explained)
2020-07-10 Gradient Origin Networks (Paper Explained w/ Live Coding)
2020-07-09 NVAE: A Deep Hierarchical Variational Autoencoder (Paper Explained)
2020-07-08 Addendum for Supermasks in Superposition: A Closer Look (Paper Explained)
2020-07-07 SupSup: Supermasks in Superposition (Paper Explained)
2020-07-06 [Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1
2020-07-05 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (Paper Explained)
2020-07-04 Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (Paper Explained)
2020-07-03 On the Measure of Intelligence by François Chollet - Part 4: The ARC Challenge (Paper Explained)
2020-07-02 BERTology Meets Biology: Interpreting Attention in Protein Language Models (Paper Explained)
2020-07-01 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Paper Explained)
2020-06-30 Object-Centric Learning with Slot Attention (Paper Explained)
2020-06-29 Set Distribution Networks: a Generative Model for Sets of Images (Paper Explained)
2020-06-28 Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection (Paper Explained)
2020-06-27 Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures (Paper Explained)
2020-06-26 On the Measure of Intelligence by François Chollet - Part 3: The Math (Paper Explained)
2020-06-25 Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
vision
recognition
localization
resnet
resnet50
fpn
backbone
permutation
upsampling
stride
convolution
convolutional neural network
google
spine
spine net
imagenet
coco
segmentation
bounding box
skip connections
residual
bottleneck