DETR: End-to-End Object Detection with Transformers (Paper Explained)

Channel:

Yannic Kilcher

Subscribers:

291,000

Published on May 28, 2020 3:09:01 PM
Video Link: https://www.youtube.com/watch?v=T35ba_VXkMY

Duration: 40:57

129,991 views

4,757

Object detection in images is a notoriously hard task! Objects can be of a wide variety of classes, can be numerous or absent, they can occlude each other or be out of frame. All of this makes it even more surprising that the architecture in this paper is so simple. Thanks to a clever loss function, a single Transformer stacked on a CNN is enough to handle the entire task!

OUTLINE:
0:00 - Intro & High-Level Overview
0:50 - Problem Formulation
2:30 - Architecture Overview
6:20 - Bipartite Match Loss Function
15:55 - Architecture in Detail
25:00 - Object Queries
31:00 - Transformer Properties
35:40 - Results

ERRATA:
When I introduce bounding boxes, I say they consist of x and y, but you also need the width and height.
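To make the erratum concrete, here is a minimal sketch of a four-number box. It assumes a box is stored as center coordinates plus width and height, normalized to the image size. The function names are illustrative and not taken from the DETR code base.

```python
def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def xyxy_to_cxcywh(box):
    """Convert (x_min, y_min, x_max, y_max) back to (center_x, center_y, width, height)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


# A box centered in the image, half as wide and a quarter as tall as the image.
corners = cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.25))
print(corners)
print(xyxy_to_cxcywh(corners))
```

With only x and y you would know where a box sits but not how large it is, which is why all four numbers are needed.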

My Video on Transformers: https://youtu.be/iDulhoQ2pro

Paper: https://arxiv.org/abs/2005.12872
Blog: https://ai.facebook.com/blog/end-to-end-object-detection-with-transformers/
Code: https://github.com/facebookresearch/detr

Abstract:
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.
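The bipartite matching mentioned in the abstract can be illustrated with a small sketch. Treat everything here as an assumption for illustration: the cost is simplified to the negative class probability plus an L1 box distance, the search is brute force over permutations rather than the Hungarian algorithm, and all names and numbers are made up. It also assumes there are at least as many predictions as ground-truth objects.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred, target):
    """Cost of pairing one prediction with one ground-truth object.

    pred is (class_probabilities, box); target is (label, box).
    Simplified cost (an assumption): -p(label) + L1 box distance.
    """
    probs, pred_box = pred
    label, target_box = target
    return -probs.get(label, 0.0) + l1_distance(pred_box, target_box)


def best_matching(preds, targets):
    """Try every one-to-one assignment and keep the cheapest.

    Returns (assignment, cost), where assignment[t] is the index of the
    prediction matched to target t. Brute force, so only for tiny inputs.
    """
    best_cost, best_assignment = float("inf"), None
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return best_assignment, best_cost


# Three predictions, two ground-truth objects (boxes are cx, cy, w, h).
preds = [
    ({"cat": 0.1, "dog": 0.8}, (0.7, 0.7, 0.2, 0.2)),
    ({"cat": 0.9, "dog": 0.05}, (0.3, 0.3, 0.2, 0.2)),
    ({"cat": 0.2, "dog": 0.1}, (0.5, 0.5, 0.9, 0.9)),
]
targets = [("cat", (0.3, 0.3, 0.2, 0.2)), ("dog", (0.7, 0.7, 0.2, 0.2))]

assignment, cost = best_matching(preds, targets)
unmatched = sorted(set(range(len(preds))) - set(assignment))
print(assignment, unmatched)  # prediction 1 takes the cat, 0 the dog, 2 is left over
```

Because each ground-truth object is paired with exactly one prediction, duplicates are penalized instead of rewarded: a leftover prediction such as index 2 here would be trained towards "no object", which is what makes a separate non-maximum suppression step unnecessary.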

Authors: Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Other Videos By Yannic Kilcher

2020-06-07	BLEURT: Learning Robust Metrics for Text Generation (Paper Explained)
2020-06-06	Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search (Paper Explained)
2020-06-05	CornerNet: Detecting Objects as Paired Keypoints (Paper Explained)
2020-06-04	Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained)
2020-06-03	Learning To Classify Images Without Labels (Paper Explained)
2020-06-02	On the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained)
2020-06-01	Dynamics-Aware Unsupervised Discovery of Skills (Paper Explained)
2020-05-31	Synthesizer: Rethinking Self-Attention in Transformer Models (Paper Explained)
2020-05-30	[Code] How to use Facebook's DETR object detection algorithm in Python (Full Tutorial)
2020-05-29	GPT-3: Language Models are Few-Shot Learners (Paper Explained)
2020-05-28	DETR: End-to-End Object Detection with Transformers (Paper Explained)
2020-05-27	mixup: Beyond Empirical Risk Minimization (Paper Explained)
2020-05-26	A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)
2020-05-25	Deep image reconstruction from human brain activity (Paper Explained)
2020-05-24	Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)
2020-05-23	[News] The NeurIPS Broader Impact Statement
2020-05-22	When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)
2020-05-21	[News] OpenAI Model Generates Python Code
2020-05-20	Investigating Human Priors for Playing Video Games (Paper & Demo)
2020-05-19	iMAML: Meta-Learning with Implicit Gradients (Paper Explained)
2020-05-18	[Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)

Tags:

deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper, facebook, fair, facebook ai, object detection, coco, bounding boxes, hungarian, matching, bipartite, cnn, transformer, attention, encoder, decoder, images, vision, pixels, segmentation, classes, stuff, things, attention mechanism, squared, unrolled, overlap, threshold, rcnn
