A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)
Does self-supervision really need a lot of data? How low can you go? This paper shows that a single image is enough to learn the lower layers of a deep neural network via self-supervision. Interestingly, for these early layers more data does not appear to help, as long as strong enough data augmentation is applied.
OUTLINE:
0:00 - Overview
1:40 - What is self-supervision
4:20 - What does this paper do
7:00 - Linear probes
11:15 - Linear probe results
17:10 - Results
22:25 - Learned Features
https://arxiv.org/abs/1904.13132
Abstract:
We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset.
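The core idea that strong data augmentation of a single image can stand in for a large dataset (at least for early layers) can be sketched roughly as follows. This is an illustration in the spirit of the paper, not its actual pipeline: the random array stands in for a real photo, and the specific transforms (random crop, horizontal flip, per-channel color jitter) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the single training image (the paper uses a real photo);
# random values here, purely for illustration.
image = rng.random((256, 256, 3))

def augment(img, crop=64):
    """Produce one heavily augmented patch from the single image:
    random crop, random horizontal flip, and a simple color jitter."""
    h, w, _ = img.shape
    y = int(rng.integers(0, h - crop + 1))
    x = int(rng.integers(0, w - crop + 1))
    patch = img[y:y + crop, x:x + crop].copy()
    if rng.random() < 0.5:                    # random horizontal flip
        patch = patch[:, ::-1]
    patch *= rng.uniform(0.8, 1.2, size=3)    # per-channel color jitter
    return np.clip(patch, 0.0, 1.0)

# A "dataset" of 1000 distinct training patches, all derived
# from the one source image.
dataset = [augment(image) for _ in range(1000)]
```

In the paper, patches like these feed a self-supervised objective (e.g. RotNet or DeepCluster), and the resulting early-layer features are then evaluated with linear probes.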
Authors: Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
Thumbnail Image: https://commons.wikimedia.org/wiki/File:Golden_Gate_Bridge_during_blue_hour_(16_x_10).jpg
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher