TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)

Subscribers: 284,000
Published on: 2021-02-17
Video Link: https://www.youtube.com/watch?v=R5DiLFOMZrc
Duration: 29:53
Views: 30,632

#transformer #gan #machinelearning

Generative Adversarial Networks (GANs) hold the state of the art in image generation. However, while the rest of computer vision is slowly being taken over by transformers and other attention-based architectures, all working GANs to date contain some form of convolutional layer. This paper changes that and builds TransGAN, the first GAN in which both the generator and the discriminator are transformers. The discriminator is taken over from ViT (An Image is Worth 16x16 Words), and the generator uses pixelshuffle to progressively up-sample to the target resolution. Three tricks make training work: data augmentation with DiffAugment, an auxiliary super-resolution task, and a locality-aware initialization of self-attention. Their largest model reaches performance competitive with the best convolutional GANs on CIFAR-10, STL-10, and CelebA.
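
To make the up-sampling step concrete, here is a minimal PyTorch sketch of a TransGAN-style generator stage that treats the token sequence as a square feature map, doubles its spatial resolution with nn.PixelShuffle, and correspondingly quarters the embedding dimension. The class name, layer choices, and shapes are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the authors' code): one TransGAN-style generator stage
# that up-samples a square token grid with PixelShuffle.
import torch
import torch.nn as nn


class UpsampleStage(nn.Module):
    """Transformer block followed by 2x spatial upsampling via PixelShuffle.

    PixelShuffle rearranges (B, C, H, W) -> (B, C/4, 2H, 2W), so the token
    count quadruples while the embedding dimension is divided by four.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Stand-in for the paper's stack of transformer encoder blocks.
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.shuffle = nn.PixelShuffle(upscale_factor=2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, H*W, C) with a square H x W grid
        b, n, c = tokens.shape
        h = w = int(n ** 0.5)
        x = self.block(tokens)                       # self-attention over tokens
        x = x.transpose(1, 2).reshape(b, c, h, w)    # tokens -> feature map
        x = self.shuffle(x)                          # (B, C/4, 2H, 2W)
        b, c2, h2, w2 = x.shape
        return x.reshape(b, c2, h2 * w2).transpose(1, 2)  # (B, 4N, C/4)


# Example: an 8x8 grid of 256-dim tokens becomes a 16x16 grid of 64-dim tokens.
stage = UpsampleStage(dim=256)
out = stage(torch.randn(2, 64, 256))
print(out.shape)  # torch.Size([2, 256, 64])
```

Stacking a few such stages takes a small initial token grid up to the final image resolution while keeping memory in check, since the embedding dimension shrinks as the token count grows.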

OUTLINE:
0:00 - Introduction & Overview
3:05 - Discriminator Architecture
5:25 - Generator Architecture
11:20 - Upsampling with PixelShuffle
15:05 - Architecture Recap
16:00 - Vanilla TransGAN Results
16:40 - Trick 1: Data Augmentation with DiffAugment
19:10 - Trick 2: Super-Resolution Co-Training
22:20 - Trick 3: Locality-Aware Initialization for Self-Attention
27:30 - Scaling Up & Experimental Results
28:45 - Recap & Conclusion

Paper: https://arxiv.org/abs/2102.07074
Code: https://github.com/VITA-Group/TransGAN
My Video on ViT: https://youtu.be/TrdevFK_am4

Abstract:
The recent explosive interest on transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. However, how further transformers can go - are they ready to take some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate TransGAN to notably benefit from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets. Specifically, our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. In particular, TransGAN sets a new state-of-the-art IS score of 10.10 and FID score of 25.32 on STL-10. It also reaches a competitive IS score of 8.64 and FID score of 11.89 on CIFAR-10, and an FID score of 12.23 on CelebA 64×64. We also conclude with a discussion of the current limitations and future potential of TransGAN. The code is available at https://github.com/VITA-Group/TransGAN.
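
As a rough illustration of the locally initialized self-attention mentioned above, the sketch below builds a boolean mask that only lets each token attend to spatial neighbors within a given window, which can then be enlarged over training until attention is fully global. The window schedule and function name are hypothetical; the paper's exact masking details may differ.

```python
# Hypothetical sketch of locality-aware self-attention initialization: restrict
# attention to a local window early in training, then gradually relax it.
import torch


def local_attention_mask(grid_size: int, window: int) -> torch.Tensor:
    """Boolean (N, N) mask over N = grid_size**2 tokens; True = attention allowed.

    Token i may attend to token j only if their 2D grid positions are within
    `window` steps along both axes (Chebyshev distance).
    """
    coords = torch.stack(torch.meshgrid(
        torch.arange(grid_size), torch.arange(grid_size), indexing="ij"
    ), dim=-1).reshape(-1, 2)                       # (N, 2) grid coordinates
    dist = (coords[:, None, :] - coords[None, :, :]).abs().max(dim=-1).values
    return dist <= window


# Early in training: small window, i.e. local attention only.
mask_early = local_attention_mask(grid_size=8, window=1)
# Later: enlarge the window until every token can attend to every other token.
mask_late = local_attention_mask(grid_size=8, window=8)

# Applied inside attention by setting disallowed logits to -inf before softmax.
scores = torch.randn(8 * 8, 8 * 8)
scores = scores.masked_fill(~mask_early, float("-inf"))
attn = scores.softmax(dim=-1)
```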

Authors: Yifan Jiang, Shiyu Chang, Zhangyang Wang

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2021-03-23 Is Google Translate Sexist? Gender Stereotypes in Statistical Machine Translation
2021-03-22 Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)
2021-03-16 Pretrained Transformers as Universal Computation Engines (Machine Learning Research Paper Explained)
2021-03-11 Yann LeCun - Self-Supervised Learning: The Dark Matter of Intelligence (FAIR Blog Post Explained)
2021-03-06 Apple or iPod??? Easy Fix for Adversarial Textual Attacks on OpenAI's CLIP Model! #Shorts
2021-03-05 Multimodal Neurons in Artificial Neural Networks (w/ OpenAI Microscope, Research Paper Explained)
2021-02-27 GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton's Paper Explained)
2021-02-26 Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained)
2021-02-25 DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)
2021-02-19 Dreamer v2: Mastering Atari with Discrete World Models (Machine Learning Research Paper Explained)
2021-02-17 TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)
2021-02-14 NFNets: High-Performance Large-Scale Image Recognition Without Normalization (ML Paper Explained)
2021-02-11 Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AI Paper Explained)
2021-02-04 Deep Networks Are Kernel Machines (Paper Explained)
2021-02-02 Feedback Transformers: Addressing Some Limitations of Transformers with Feedback Memory (Explained)
2021-01-29 SingularityNET - A Decentralized, Open Market and Network for AIs (Whitepaper Explained)
2021-01-22 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021-01-17 STOCHASTIC MEME DESCENT - Deep Learning Meme Review - Episode 2 (Part 2 of 2)
2021-01-12 OpenAI CLIP: Connecting Text and Images (Paper Explained)
2021-01-06 OpenAI DALL·E: Creating Images from Text (Blog Post Explained)
2020-12-26 Extracting Training Data from Large Language Models (Paper Explained)



Tags:
deep learning
machine learning
arxiv
neural networks
ai
artificial intelligence
attention neural networks
attention is all you need
transformer gan
transformer gans
transformer generative adversarial network
generative adversarial network
attention mechanism
self attention
vision transformer
pixelshuffle
superresolution
local attention
multihead attention
transformer generator
google
machine learning explained
deep learning explained
paper explained
transgan