HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)

Subscribers:
284,000
Published on: 2022-02-15
Video Link: https://www.youtube.com/watch?v=D6osiiEoV0w



Duration: 1:18:17
16,536 views
383


#hypertransformer #metalearning #deeplearning

This video contains a paper explanation and an interview with author Andrey Zhmoginov!
Few-shot learning is an interesting sub-field of meta-learning with wide applications, such as creating personalized models from just a handful of data points. Traditionally, approaches have followed the BERT recipe, where a large model is pre-trained and then fine-tuned. However, this couples the size of the final model to the size of the pre-trained model. Similar problems exist with "true" meta-learners such as MAML. HyperTransformer fundamentally decouples the meta-learner from the size of the final model by directly predicting the weights of the final model. The HyperTransformer takes the entire few-shot support set into its context and predicts one or more layers of a (small) ConvNet, meaning its output is the weights of the convolution filters. Interestingly, with the right engineering care, this actually delivers promising results and can be extended in many ways.
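As a rough illustration of the idea (not the authors' code; the module names, toy image encoder, and all sizes below are my own assumptions), here is a minimal PyTorch sketch of a transformer that reads the labeled support set and emits the weights of a single conv layer, which is then applied to query images:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    def __init__(self, n_classes=5, embed_dim=64, out_ch=8, in_ch=3, k=3):
        super().__init__()
        self.out_ch, self.in_ch, self.k = out_ch, in_ch, k
        # toy per-sample image encoder: one token per support image
        self.img_enc = nn.Sequential(
            nn.Conv2d(in_ch, embed_dim, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.label_emb = nn.Embedding(n_classes, embed_dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        # pooled support-set context -> flat vector of conv-filter weights
        self.to_weights = nn.Linear(embed_dim, out_ch * in_ch * k * k)

    def forward(self, support_x, support_y):
        # support_x: (N, C, H, W) support images, support_y: (N,) class indices
        tokens = self.img_enc(support_x) + self.label_emb(support_y)   # (N, D)
        ctx = self.transformer(tokens.unsqueeze(0)).mean(dim=1)        # (1, D)
        w = self.to_weights(ctx)
        return w.view(self.out_ch, self.in_ch, self.k, self.k)         # generated filters

gen = WeightGenerator()
support_x = torch.randn(25, 3, 32, 32)           # e.g. a 5-way 5-shot support set
support_y = torch.randint(0, 5, (25,))
query_x = torch.randn(10, 3, 32, 32)

conv_w = gen(support_x, support_y)               # weights predicted from the support set
features = F.conv2d(query_x, conv_w, padding=1)  # query images run through the generated layer

The point of the sketch is only the data flow: the support images and their labels become tokens, self-attention mixes them, and the pooled context is projected into the parameter tensor of the target network rather than into a prediction.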

OUTLINE:
0:00 - Intro & Overview
3:05 - Weight-generation vs Fine-tuning for few-shot learning
10:10 - HyperTransformer model architecture overview
22:30 - Why the self-attention mechanism is useful here
34:45 - Start of Interview
39:45 - Can neural networks even produce weights of other networks?
47:00 - How complex does the computational graph get?
49:45 - Why are transformers particularly good here?
58:30 - What can the attention maps tell us about the algorithm?
1:07:00 - How could we produce larger weights?
1:09:30 - Diving into experimental results
1:14:30 - What questions remain open?

Paper: https://arxiv.org/abs/2201.04182

ERRATA: I introduce Max Vladymyrov as Mark Vladymyrov

Abstract:
In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
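The abstract's "generate only the last layer" variant for larger models can be pictured with an even smaller, hedged sketch (again an assumption-laden illustration, not the paper's implementation): a feature extractor embeds the query images, the transformer-generated weights act as the final classifier, and the whole thing stays differentiable end to end.

import torch
import torch.nn.functional as F

feat_dim, n_classes = 64, 5
backbone = torch.nn.Sequential(                   # stand-in feature extractor for query images
    torch.nn.Conv2d(3, feat_dim, 3, padding=1),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)
query_x = torch.randn(10, 3, 32, 32)
query_y = torch.randint(0, n_classes, (10,))

# Stand-ins for the last-layer weights that, in the paper, would be produced by the
# transformer conditioned on the support set.
gen_w = torch.randn(n_classes, feat_dim, requires_grad=True)
gen_b = torch.zeros(n_classes, requires_grad=True)

logits = F.linear(backbone(query_x), gen_w, gen_b)   # (10, n_classes) class scores
loss = F.cross_entropy(logits, query_y)
loss.backward()                                      # gradients flow back into gen_w / gen_b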

Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2022-03-02  AlphaCode - with the authors!
2022-03-01  Competition-Level Code Generation with AlphaCode (Paper Review)
2022-02-28  Can Wikipedia Help Offline Reinforcement Learning? (Author Interview)
2022-02-26  Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained)
2022-02-23  [ML Olds] Meta Research Supercluster | OpenAI GPT-Instruct | Google LaMDA | Drones fight Pigeons
2022-02-21  Listening to You! - Channel Update (Author Interviews)
2022-02-20  All about AI Accelerators: GPU, TPU, Dataflow, Near-Memory, Optical, Neuromorphic & more (w/ Author)
2022-02-18  [ML News] Uber: Deep Learning for ETA | MuZero Video Compression | Block-NeRF | EfficientNet-X
2022-02-17  CM3: A Causal Masked Multimodal Model of the Internet (Paper Explained w/ Author Interview)
2022-02-16  AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more!
2022-02-15  HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)
2022-02-10  [ML News] DeepMind AlphaCode | OpenAI math prover | Meta battles harmful content with AI
2022-02-08  Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (+Author)
2022-02-07  OpenAI Embeddings (and Controversy?!)
2022-02-06  Unsupervised Brain Models - How does Deep Learning inform Neuroscience? (w/ Patrick Mineault)
2022-02-04  GPT-NeoX-20B - Open-Source huge language model by EleutherAI (Interview w/ co-founder Connor Leahy)
2022-01-29  Predicting the rules behind - Deep Symbolic Regression for Recurrent Sequences (w/ author interview)
2022-01-27  IT ARRIVED! YouTube sent me a package. (also: Limited Time Merch Deal)
2022-01-25  [ML News] ConvNeXt: Convolutions return | China regulates algorithms | Saliency cropping examined
2022-01-21  Dynamic Inference with Neural Interpreters (w/ author interview)
2022-01-19  Noether Networks: Meta-Learning Useful Conserved Quantities (w/ the authors)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
metalearning
meta learning
neural network
unsupervised learning
few shot learning
google
google research
google ai
transformer
meta transformer
hypertransformer
hyper transformer
generate the weights of a neural network
privacy
personalization
interview
paper explained
semi-supervised learning