Deep Networks Are Kernel Machines (Paper Explained)

Subscribers: 284,000
Published on ●
Video Link: https://www.youtube.com/watch?v=ahRPdiCop3E



Duration: 43:04
56,314 views
1,859


#deeplearning #kernels #neuralnetworks

Full Title: Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

Deep Neural Networks are often said to discover useful representations of the data. However, this paper challenges this prevailing view and suggests that, rather than representing the data, deep neural networks store superpositions of the training data in their weights and act as kernel machines at inference time. This is a theoretical paper with a main theorem and an understandable proof, and the result has many interesting implications for the field.
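
To make the "kernel machine at inference time" picture concrete, here is a minimal sketch of how a generic kernel machine predicts. This is my own illustration, not code from the paper or the video; the RBF kernel and all names are placeholder choices. The output is simply a weighted sum of similarities between the query point and the stored training examples.

import numpy as np

def rbf_kernel(x, x_prime, gamma=1.0):
    # Similarity between the query point and one stored training example.
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

def kernel_machine_predict(x, train_X, alphas, b=0.0, kernel=rbf_kernel):
    # Generic kernel machine: y = sum_i alpha_i * K(x, x_i) + b.
    # Prediction only needs the stored training points and the learned coefficients.
    return sum(a * kernel(x, x_i) for a, x_i in zip(alphas, train_X)) + b

The paper's claim is that a network trained by gradient descent can be read in exactly this form, except that the kernel is not a hand-picked RBF but one induced by the training run itself (the path kernel discussed in the video).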

OUTLINE:
0:00 - Intro & Outline
4:50 - What is a Kernel Machine?
10:25 - Kernel Machines vs Gradient Descent
12:40 - Tangent Kernels
22:45 - Path Kernels
25:00 - Main Theorem
28:50 - Proof of the Main Theorem
39:10 - Implications & My Comments

Paper: https://arxiv.org/abs/2012.00152
Street Talk about Kernels: https://youtu.be/y_RjsDHl5Y4

ERRATA: I simplify a bit too much when I pit kernel methods against gradient descent. Of course, you can even learn kernel machines using GD; the two are not mutually exclusive. It's also not true that you "don't need a model" in kernel machines, since a kernel machine usually still contains learned parameters.
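
To illustrate the errata, here is a minimal sketch (again my own, not from the video or the paper) of fitting a kernel machine's coefficients with plain gradient descent on a squared loss. It shows both points at once: the kernel machine does contain learned parameters (the alphas and the bias), and those parameters can themselves be learned by GD.

import numpy as np

def fit_kernel_machine_gd(train_X, train_y, kernel, lr=0.1, steps=2000):
    # f(x) = sum_i alpha_i * K(x, x_i) + b; the alphas and b are the learned parameters.
    n = len(train_X)
    K = np.array([[kernel(xi, xj) for xj in train_X] for xi in train_X])  # Gram matrix
    alphas, b = np.zeros(n), 0.0
    y = np.asarray(train_y, dtype=float)
    for _ in range(steps):
        err = K @ alphas + b - y        # residuals under a squared loss
        alphas -= lr * (K @ err) / n    # gradient step on the coefficients (K is symmetric)
        b -= lr * err.mean()            # gradient step on the bias
    return alphas, b

For example, you could pass the rbf_kernel from the sketch above and feed the returned alphas and b back into kernel_machine_predict.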

Abstract:
Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.
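
For reference, here is my paraphrase of the formal statement behind the abstract (notation is mine, condensed from the tangent-kernel and path-kernel definitions discussed in the video; see the paper for the exact conditions). A model f_w trained by gradient descent on training points x_1, ..., x_m, in the gradient-flow limit, is approximately a kernel machine whose kernel is the tangent kernel integrated along the optimization path:

% Tangent kernel at parameters w, and path kernel along the GD trajectory c(t):
\[
  K^{g}_{w}(x, x') = \nabla_w f_w(x) \cdot \nabla_w f_w(x'),
  \qquad
  K^{p}(x, x') = \int_{c(t)} K^{g}_{w(t)}(x, x') \, dt .
\]
% Main result (sketch): the trained model is approximately a kernel machine in the
% path kernel, where the a_i come from loss derivatives L'(y_i^*, f(x_i)) averaged
% along the path and b is the output of the initial (untrained) model.
\[
  f(x) \;\approx\; \sum_{i=1}^{m} a_i \, K^{p}(x, x_i) + b .
\]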

Author: Pedro Domingos

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2021-03-11 Yann LeCun - Self-Supervised Learning: The Dark Matter of Intelligence (FAIR Blog Post Explained)
2021-03-06 Apple or iPod??? Easy Fix for Adversarial Textual Attacks on OpenAI's CLIP Model! #Shorts
2021-03-05 Multimodal Neurons in Artificial Neural Networks (w/ OpenAI Microscope, Research Paper Explained)
2021-02-27 GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton's Paper Explained)
2021-02-26 Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained)
2021-02-25 DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)
2021-02-19 Dreamer v2: Mastering Atari with Discrete World Models (Machine Learning Research Paper Explained)
2021-02-17 TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)
2021-02-14 NFNets: High-Performance Large-Scale Image Recognition Without Normalization (ML Paper Explained)
2021-02-11 Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AI Paper Explained)
2021-02-04 Deep Networks Are Kernel Machines (Paper Explained)
2021-02-02 Feedback Transformers: Addressing Some Limitations of Transformers with Feedback Memory (Explained)
2021-01-29 SingularityNET - A Decentralized, Open Market and Network for AIs (Whitepaper Explained)
2021-01-22 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021-01-17 STOCHASTIC MEME DESCENT - Deep Learning Meme Review - Episode 2 (Part 2 of 2)
2021-01-12 OpenAI CLIP: Connecting Text and Images (Paper Explained)
2021-01-06 OpenAI DALL·E: Creating Images from Text (Blog Post Explained)
2020-12-26 Extracting Training Data from Large Language Models (Paper Explained)
2020-12-24 MEMES IS ALL YOU NEED - Deep Learning Meme Review - Episode 2 (Part 1 of 2)
2020-12-16 ReBeL - Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Explained)
2020-12-13 2M All-In into $5 Pot! WWYD? Daniel Negreanu's No-Limit Hold'em Challenge! (Poker Hand Analysis)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
what is deep learning
deep neural networks
neural networks gradient descent
kernel machines
kernel trick
svm
support vector machine
sgd
stochastic gradient descent
machine learning theory
pedro domingos
linear regression
nearest neighbor
representations
data representations
representation learning
proof
math proof
learning theory
representer theorem