Risk Convergence and Algorithmic Regularization of Discrete-Stepsize (Stochastic) Gradient Descent

Video Link: https://www.youtube.com/watch?v=GBvupXn0CJw



Duration: 15:55


Jingfeng Wu (UC Berkeley)
https://simons.berkeley.edu/talks/jingfeng-wu-uc-berkeley-2023-09-08
Meet the Fellows Welcome Event Fall 2023

Gradient descent (GD) and stochastic gradient descent (SGD) are the fundamental algorithms for optimizing machine learning models, particularly in deep learning. However, certain observed behaviors of GD and SGD cannot be fully explained by classical optimization and statistical learning theory. For example, (1) the training loss induced by GD often oscillates locally yet still converges in the long run, and (2) SGD-trained models often generalize well even when the number of training samples is smaller than the number of parameters. I will discuss two new results on the risk convergence and algorithmic regularization effects of GD and SGD:

(1) Large-stepsize GD can minimize the risk in a non-monotonic manner for logistic regression with separable data (see the first sketch below).
(2) Online SGD (and a variant of it) can effectively learn linear regression and a single ReLU neuron in the overparameterized regime (see the second sketch below).
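To make the first phenomenon concrete, here is a minimal NumPy sketch, not taken from the talk or its underlying papers: the data, the stepsize eta = 50, and the iteration count are illustrative choices. It runs constant-stepsize GD on logistic regression with linearly separable data and records the training loss, which with a large stepsize typically oscillates early on yet becomes small over a longer horizon.

```python
# Toy sketch (illustrative only): large-stepsize GD on logistic regression
# with linearly separable data. The training loss is often non-monotonic
# at first but still decreases over the long run.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable data: labels are the sign of a ground-truth direction.
n, d = 50, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = np.sign(X @ w_star)                  # y in {-1, +1}, separable by w_star

def logistic_loss(w):
    # Average logistic loss; np.logaddexp(0, z) = log(1 + exp(z)) is stable.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def gradient(w):
    s = -y * (X @ w)
    sigma = 0.5 * (1.0 + np.tanh(s / 2.0))   # numerically stable sigmoid(s)
    return (X.T @ (-y * sigma)) / n

eta = 50.0                                # deliberately large constant stepsize
w = np.zeros(d)
losses = []
for t in range(200):
    losses.append(logistic_loss(w))
    w = w - eta * gradient(w)

print("first 10 losses:", np.round(losses[:10], 3))   # often non-monotonic
print("final loss     :", round(losses[-1], 5))       # small in the long run
```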
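For the second result, the sketch below is again only an illustration: the anisotropic covariance, noise level, and stepsize are arbitrary choices and do not come from the talk. It runs one-pass ("online") SGD with a constant stepsize on linear regression where the dimension d exceeds the number of samples n, and reports the excess risk of the last and averaged iterates.

```python
# Toy sketch (illustrative only): one-pass online SGD for overparameterized
# linear regression (d > n); each sample is used exactly once.
import numpy as np

rng = np.random.default_rng(1)

d, n = 500, 200                           # overparameterized: d > n
# Anisotropic covariance: a few directions carry most of the signal.
eigs = 1.0 / (np.arange(1, d + 1) ** 2)
w_star = rng.normal(size=d) * np.sqrt(eigs)
noise_std = 0.1

def sample():
    x = rng.normal(size=d) * np.sqrt(eigs)
    y = x @ w_star + noise_std * rng.normal()
    return x, y

eta = 0.5                                 # constant stepsize
w = np.zeros(d)
w_sum = np.zeros(d)
for t in range(n):                        # one pass over the data stream
    x, y = sample()
    w -= eta * (x @ w - y) * x            # SGD step on one sample's squared loss
    w_sum += w
w_avg = w_sum / n                         # iterate averaging

# Excess risk E[(x^T w - x^T w_star)^2] = (w - w_star)^T Sigma (w - w_star),
# with Sigma = diag(eigs) for the Gaussian design above.
excess = lambda v: np.sum(eigs * (v - w_star) ** 2)
print("excess risk, last iterate    :", round(excess(w), 5))
print("excess risk, averaged iterate:", round(excess(w_avg), 5))
```

Even though the number of samples is well below the dimension, the excess risk of the (averaged) SGD iterate is small here because the signal is concentrated in the large-eigenvalue directions; this is only a toy illustration of the regime the talk studies, not its analysis.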







Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Meet the Fellows Welcome Event Fall 2023
Jingfeng Wu