Hypergradient descent and Universal Probabilistic Programming

Subscribers: 351,000
Published: 2020-05-05
Video Link: https://www.youtube.com/watch?v=CEtMhu_5WFQ
Duration: 1:00:13
Views: 1,403


Online Learning Rate Adaptation with Hypergradient Descent:
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
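For concreteness, here is a minimal sketch of the idea applied to plain SGD: the learning rate alpha is itself updated by a gradient step, using the dot product of the current gradient with the previous one as the hypergradient. The function and hyperparameter names (grad_fn, beta, the quadratic example) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, num_steps=100):
    """SGD whose learning rate alpha is adapted online via the hypergradient."""
    prev_grad = np.zeros_like(theta)
    for _ in range(num_steps):
        grad = grad_fn(theta)
        # Hypergradient of the loss w.r.t. alpha is -grad . prev_grad,
        # so a gradient step on alpha adds beta * (grad . prev_grad).
        alpha = alpha + beta * float(np.dot(grad, prev_grad))
        theta = theta - alpha * grad
        prev_grad = grad
    return theta, alpha

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
theta_final, alpha_final = hypergradient_sgd(lambda x: x, np.ones(5))
```

The only extra state is one stored copy of the previous gradient, which is why the overhead over the base optimizer is negligible.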

Universal Probabilistic Programming in Existing Simulators:
We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.
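As a rough illustration of the record-and-control idea, the sketch below routes a simulator's random draws through a controller that records them and, when a proposal is supplied for an address, substitutes the proposal's draw and accumulates an importance weight. This is an in-process toy with Gaussian priors and hypothetical address names; the actual framework couples to unmodified large-scale simulators over a cross-platform protocol and trains a recurrent neural network to produce the proposals.

```python
import math
import random

class ExecutionController:
    """Toy stand-in for the probabilistic execution protocol: it sits between
    an inference engine and a simulator, recording every random draw and,
    when a proposal is given, controlling the draw and accumulating a
    log importance weight (prior density minus proposal density)."""

    def __init__(self, proposals=None):
        self.proposals = proposals or {}  # address -> callable returning (value, log_q)
        self.trace = {}                   # recorded draws, keyed by address
        self.log_weight = 0.0             # log importance weight of this execution

    def sample(self, address, mu, sigma):
        # Gaussian priors only, purely for illustration.
        if address in self.proposals:
            value, log_q = self.proposals[address]()
            log_p = (-0.5 * ((value - mu) / sigma) ** 2
                     - math.log(sigma) - 0.5 * math.log(2 * math.pi))
            self.log_weight += log_p - log_q
        else:
            value = random.gauss(mu, sigma)
        self.trace[address] = value
        return value

def simulator(ctrl):
    """Any existing simulator, with its random draws routed through ctrl."""
    energy = ctrl.sample("particle_energy", mu=10.0, sigma=2.0)
    reading = ctrl.sample("detector_smearing", mu=energy, sigma=0.5)
    return reading

# Recording pass: the simulator runs under the prior and its trace is logged.
ctrl = ExecutionController()
observation = simulator(ctrl)
```

Running many such controlled executions with learned proposals and resampling by the accumulated weights gives the sequential importance sampling scheme described above.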

See more at https://www.microsoft.com/en-us/research/video/hypergradient-descent-and-universal-probabilistic-programming/




Other Videos By Microsoft Research


2020-05-26  Large-scale live video analytics over 5G multi-hop camera networks
2020-05-26  Kristin Lauter's TED Talk on Private AI at Congreso Futuro during Panel 11 / SOLVE
2020-05-19  How an AI agent can balance a pole using a simulation
2020-05-19  How to build Intelligent control systems using new tools from Microsoft and simulations by Mathworks
2020-05-13  Diving into Deep InfoMax with Dr. Devon Hjelm | Podcast
2020-05-08  An Introduction to Graph Neural Networks: Models and Applications
2020-05-07  MSR Cambridge Lecture Series: Photonic-chip-based soliton microcombs
2020-05-07  Multi-level Optimization Approaches to Computer Vision
2020-05-05  How good is your classifier? Revisiting the role of evaluation metrics in machine learning
2020-05-05  Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes
2020-05-05  Hypergradient descent and Universal Probabilistic Programming
2020-05-04  Learning over sets, subgraphs, and streams: How to accurately incorporate graph context
2020-05-04  An Ethical Crisis in Computing?
2020-04-21  Presentation on “Beyond the Prototype” by Rushil Khurana
2020-04-20  Understanding and Improving Database-backed Applications
2020-04-20  Efficient Learning from Diverse Sources of Information
2020-04-08  Project Orleans and the distributed database future with Dr. Philip Bernstein | Podcast
2020-04-07  Reprogramming the American Dream: A conversation with Kevin Scott and J.D. Vance, with Greg Shaw
2020-04-01  An interview with Microsoft President Brad Smith | Podcast
2020-03-30  Microsoft Rocketbox Avatar library
2020-03-27  Virtual reality without vision: A haptic and auditory white cane to navigate complex virtual worlds



Tags:
algorithms
data platforms
gradient-based optimizers
stochastic gradient descent
Hypergradient Descent
probabilistic programming framework
Atılım Güneş Baydin
microsoft research