Hypergradient descent and Universal Probabilistic Programming

Subscribers: 351,000
Published: 2020-05-05
Video Link: https://www.youtube.com/watch?v=CEtMhu_5WFQ
Duration: 1:00:13
Views: 1,403


Online Learning Rate Adaptation with Hypergradient Descent:
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
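For concreteness, here is a minimal sketch of the idea applied to plain SGD: the learning rate alpha is itself updated by a gradient step, using the dot product of the current gradient with the previous one as the hypergradient. The function and hyperparameter names (grad_fn, beta, the quadratic example) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, num_steps=100):
    """SGD whose learning rate alpha is adapted online via the hypergradient."""
    prev_grad = np.zeros_like(theta)
    for _ in range(num_steps):
        grad = grad_fn(theta)
        # Hypergradient of the loss w.r.t. alpha is -grad . prev_grad,
        # so a gradient step on alpha adds beta * (grad . prev_grad).
        alpha = alpha + beta * float(np.dot(grad, prev_grad))
        theta = theta - alpha * grad
        prev_grad = grad
    return theta, alpha

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
theta_final, alpha_final = hypergradient_sgd(lambda x: x, np.ones(5))
```

The only extra state is one stored copy of the previous gradient, which is why the overhead over the base optimizer is negligible.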

Universal Probabilistic Programming in Existing Simulators:
We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.
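As a rough illustration of the record-and-control idea, the sketch below routes a simulator's random draws through a controller that records them and, when a proposal is supplied for an address, substitutes the proposal's draw and accumulates an importance weight. This is an in-process toy with Gaussian priors and hypothetical address names; the actual framework couples to unmodified large-scale simulators over a cross-platform protocol and trains a recurrent neural network to produce the proposals.

```python
import math
import random

class ExecutionController:
    """Toy stand-in for the probabilistic execution protocol: it sits between
    an inference engine and a simulator, recording every random draw and,
    when a proposal is given, controlling the draw and accumulating a
    log importance weight (prior density minus proposal density)."""

    def __init__(self, proposals=None):
        self.proposals = proposals or {}  # address -> callable returning (value, log_q)
        self.trace = {}                   # recorded draws, keyed by address
        self.log_weight = 0.0             # log importance weight of this execution

    def sample(self, address, mu, sigma):
        # Gaussian priors only, purely for illustration.
        if address in self.proposals:
            value, log_q = self.proposals[address]()
            log_p = (-0.5 * ((value - mu) / sigma) ** 2
                     - math.log(sigma) - 0.5 * math.log(2 * math.pi))
            self.log_weight += log_p - log_q
        else:
            value = random.gauss(mu, sigma)
        self.trace[address] = value
        return value

def simulator(ctrl):
    """Any existing simulator, with its random draws routed through ctrl."""
    energy = ctrl.sample("particle_energy", mu=10.0, sigma=2.0)
    reading = ctrl.sample("detector_smearing", mu=energy, sigma=0.5)
    return reading

# Recording pass: the simulator runs under the prior and its trace is logged.
ctrl = ExecutionController()
observation = simulator(ctrl)
```

Running many such controlled executions with learned proposals and resampling by the accumulated weights gives the sequential importance sampling scheme described above.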

See more at https://www.microsoft.com/en-us/research/video/hypergradient-descent-and-universal-probabilistic-programming/




Other Videos By Microsoft Research


2020-05-26  Large-scale live video analytics over 5G multi-hop camera networks
2020-05-26  Kristin Lauter's TED Talk on Private AI at Congreso Futuro during Panel 11 / SOLVE
2020-05-19  How an AI agent can balance a pole using a simulation
2020-05-19  How to build Intelligent control systems using new tools from Microsoft and simulations by Mathworks
2020-05-13  Diving into Deep InfoMax with Dr. Devon Hjelm | Podcast
2020-05-08  An Introduction to Graph Neural Networks: Models and Applications
2020-05-07  MSR Cambridge Lecture Series: Photonic-chip-based soliton microcombs
2020-05-07  Multi-level Optimization Approaches to Computer Vision
2020-05-05  How good is your classifier? Revisiting the role of evaluation metrics in machine learning
2020-05-05  Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes
2020-05-05  Hypergradient descent and Universal Probabilistic Programming
2020-05-04  Learning over sets, subgraphs, and streams: How to accurately incorporate graph context
2020-05-04  An Ethical Crisis in Computing?
2020-04-21  Presentation on “Beyond the Prototype” by Rushil Khurana
2020-04-20  Understanding and Improving Database-backed Applications
2020-04-20  Efficient Learning from Diverse Sources of Information
2020-04-08  Project Orleans and the distributed database future with Dr. Philip Bernstein | Podcast
2020-04-07  Reprogramming the American Dream: A conversation with Kevin Scott and J.D. Vance, with Greg Shaw
2020-04-01  An interview with Microsoft President Brad Smith | Podcast
2020-03-30  Microsoft Rocketbox Avatar library
2020-03-27  Virtual reality without vision: A haptic and auditory white cane to navigate complex virtual worlds



Tags:
algorithms
data platforms
gradient-based optimizers
stochastic gradient descent
Hypergradient Descent
probabilistic programming framework
Atılım Güneş Baydin
microsoft research