Research talk: Breaking the deadly triad with a target network

Published: 2022-01-24
Video link: https://www.youtube.com/watch?v=4QxrLGsNj5I
Duration: 9:52


Speaker: Shangtong Zhang, PhD Student, Oxford University

The deadly triad refers to the instability of an off-policy reinforcement learning (RL) algorithm that uses function approximation and bootstrapping at the same time. It is a major challenge in off-policy RL. Join Shangtong Zhang, a PhD student in the WhiRL group at the University of Oxford, to learn how a target network can be used to break the deadly triad with theoretical guarantees. The talk covers three topics. First, it gives a theoretical account of the conventional wisdom that a target network stabilizes training. Second, it presents a novel target network update rule that augments the commonly used Polyak-averaging update with two projections. Third, it shows how a target network can be used in linear off-policy RL algorithms, in both prediction and control settings, and for both discounted and average-reward Markov decision processes.
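To make the deadly triad concrete, here is a toy Python sketch. It illustrates the idea under stated assumptions and is not the algorithm presented in the talk. It uses the classic two-state example: a single shared weight w, features 1 and 2 for the two states, zero reward, and off-policy updates applied only to the transition from the first state to the second. Plain semi-gradient TD(0) multiplies w by 1 + alpha(2 gamma - 1) at every step, so it diverges whenever gamma > 0.5. The second function bootstraps from a separate target weight theta instead. Its target update is only a hypothetical reading of "Polyak averaging with two projections": it averages toward the projected main weight, then projects the target itself onto [-R, R]. The names `clip`, `plain_td` and `target_td`, the radius R, and the step sizes are all illustrative choices made here.

```python
"""Toy illustration of the deadly triad (a sketch, not the talk's algorithm).

Two-state example: one shared weight w, features x(s1)=1 and x(s2)=2,
reward 0, and off-policy updates applied only to the s1 -> s2 transition.
"""


def clip(value: float, radius: float) -> float:
    """Project a scalar onto the interval [-radius, radius] (a 1-D ball)."""
    return max(-radius, min(radius, value))


def plain_td(w: float, alpha: float, gamma: float, steps: int) -> float:
    """Semi-gradient TD(0) that bootstraps from the same weight w."""
    for _ in range(steps):
        td_error = 0.0 + gamma * 2.0 * w - 1.0 * w  # r + gamma*x(s2)*w - x(s1)*w
        w += alpha * td_error * 1.0                 # scaled by the feature x(s1)=1
    return w


def target_td(w: float, alpha: float, beta: float, gamma: float,
              radius: float, steps: int) -> tuple[float, float]:
    """TD(0) that bootstraps from a separate target weight theta.

    Hypothetical target update (an assumption, not the talk's rule):
    Polyak averaging toward the projected main weight, followed by a
    projection of the target weight itself.
    """
    theta = clip(w, radius)
    for _ in range(steps):
        td_error = 0.0 + gamma * 2.0 * theta - 1.0 * w  # bootstrap from theta
        w += alpha * td_error * 1.0
        theta = clip(theta + beta * (clip(w, radius) - theta), radius)
    return w, theta


if __name__ == "__main__":
    gamma, alpha = 0.9, 0.1
    print("plain TD:", plain_td(1.0, alpha, gamma, 200))
    w, theta = target_td(1.0, alpha, beta=0.01, gamma=gamma,
                         radius=10.0, steps=5000)
    print("target network:", w, theta)
```

In this run the projections keep both weights bounded. Theta stays within [-R, R], and, starting from w = 1, w stays within 2 gamma R. With the projections removed, this particular toy still diverges, only more slowly. Boundedness alone says nothing about the quality of the limit. With zero rewards the true value is 0, yet this sketch settles at the edge of the ball. Where the iterates converge therefore depends on the precise update rule, which this sketch does not reproduce.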

Learn more about the 2021 Microsoft Research Summit: https://Aka.ms/researchsummit
