Temporal Difference Models: Deep Model-free RL for Model-based

Video Link: https://www.youtube.com/watch?v=j-3nUkzMFA8



Duration: 19:50


Deep reinforcement learning (RL) has shown promising results for learning complex sequential decision-making behaviors in various environments. However, most successes have been confined to simulation, and results in real-world applications such as robotics are limited, largely due to the poor sample efficiency of typical deep RL algorithms. I will introduce temporal difference models (TDMs), an extension of goal-conditioned value functions that enables model-based planning at multiple time resolutions. TDMs generalize traditional predictive models, bridge the gap between model-based and off-policy model-free RL, and yield substantial improvements in sample efficiency without sacrificing asymptotic performance.
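To make the idea concrete, here is a minimal sketch of the horizon-conditioned Bellman target that a TDM trains against. This is an illustrative assumption based on the talk's description, not the speaker's exact implementation: the value function Q(s, a, g, tau) is conditioned on a goal g and a planning horizon tau, grounds out in a (negative) distance to the goal at tau = 0, and otherwise bootstraps from the best action at horizon tau - 1. The function name `tdm_target` and its arguments are hypothetical.

```python
import numpy as np

def tdm_target(next_state, goal, tau, q_next_max):
    """Sketch of a TDM Bellman target (hypothetical helper).

    At horizon tau == 0 the target is the negative distance between the
    state actually reached and the goal; for tau > 0 it bootstraps from
    max_a' Q(s', a', g, tau - 1), supplied here as q_next_max.
    """
    if tau == 0:
        # terminal horizon: score how close we got to the goal
        return -float(np.linalg.norm(next_state - goal))
    # nonzero horizon: standard off-policy bootstrap, with tau decremented
    return q_next_max

# toy usage: a 2-D state 0.5 away from the goal
s_next = np.array([1.0, 2.0])
goal = np.array([1.5, 2.0])
print(tdm_target(s_next, goal, tau=0, q_next_max=-3.5))  # -> -0.5
print(tdm_target(s_next, goal, tau=3, q_next_max=-3.5))  # -> -3.5
```

Because the target at tau = 0 is a state-space distance rather than a task reward, the learned Q-function doubles as a multi-step predictive model, which is what lets it be used inside a model-based planner.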

See more at https://www.microsoft.com/en-us/research/video/temporal-difference-models-deep-model-free-rl-model-based/

Tags:
microsoft research