Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Video Link: https://www.youtube.com/watch?v=jOEhYByjx10



Duration: 24:28


Probabilistic Q-learning is a promising approach to balancing exploration and exploitation in reinforcement learning.
However, existing implementations have significant limitations: they either fail to incorporate uncertainty about the long-term consequences of actions or ignore fundamental dependencies in state-action values implied by the Bellman equation. These problems result in sub-optimal exploration. As a solution, we develop Successor Uncertainties (SU), a probabilistic Q-learning method free of the aforementioned problems. SU outperforms existing baselines on tabular problems and on the Atari benchmark suite. Overall, SU is an improved and scalable probabilistic Q-learning method with better properties than its predecessors at no extra cost.
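
For reference, the dependency among state-action values mentioned in the description is the one expressed by the standard Bellman equation for the value function of a policy. The notation below (discount factor \gamma, reward r, next state s', next action a', transition distribution p, policy \pi) follows the usual textbook convention and is an assumption here, not notation taken from the talk:

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[\, r + \gamma\, Q^{\pi}(s', a') \,\right],
\qquad s' \sim p(\cdot \mid s, a), \quad a' \sim \pi(\cdot \mid s')
```

Because the same function Q^{\pi} appears on both sides, the values of different state-action pairs cannot vary independently of one another. A probabilistic model that treats them as independent would therefore be inconsistent with this relation.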

See more at https://www.microsoft.com/en-us/research/video/successor-uncertainties-exploration-and-uncertainty-in-temporal-difference-learning/







Tags:
microsoft research