Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)

Subscribers: 284,000
Video Link: https://www.youtube.com/watch?v=tjbEVY5XIk0



Duration: 25:33
4,096 views
161


When an AI makes a plan, it usually does so step by step, forward in time. But it is often beneficial to define intermediate goals that divide a large problem into easier sub-problems. This paper proposes a generalization of MCTS that searches not for the best next action to take, but for the best way to recursively sub-divide the problem into sub-problems so small that each can be solved in a single step.
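The core idea can be illustrated with a minimal sketch. This is not the paper's full DC-MCTS: the learned sub-goal proposal network is replaced by a hard-coded midpoint heuristic, there is no tree search or value estimation, and states are just integers on a line. It only shows the recursive "split the task at a sub-goal, solve both halves" structure.

```python
# Toy sketch of divide-and-conquer sub-goal planning.
# Assumptions (not from the paper): integer states on a line, a low-level
# policy that can only bridge adjacent states, and a midpoint heuristic
# standing in for the learned sub-goal proposal.

def low_level_reaches(s, g):
    """Imperfect low-level policy: succeeds only on adjacent states."""
    return abs(g - s) <= 1

def propose_subgoal(s, g):
    """Stand-in for the learned proposal: split the task at the midpoint."""
    return (s + g) // 2

def plan(s, g):
    """Recursively partition the task (s -> g) until every piece is
    solvable by the low-level policy in a single step.
    Returns the ordered list of sub-goals, ending in g."""
    if low_level_reaches(s, g):
        return [g]
    m = propose_subgoal(s, g)
    # The plan is *constructed* out of order (middle first), but the
    # concatenation below yields the correct execution order.
    return plan(s, m) + plan(m, g)

print(plan(0, 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that the sub-goal in the middle of the plan is committed to first, and the two halves are then solved independently; this is exactly the departure from the sequential-planning assumption that the paper generalizes with a search over partition trees.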

Paper: https://arxiv.org/abs/2004.11410
Site: https://sites.google.com/view/dc-mcts/home

Abstract:
Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partition trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.

Authors: Giambattista Parascandolo, Lars Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B. Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-05-18  [Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)
2020-05-17  Planning to Explore via Self-Supervised World Models (Paper Explained)
2020-05-16  [News] Facebook's Real-Time TTS system runs on CPUs only!
2020-05-15  Weight Standardization (Paper Explained)
2020-05-14  [Trash] Automated Inference on Criminality using Face Images
2020-05-13  Faster Neural Network Training with Data Echoing (Paper Explained)
2020-05-12  Group Normalization (Paper Explained)
2020-05-11  Concept Learning with Energy-Based Models (Paper Explained)
2020-05-10  [News] Google’s medical AI was super accurate in a lab. Real life was a different story.
2020-05-09  Big Transfer (BiT): General Visual Representation Learning (Paper Explained)
2020-05-08  Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)
2020-05-07  WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)
2020-05-06  Reinforcement Learning with Augmented Data (Paper Explained)
2020-05-05  TAPAS: Weakly Supervised Table Parsing via Pre-training (Paper Explained)
2020-05-04  Chip Placement with Deep Reinforcement Learning (Paper Explained)
2020-05-03  I talk to the new Facebook Blender Chatbot
2020-05-02  Jukebox: A Generative Model for Music (Paper Explained)
2020-05-01  [ML Coding Tips] Separate Computation & Plotting using locals
2020-04-30  The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies (Paper Explained)
2020-04-29  Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)
2020-04-28  [Rant] Online Conferences



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
rl
reinforcement learning
deep rl
planning
alphago
alphazero
alpha go
alpha zero
mcts
monte carlo
tree search
subdivision
recursive
training data
hindsight experience replay