Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)

Subscribers: 284,000
Video Link: https://www.youtube.com/watch?v=tjbEVY5XIk0



Duration: 25:33
4,096 views
161


When an AI makes a plan, it usually does so step by step, forward in time. But it is often beneficial to define intermediate goals that divide a large problem into easier sub-problems. This paper proposes a generalization of MCTS that searches not for the best next action to take, but for the best way to recursively sub-divide the problem into sub-problems so small that each can be solved in a single step.
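The core idea can be illustrated with a minimal sketch. This is not the paper's full DC-MCTS: the learned sub-goal proposal network is replaced by a hard-coded midpoint heuristic, there is no tree search or value estimation, and states are just integers on a line. It only shows the recursive "split the task at a sub-goal, solve both halves" structure.

```python
# Toy sketch of divide-and-conquer sub-goal planning.
# Assumptions (not from the paper): integer states on a line, a low-level
# policy that can only bridge adjacent states, and a midpoint heuristic
# standing in for the learned sub-goal proposal.

def low_level_reaches(s, g):
    """Imperfect low-level policy: succeeds only on adjacent states."""
    return abs(g - s) <= 1

def propose_subgoal(s, g):
    """Stand-in for the learned proposal: split the task at the midpoint."""
    return (s + g) // 2

def plan(s, g):
    """Recursively partition the task (s -> g) until every piece is
    solvable by the low-level policy in a single step.
    Returns the ordered list of sub-goals, ending in g."""
    if low_level_reaches(s, g):
        return [g]
    m = propose_subgoal(s, g)
    # The plan is *constructed* out of order (middle first), but the
    # concatenation below yields the correct execution order.
    return plan(s, m) + plan(m, g)

print(plan(0, 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that the sub-goal in the middle of the plan is committed to first, and the two halves are then solved independently; this is exactly the departure from the sequential-planning assumption that the paper generalizes with a search over partition trees.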

Paper: https://arxiv.org/abs/2004.11410
Site: https://sites.google.com/view/dc-mcts/home

Abstract:
Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partition trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.

Authors: Giambattista Parascandolo, Lars Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B. Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher




Other Videos By Yannic Kilcher


2020-05-18  [Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)
2020-05-17  Planning to Explore via Self-Supervised World Models (Paper Explained)
2020-05-16  [News] Facebook's Real-Time TTS system runs on CPUs only!
2020-05-15  Weight Standardization (Paper Explained)
2020-05-14  [Trash] Automated Inference on Criminality using Face Images
2020-05-13  Faster Neural Network Training with Data Echoing (Paper Explained)
2020-05-12  Group Normalization (Paper Explained)
2020-05-11  Concept Learning with Energy-Based Models (Paper Explained)
2020-05-10  [News] Google’s medical AI was super accurate in a lab. Real life was a different story.
2020-05-09  Big Transfer (BiT): General Visual Representation Learning (Paper Explained)
2020-05-08  Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)
2020-05-07  WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)
2020-05-06  Reinforcement Learning with Augmented Data (Paper Explained)
2020-05-05  TAPAS: Weakly Supervised Table Parsing via Pre-training (Paper Explained)
2020-05-04  Chip Placement with Deep Reinforcement Learning (Paper Explained)
2020-05-03  I talk to the new Facebook Blender Chatbot
2020-05-02  Jukebox: A Generative Model for Music (Paper Explained)
2020-05-01  [ML Coding Tips] Separate Computation & Plotting using locals
2020-04-30  The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies (Paper Explained)
2020-04-29  Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)
2020-04-28  [Rant] Online Conferences



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
rl
reinforcement learning
deep rl
planning
alphago
alphazero
alpha go
alpha zero
mcts
monte carlo
tree search
subdivision
recursive
training data
hindsight experience replay