End-to-end Reinforcement Learning for the Large-scale Traveling Salesman Problem

Video Link: https://www.youtube.com/watch?v=qaOp1iRL-14



Duration: 30:07
2022 Data-driven Optimization Workshop: End-to-end Reinforcement Learning for the Large-scale Traveling Salesman Problem

Speaker: Yan Jin, Huazhong University of Science and Technology

The Traveling Salesman Problem (TSP) is one of the most studied routing problems arising in practical logistics applications. Traditional approaches rely on hand-crafted expert rules and spend considerable time on iterative search. This limits their use in time-sensitive scenarios, e.g., on-call routing and ride-hailing services. We propose an end-to-end approach based on hierarchical reinforcement learning for the large-scale TSP. Using a divide-and-conquer strategy, the upper-level policy chooses a small subset of cities from the remaining cities still to be traversed, while the lower-level policy applies a Transformer model to the chosen cities to solve a shortest-path problem with prescribed starting and ending cities. The two policies are jointly trained with reinforcement learning algorithms, and TSP solutions can be generated directly without any search procedure. The proposed approach takes advantage of the inference efficiency of the Transformer model and provides highly competitive results.
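The divide-and-conquer loop described in the abstract can be illustrated with a minimal Python sketch. This is not the speaker's implementation: both learned policies are replaced by simple stand-ins, and every name here (`select_subset`, `solve_open_path`, `hierarchical_tour`, the subset size `k`) is hypothetical. The upper level is assumed to pick the `k` remaining cities nearest to the current end of the partial tour, and the lower level is assumed to brute-force the shortest path between a prescribed start and end city, where the talk uses a Transformer.

```python
import itertools
import math
import random


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def path_length(coords, path):
    """Length of an open path visiting the city indices in order."""
    return sum(dist(coords[u], coords[v]) for u, v in zip(path, path[1:]))


def tour_length(coords, tour):
    """Length of the closed tour (returns to the first city)."""
    return path_length(coords, tour) + dist(coords[tour[-1]], coords[tour[0]])


def select_subset(coords, remaining, anchor, k):
    """Upper-level stand-in (assumption, not the learned policy):
    choose the k remaining cities nearest to the anchor city."""
    return sorted(remaining, key=lambda c: dist(coords[anchor], coords[c]))[:k]


def solve_open_path(coords, start, end, middle):
    """Lower-level stand-in (assumption, not a Transformer):
    brute-force the shortest path from start to end through all middle cities."""
    best = min(
        itertools.permutations(middle),
        key=lambda p: path_length(coords, [start, *p, end]),
    )
    return [start, *best, end]


def hierarchical_tour(coords, k=5):
    """Build a tour by repeatedly choosing a subset and solving a
    fixed-endpoint shortest-path subproblem on it."""
    tour = [0]
    remaining = set(range(1, len(coords)))
    while remaining:
        start = tour[-1]
        chosen = select_subset(coords, remaining, start, k)
        end = chosen[-1]  # farthest chosen city becomes the prescribed end
        sub_path = solve_open_path(coords, start, end, chosen[:-1])
        tour.extend(sub_path[1:])  # skip start, already in the tour
        remaining.difference_update(chosen)
    return tour


if __name__ == "__main__":
    rng = random.Random(0)
    cities = [(rng.random(), rng.random()) for _ in range(50)]
    tour = hierarchical_tour(cities, k=5)
    print(len(tour), tour_length(cities, tour))
```

Keeping `k` small bounds the cost of each subproblem (here at most 4! orderings of the middle cities), which mirrors the motivation for decomposing a large instance into small fixed-endpoint problems. In the approach described in the talk, both stand-ins would be replaced by jointly trained policies.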



