Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for VLN

Video Link: https://www.youtube.com/watch?v=lqeeoqlaDiw
Duration: 1:21

Vision-Language Navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. We propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). We further introduce a Self-Supervised Imitation Learning (SIL) method that lets the agent explore unseen environments by imitating its own past good decisions.
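
The self-imitation idea (learning from the agent's own best past decisions rather than from ground-truth paths) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a trajectory is simply a tuple of visited places, and it uses a hypothetical scoring function in place of whatever model ranks trajectories against the instruction. All names here (`SelfImitationBuffer`, `imitation_loss`, `toy_critic`, `uniform_policy`) are illustrative and hypothetical, and `toy_critic` is a crude word-overlap heuristic, not a learned model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical representation: a trajectory is the sequence of places visited.
Trajectory = Tuple[str, ...]
Critic = Callable[[str, Trajectory], float]        # scores (instruction, trajectory)
Policy = Callable[[str, Trajectory, str], float]   # log-prob of next step given prefix


@dataclass
class SelfImitationBuffer:
    """Keeps, per instruction, the best-scoring trajectory the agent has produced."""
    best: Dict[str, Trajectory] = field(default_factory=dict)

    def update(self, instruction: str, candidates: Sequence[Trajectory],
               critic: Critic) -> Trajectory:
        # Rank the agent's own sampled trajectories with the critic. No
        # ground-truth path is used, which is what makes this self-supervised.
        top = max(candidates, key=lambda t: critic(instruction, t))
        prev = self.best.get(instruction)
        if prev is None or critic(instruction, top) > critic(instruction, prev):
            self.best[instruction] = top
        return self.best[instruction]


def imitation_loss(policy: Policy, instruction: str, trajectory: Trajectory) -> float:
    # Behavior cloning on the stored trajectory: sum of negative
    # log-probabilities of each step given the steps taken before it.
    return -sum(policy(instruction, trajectory[:i], step)
                for i, step in enumerate(trajectory))


def toy_critic(instruction: str, trajectory: Trajectory) -> float:
    # Hypothetical stand-in for a learned critic: counts visited places
    # that the instruction mentions by name.
    words = set(instruction.lower().split())
    return float(sum(place in words for place in trajectory))


def uniform_policy(instruction: str, prefix: Trajectory, step: str) -> float:
    # Hypothetical policy that picks uniformly among 4 possible actions.
    return math.log(1 / 4)


buffer = SelfImitationBuffer()
instr = "walk through the hall into the kitchen"
samples = [("hall", "bedroom"), ("hall", "kitchen"), ("garage",)]
best = buffer.update(instr, samples, toy_critic)
print(best)                                                 # ('hall', 'kitchen')
print(round(imitation_loss(uniform_policy, instr, best), 4))  # 2.7726
```

The buffer only replaces a stored trajectory when a new sample scores strictly higher, so later, worse explorations of the same instruction do not overwrite a good one. In a real agent the loss would be minimized with respect to the policy's parameters. Here the policy is fixed, so the loss is just evaluated.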

Tags:
microsoft research