Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations

Subscribers: 344,000
Published on: 2020-03-17 ● Video Link: https://www.youtube.com/watch?v=AxE7qGKJWaw



Duration: 2:34
Views: 3,926
Likes: 62


Machines are a long way from robustly solving open-world perception-control tasks, such as first-person-view (FPV) aerial navigation. While recent advances in end-to-end machine learning, especially imitation and reinforcement learning, appear promising, they are constrained by the need for large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to generate, but generally does not render safe behaviors in diverse real-life scenarios. In this work, we propose a novel method for learning robust visuomotor policies for real-world deployment that can be trained purely with simulated data. We develop rich state representations that combine supervised and unsupervised environment data. Our approach takes a cross-modal perspective, where separate modalities correspond to the raw camera data and the system states relevant to the task, such as the relative pose of gates to the drone in the case of drone racing. We feed both data modalities into a novel factored architecture, which learns a joint low-dimensional embedding via variational autoencoders. This compact representation is then fed into a control policy, which we train using imitation learning with expert trajectories in a simulator. We analyze the rich latent spaces learned with our proposed representations, and show that the use of our cross-modal architecture significantly improves control policy performance compared to end-to-end learning or purely unsupervised feature extractors. We also present real-world results for drone navigation through gates in different track configurations and environmental conditions. Our proposed method, which runs fully onboard, can successfully generalize the learned representations and policies across simulation and reality, significantly outperforming baseline approaches.
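To make the pipeline concrete, below is a minimal PyTorch sketch of the idea as described in the abstract, not the authors' released implementation: two modality-specific encoders (one for the raw FPV image, one for the task state, i.e., the relative gate pose) map into a single shared low-dimensional latent space with the usual VAE reparameterization, and a small policy head regresses expert velocity commands from the latent code via behavior cloning. The layer sizes, the 4-D gate-pose state, the 4-D action vector, and the modality-alignment loss are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    # Encodes a 3x64x64 FPV frame into a latent mean and log-variance.
    def __init__(self, latent_dim=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),    # 64 -> 31
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),   # 31 -> 14
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),  # 14 -> 6
            nn.Flatten(),
        )
        self.mu = nn.Linear(128 * 6 * 6, latent_dim)
        self.logvar = nn.Linear(128 * 6 * 6, latent_dim)

    def forward(self, img):
        h = self.conv(img)
        return self.mu(h), self.logvar(h)

class StateEncoder(nn.Module):
    # Encodes the task state (here an assumed 4-D relative gate pose) into
    # the SAME latent space; this sharing is what makes the model cross-modal.
    def __init__(self, state_dim=4, latent_dim=10):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, state):
        h = self.fc(state)
        return self.mu(h), self.logvar(h)

class PolicyHead(nn.Module):
    # Maps a latent code to a velocity command, trained to imitate an expert.
    def __init__(self, latent_dim=10, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim))

    def forward(self, z):
        return self.net(z)

def reparameterize(mu, logvar):
    # Standard VAE sampling trick: z = mu + sigma * eps.
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)), averaged over the batch.
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()

# One illustrative training step on random stand-in data (the decoders that
# would reconstruct each modality from z are omitted for brevity).
img_enc, state_enc, policy = ImageEncoder(), StateEncoder(), PolicyHead()
images = torch.randn(8, 3, 64, 64)    # simulated FPV frames
gate_pose = torch.randn(8, 4)         # relative gate pose for each frame
expert_action = torch.randn(8, 4)     # expert velocity commands

mu_i, lv_i = img_enc(images)
mu_s, lv_s = state_enc(gate_pose)
z = reparameterize(mu_i, lv_i)

loss = (F.mse_loss(policy(z), expert_action)       # behavior cloning
        + F.mse_loss(mu_i, mu_s)                   # align the two modalities
        + 1e-3 * (kl_divergence(mu_i, lv_i)
                  + kl_divergence(mu_s, lv_s)))    # keep latents near N(0, I)
loss.backward()

At deployment time, presumably only the image encoder and the policy head need to run on the vehicle, since ground-truth gate poses are unavailable outside the simulator; this is consistent with the abstract's claim that the method runs fully onboard.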

Authors: Rogerio Bonatti, Ratnesh Madaan, Vibhav Vineet, Sebastian Scherer, Ashish Kapoor

See the publication: https://www.microsoft.com/en-us/research/publication/learning-visuomotor-policies-for-aerial-navigation-using-cross-modal-representations/

Air Lab, Carnegie Mellon University:
http://theairlab.org/




Other Videos By Microsoft Research


2020-04-01  An interview with Microsoft President Brad Smith | Podcast
2020-03-30  Microsoft Rocketbox Avatar library
2020-03-27  Virtual reality without vision: A haptic and auditory white cane to navigate complex virtual worlds
2020-03-26  Statistical Frameworks for Mapping 3D Shape Variation onto Genotypic and Phenotypic Variation
2020-03-26  Can Machines Perceive Emotion?
2020-03-25  Microsoft’s AI Transformation, Project Turing and smarter search with Rangan Majumder | Podcast
2020-03-19  Enabling Rural Communities to Participate in Crowdsourcing, with Dr. Vivek Seshadri | Podcast
2020-03-19  Demo: Enhancing Smartphone Productivity and Reliability with an Integrated Display Cover
2020-03-19  Demo: A Versatile Controller Concept for Mobile Gaming
2020-03-18  Auto ML and the future of self-managing networks with Dr. Behnaz Arzani | Podcast
2020-03-17  Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations
2020-03-17  Inside look at the DARPA Subterranean Urban Circuit Challenge 2020
2020-03-16  Are you Exploiting Your Assumptions? Towards Effective Priors for Biomarker Discovery and...
2020-03-12  Fireside Chat with Aaron Courville
2020-03-11  Engineering research to life with Gavin Jancke | Podcast
2020-03-10  ARS 2020: Fairness, Transparency and Privacy in Digital Systems
2020-03-10  ARS 2020: Panel Discussion: Cyber Security – National Policy and Preparedness
2020-03-10  ARS 2020: Privacy & Security in Healthcare
2020-03-10  ARS 2020: Security & Privacy around Foundational Identity for Sovereign Nations
2020-03-10  ARS 2020: Track on Verification
2020-03-05  Potential and Pitfalls of AI with Dr. Eric Horvitz | Podcast