Structure Visual Understanding and Interaction with Human and Environment

Video Link: https://www.youtube.com/watch?v=wXXrFvALn5A



Duration: 1:26:14


The visual world around us is highly structured. As 2D projections of our world, images are also structured. An image usually contains a background and some foreground objects (e.g., kites and birds in the sky, sheep and cows on the grass). Moreover, objects usually interact with each other in predictable ways (e.g., mugs are on tables, keyboards are below computer monitors, the sky is in the background). This structure in our world manifests itself in the visual data that captures the world around us. In this talk, I will discuss how to leverage this structure for visual understanding and for interaction with language and the environment. Specifically, I will present: 1) how to learn to prune dense graphs and perform relational modeling for scene graph generation; 2) how to leverage structure in images for more grounded caption generation, and for question generation that actively acquires more information from humans; 3) how to learn a moving strategy for an embodied visual system in 3D environments to achieve better visual perception through actions. Finally, I will briefly talk about my ongoing and future work, which aims to connect vision, language, and environment towards better visual understanding and interaction.

Talk slides: https://www.microsoft.com/en-us/research/uploads/prod/2019/10/Structure-Visual-Understanding-and-Interaction-with-Human-and-Environment-SLIDES.pdf

Learn more about this and other talks at Microsoft Research: https://www.microsoft.com/en-us/research/video/structure-visual-understanding-and-interaction-with-human-and-environment/




Other Videos By Microsoft Research


2019-10-21 CapstanCrunch: A Haptic VR Controller with User-supplied Force Feedback
2019-10-18 Social Computing for Social Good in Low-Resource Environments
2019-10-16 News from the front in the post-quantum crypto wars with Dr. Craig Costello [Podcast]
2019-10-14 Grounding Natural Language for Embodied Agents
2019-10-14 Towards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare
2019-10-14 Can quantum mechanics help us learn models of classical systems?
2019-10-14 Reinforcement Learning From Small Data in Feature Space
2019-10-14 Reward Machines: Structuring Reward Function Specifications and Reducing Sample Complexity...
2019-10-14 Safe and Fair Reinforcement Learning
2019-10-14 Scalable and Robust Multi-Agent Reinforcement Learning
2019-10-14 Structure Visual Understanding and Interaction with Human and Environment
2019-10-14 Improving Doctor-Patient Interaction with ML-Enabled Clinical Note Taking
2019-10-11 HapSense: A Soft Haptic I/O Device with Uninterrupted Dual Functionalities...
2019-10-09 Advanced polarized light microscopy for mapping molecular orientation
2019-10-09 Data science and ML for human well-being with Jina Suh [Podcast]
2019-10-07 Tea: A High-level Language and Runtime System for Automating Statistical Analysis [Python module]
2019-10-07 Discover[i]: Component-based Parameterized Reasoning for Distributed Applications
2019-10-04 Scheduling For Efficient Large-Scale Machine Learning Training
2019-10-03 Distributed Entity Resolution for Computational Social Science
2019-10-03 MMLSpark: empowering AI for Good with Mark Hamilton [Podcast]
2019-10-02 Non-linear Invariants for Control-Command Systems



Tags:
visual understanding
visual data
relational modeling
scene graph generation
caption generation
question generation
3D environments
visual perception
Jianwei Yang
Microsoft Research
MSR