Visual Understanding in Natural Language
Channel: Microsoft Research
Subscribers: 344,000
Published on:
Video Link: https://www.youtube.com/watch?v=LAWeOZdvRvE
Bridging visual and natural language understanding is a fundamental requirement for intelligent agents. This talk will focus mainly on automatic image captioning and visual question answering (VQA). I will cover some recent advances in automatic image caption evaluation, visual attention modeling and generalization to images 'in the wild'. I will also introduce my recent work on vision-and-language navigation (VLN), in which we situate agents in a new RL environment constructed from dense RGB-D imagery of 90 real buildings.
See more at https://www.microsoft.com/en-us/research/video/visual-understanding-in-natural-language/
Tags:
microsoft research
visual understanding
natural language
intelligent agents
automatic image captioning
visual question answering
VQA
VLN