Visual Understanding in Natural Language

Subscribers: 344,000
Video Link: https://www.youtube.com/watch?v=LAWeOZdvRvE
Duration: 1:20:30
Views: 1,329
Bridging visual and natural language understanding is a fundamental requirement for intelligent agents. This talk will focus mainly on automatic image captioning and visual question answering (VQA). I will cover some recent advances in automatic image caption evaluation, visual attention modeling, and generalization to images 'in the wild'. I will also introduce my recent work on vision-and-language navigation (VLN), in which we situate agents in a new reinforcement learning (RL) environment constructed from dense RGB-D imagery of 90 real buildings.
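
To make the visual attention modeling mentioned above a bit more concrete, here is a minimal, illustrative sketch of one additive (Bahdanau-style) soft-attention step over image region features. It is not the specific model presented in the talk; the function name, parameter shapes, and the 36-region toy setup are assumptions for illustration only.

    import numpy as np

    def soft_visual_attention(region_feats, query, W_v, W_q, w_a):
        """One additive soft-attention step over K image region features.

        region_feats: (K, D) features for K image regions (e.g. CNN grid cells)
        query:        (H,)   decoder state from the captioning / VQA model
        W_v, W_q, w_a: learned projections (hypothetical shapes for this sketch)
        """
        # Project regions and query into a shared space and score each region.
        scores = np.tanh(region_feats @ W_v + query @ W_q) @ w_a   # (K,)
        # Softmax the scores into attention weights over regions.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Context vector: attention-weighted sum of region features.
        context = weights @ region_feats                            # (D,)
        return weights, context

    # Toy usage: 36 regions, 2048-d features, 512-d decoder state, 256-d attention space.
    rng = np.random.default_rng(0)
    K, D, H, A = 36, 2048, 512, 256
    feats = rng.standard_normal((K, D))
    hidden = rng.standard_normal(H)
    W_v, W_q, w_a = (rng.standard_normal(s) * 0.01 for s in [(D, A), (H, A), (A,)])
    alpha, ctx = soft_visual_attention(feats, hidden, W_v, W_q, w_a)
    print(alpha.shape, ctx.shape)   # (36,) (2048,)

In a captioning or VQA decoder, weights like these are typically recomputed at every output step, so the model can "look at" different image regions as it generates each word.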

See more at https://www.microsoft.com/en-us/research/video/visual-understanding-in-natural-language/

Tags:
microsoft research
visual understanding
natural language
intelligent agents
automatic image captioning
visual question answering
VQA
VLN