Visually Grounded Language Understanding and Generation

Video Link: https://www.youtube.com/watch?v=0EM4GfuZa1o



Duration: 1:03:12


In this talk, I will present our latest work on comprehending and generating visually grounded language. First, we will discuss the challenging task of learning visually grounded language. I will introduce how to pretrain task-agnostic visiolinguistic representations for a variety of vision-and-language tasks. In the second part of the talk, I will describe our recent work on image captioning that produces natural language explicitly grounded in entities that object detectors find in the image. At the end of the talk, I will briefly discuss ongoing efforts on vision-and-language multi-task learning and on generating goal-driven visual dialog without dialog data.

Talk slides: https://www.microsoft.com/en-us/research/uploads/prod/2019/11/Visually-Grounded-Language-Understanding-and-Generation-SLIDES.pdf

See more on this candidate talk at Microsoft Research: https://www.microsoft.com/en-us/research/video/visually-grounded-language-understanding-and-generation/

Tags:
visually grounded language
visiolinguistic representations
image captioning
visual dialog
NLP
computer vision
visual question answering
VQA
Jiasen Lu
Microsoft Research