Recent Advances in Image Captioning, Image-Text Retrieval and…


Video Link: https://www.youtube.com/watch?v=4wS02nCWXvw



Duration: 1:15:34


Title: Recent Advances in Image Captioning, Image-Text Retrieval and Visual Question Answering using Scene Graph Parsing, What Next?
Speaker: Hamid Palangi
Date: July 9, 2019

Creating an appropriate representation of data has been key to many recent breakthroughs in both language and vision. In natural language, the trend has run from structured representations such as parse trees to BERT and Transformers pretrained on large-scale data. In computer vision, the trend has been slightly different: from the scale-invariant feature transform (SIFT), to CNNs pretrained on large-scale data, and back to more structured representations of images using scene graphs. Building appropriate models for parsing a scene into a graph poses unique challenges, which have led to the task of Scene Graph Generation (SGG), with subtasks ranging from object detection to scene graph classification and detection. An orthogonal challenge is how effective the generated scene graphs are in downstream language and vision tasks that can benefit from these pretrained models. In this talk, we present our recent work on pretraining large-scale SGG models, along with two new models that exploit them, which have resulted in significant improvements on the downstream tasks of image captioning and image-text retrieval. We further present the challenges and opportunities ahead for SGG and for new downstream tasks such as visual question answering.

Slides: https://www.microsoft.com/en-us/research/uploads/prod/2021/07/43842.pdf







Tags:
Image Captioning
Image-Text Retrieval
Visual Question Answering
Scene Graph Parsing