DocEng 2011: Document Visual Similarity Measure For Document Search

Video Link: https://www.youtube.com/watch?v=KVFY-r-BLJQ



Duration: 15:02


The 11th ACM Symposium on Document Engineering
Mountain View, California, USA
September 19-22, 2011

Document Visual Similarity Measure For Document Search
Ildus Ahmadullin, Jan Allebach, Niranjan Damera-Venkata, Jian Fan, Seungyon Lee, Qian Lin, Jerry Liu, Eamonn O'Brien-Strain
Presented by Ildus Ahmadullin

ABSTRACT

Managing large document databases has become an important task. In many applications it is desirable to automatically compare document layouts and to classify and search documents by their visual appearance. We propose a new algorithm that approximates a metric function between documents based on their visual similarity. The comparison uses only the visual appearance of the document and does not take its text content into consideration. We measure the similarity of single-page documents with respect to distance functions between three document components: background, text, and saliency. Each document component is represented as a Gaussian mixture distribution, and distances between the components of different documents are calculated as an approximation of the Hellinger distance between the corresponding distributions. Because the Hellinger distance obeys the triangle inequality, it is well suited to nearest neighbor search in a document database: candidates can be ruled out from distance bounds without being compared directly, so the computation required to find similar documents can be significantly reduced.
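
To make the role of the triangle inequality concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract describes an approximation of the Hellinger distance between Gaussian mixtures, whereas this example assumes each item is a single 1-D Gaussian, for which the Hellinger distance has a closed form. The function names `hellinger_gauss_1d` and `nearest_with_pivot`, the single-pivot scheme, and the toy data are all hypothetical choices made for illustration.

```python
import math


def hellinger_gauss_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form Hellinger distance between two 1-D Gaussians.

    Illustrative stand-in only: the talk uses Gaussian mixtures, which
    have no closed form and therefore need an approximation.
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient of the two densities.
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def nearest_with_pivot(query, database, pivot, dist):
    """Exact nearest neighbor search with single-pivot pruning.

    Returns (best_index, best_distance, evaluations), where evaluations
    counts the query-to-item distances actually computed.
    """
    # In a real system this table would be precomputed once, offline.
    pivot_to_db = [dist(pivot, x) for x in database]
    d_query_pivot = dist(query, pivot)

    best_i, best_d, evaluations = None, math.inf, 0
    for i, x in enumerate(database):
        # Triangle inequality: |d(q, p) - d(p, x)| <= d(q, x).
        lower_bound = abs(d_query_pivot - pivot_to_db[i])
        if lower_bound >= best_d:
            continue  # x cannot beat the current best; skip it
        d = dist(query, x)
        evaluations += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, evaluations


# Toy data: each "document component" is a (mean, std) pair.
database = [(0.0, 1.0), (0.5, 1.2), (3.0, 0.8), (6.0, 1.0), (9.0, 2.0)]
pivot = (0.0, 1.0)
query = (0.4, 1.1)


def dist(a, b):
    return hellinger_gauss_1d(a[0], a[1], b[0], b[1])


index, distance, evaluations = nearest_with_pivot(query, database, pivot, dist)
print(index, round(distance, 4), evaluations)
```

The pruning step is only valid because the distance is a true metric. With a dissimilarity that violates the triangle inequality, the lower bound could discard the actual nearest neighbor.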






