Coping with Uncertain Data: Multi-Source Integration and Fuzzy Lookups

Video Link: https://www.youtube.com/watch?v=8TIaSVZm9R0



Duration: 1:24:14


This talk presents two separate pieces of work that share a common challenge: dealing with uncertainty in data.

In the first part of the talk, we address the problem of integrating multiple sources of uncertain data. As an extremely simple motivating example, one image database may label an image as blue or green, while another source labels the same image as green or yellow. As a result of combining information from the two sources, green may be deemed more likely than the other two colors. We will discuss how integration of uncertain data, through both contradiction and corroboration, can yield a more certain result than any of the sources individually. Specifically, we tackle the local-as-view setting of data integration where each source database may be an uncertain database. Our contributions include a new containment definition for uncertain databases, efficient representation and query answering techniques, and methods for coping with inconsistent sources.

In the second part, we consider the problem of fuzzy lookups. Keyword search, data cleaning, and entity resolution all rely on efficient fuzzy lookups based on textual similarity functions. We introduce a notion of transformations into the lookup problem, enabling users to specify rules such as 'Robert - Bob' and 'Robert - Bert' that are incorporated into the matching process. We then motivate a similarity function called Jaccard containment, which is an error-tolerant version of set containment. Finally, we present algorithms that enable efficient Jaccard containment lookups in the presence of the uncertainty introduced by transformation rules.
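To make the second part concrete, here is a minimal Python sketch of a lookup score that combines transformation rules with Jaccard containment. The abstract does not give a formula, so the sketch assumes the common unweighted definition: the containment of a query token set Q in a record token set R is |Q ∩ R| / |Q|. The function names, the `RULES` dictionary, and the whitespace tokenizer are all hypothetical and chosen for illustration. The brute-force enumeration of query variants is not the efficient algorithm the talk describes; it only illustrates what such an algorithm would need to compute.

```python
from itertools import product


def jaccard_containment(query_tokens, record_tokens):
    """Fraction of distinct query tokens that also occur in the record.

    Assumed definition: |Q & R| / |Q|, with 0.0 for an empty query.
    """
    q = set(query_tokens)
    r = set(record_tokens)
    if not q:
        return 0.0
    return len(q & r) / len(q)


def variants(tokens, rules):
    """Enumerate every rewrite of the token list allowed by the rules.

    Each token may stay as it is or be replaced by any of its
    rule targets, so the result grows multiplicatively.
    """
    options = [[t] + sorted(rules.get(t, ())) for t in tokens]
    return [list(combo) for combo in product(*options)]


def best_containment(query, record, rules):
    """Highest containment of any rule-generated query variant in the record."""
    query_tokens = query.lower().split()
    record_tokens = record.lower().split()
    return max(
        jaccard_containment(v, record_tokens)
        for v in variants(query_tokens, rules)
    )


# Hypothetical rule set mirroring the examples in the abstract.
RULES = {"robert": {"bob", "bert"}}

if __name__ == "__main__":
    # Without rules only "smith" matches; with rules "robert" can become "bob".
    print(best_containment("Robert Smith", "Bob Smith Seattle", {}))
    print(best_containment("Robert Smith", "Bob Smith Seattle", RULES))
```

Because the number of variants multiplies with every token that has a rule, this enumeration becomes expensive quickly, which is one reason efficient lookup algorithms are needed once transformation rules are involved.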




Other Videos By Microsoft Research


2016-08-16 3D Reconstruction meets GPGPU meets Image Analysis
2016-08-16 Percolation on Self-Dual Polygon Configurations
2016-08-16 Automated Traceability Techniques for Software Engineering and e-Science
2016-08-16 Digital Archeology of Software
2016-08-16 Executable Knowledge for Molecular Systems Biology
2016-08-16 WISPs, Computational RFID and the Internet of Things
2016-08-16 High-level Languages for Low-level Systems
2016-08-16 Decision Making under Uncertainty
2016-08-16 Recognizing a Million Voices: Low Dimensional Audio Representations for Speaker Identification
2016-08-16 Distributed Implementations of Component-based Systems Using Source-to-source Transformations in BIP
2016-08-16 Coping with Uncertain Data: Multi-Source Integration and Fuzzy Lookups
2016-08-16 Providing Richer Descriptions for Images
2016-08-16 Building and Evaluating Creative Interaction
2016-08-16 Enforcing topological constraints in energy-based image segmentation
2016-08-16 Probabilistic Approximation Theorems in Game Theory; The Theory of Crowdsourcing
2016-08-16 Longitudinal Evaluation of API Usability and Designing Support for Collaborative Search
2016-08-16 On a first-order primal-dual algorithm with applications to convex problems in computer vision
2016-08-16 Two Vignettes in Computational Finance
2016-08-16 MSR Overview: Introduction & Logistics, Overview, The 4th Paradigm; Tech Surveys
2016-08-16 Inductive Synthesis of Recursive Functional Programs
2016-08-16 Precise Identification of Problems for Structural Test Generation



Tags:
microsoft research