Learnable Similarity Functions and Their Applications in Information Integration and Clustering

Published on 2016-09-06 ● Video Link: https://www.youtube.com/watch?v=vv765VQ2U80



Duration: 1:25:08


Pairwise similarity functions are ubiquitous in data mining and machine learning algorithms. Record linkage, clustering, nearest-neighbor search, and information retrieval are all tasks in which pairwise distance computations play a central role. Accuracy in these tasks depends critically on how well the similarity function captures the notion of likeness between objects in a given domain. It is therefore desirable to employ similarity functions that can adapt to the domain and task at hand. We demonstrate the benefits of using learnable similarity functions on two tasks: record linkage and clustering. The goal of record linkage (also known as de-duplication or identity uncertainty) is to identify different database records that describe the same underlying entity. We introduce several learnable string distance functions based on probabilistic models, as well as an adaptive framework for combining them; both lead to significant accuracy improvements. The other task we consider is semi-supervised clustering, for which we present a probabilistic clustering framework based on Hidden Markov Random Fields that incorporates learnable similarity functions. Finally, we describe how learning similarity functions allows record linkage and clustering methods to scale efficiently to large datasets.
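To make the idea of a learnable similarity function concrete, here is a minimal Python sketch of learning how to combine several fixed string similarity measures from labeled duplicate and non-duplicate record pairs. It is an illustration only, not the method presented in the talk: the base measures (token Jaccard overlap and the `difflib` character ratio), the use of logistic regression as the combiner, the toy training pairs, and the names `features`, `train`, and `similarity` are all assumptions made for this example.

```python
import math
from difflib import SequenceMatcher


def features(a, b):
    """Base similarities for one record pair (illustrative choices)."""
    a, b = a.lower(), b.lower()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 1.0
    char_ratio = SequenceMatcher(None, a, b).ratio()
    return [1.0, jaccard, char_ratio]  # leading 1.0 acts as the bias term


def predict(w, x):
    """Logistic function of the weighted feature sum, a value in (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit the combination weights by stochastic gradient descent."""
    xs = [features(a, b) for a, b in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            err = predict(w, x) - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def similarity(a, b, w):
    """Learned similarity: estimated probability that a and b co-refer."""
    return predict(w, features(a, b))


# Hypothetical labeled pairs: 1 = same entity, 0 = different entities.
train_pairs = [
    ("Microsoft Research", "microsoft research lab"),
    ("J. Smith, 12 Main St", "John Smith, 12 Main Street"),
    ("Microsoft Research", "Stanford University"),
    ("John Smith, 12 Main St", "Mary Jones, 4 Oak Ave"),
]
train_labels = [1, 1, 0, 0]

weights = train(train_pairs, train_labels)
print(similarity("Microsoft Research", "Microsoft Research Labs", weights))
print(similarity("Microsoft Research", "Oak Avenue Bakery", weights))
```

The point of the sketch is that the weights are fitted to the labeled pairs rather than fixed by hand, so the same code adapts to a different domain when given different training data. A fuller treatment along the lines of the abstract would also make the base string distances themselves trainable.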







Tags:
microsoft research