Learnable Similarity Functions and Their Applications in Information Integration and Clustering

Channel: Microsoft Research

Subscribers: 351,000

Published: September 6, 2016, 6:22:09 AM

Video Link: https://www.youtube.com/watch?v=vv765VQ2U80

Category: Guide

Duration: 1:25:08

Views: 353

Pairwise similarity functions are ubiquitous in data mining and machine learning algorithms. Record linkage, clustering, nearest-neighbor search, and information retrieval are all tasks in which pairwise distance computations play a central role. Accuracy in these tasks depends critically on how well the similarity function captures the notion of likeness between objects in a given domain. It is therefore desirable to employ similarity functions that can adapt to the domain and task at hand. We demonstrate the benefits of learnable similarity functions on two tasks: record linkage and clustering.

The goal of record linkage (also known as de-duplication or identity uncertainty) is to identify different database records that describe the same underlying entity. We introduce several learnable string distance functions based on probabilistic models, as well as an adaptive framework for combining them; both lead to significant accuracy improvements.

The other task we consider is semi-supervised clustering, for which we present a probabilistic clustering framework based on Hidden Markov Random Fields that incorporates learnable similarity functions.

Finally, we describe how learning similarity functions allows record linkage and clustering methods to scale efficiently to large datasets.
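As a rough illustration of what a learnable string distance can look like, here is a minimal sketch that learns per-operation edit costs from known duplicate pairs. It is a deliberately simplified stand-in, not the probabilistic model from the talk: it assumes hard unit-cost alignments of the duplicates, counts the edit operations they contain, turns smoothed relative frequencies into costs of the form -log p, and keeps matches free. The names `align`, `learn_costs`, `learned_distance`, `GAP`, and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter

GAP = ""  # hypothetical marker for the empty side of an insertion or deletion


def align(a, b):
    """Return one minimum-cost unit-cost (Levenshtein) alignment as (char_a, char_b) pairs."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # delete a[i-1]
                          d[i][j - 1] + 1,                           # insert b[j-1]
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # match or substitute
    # Trace back from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ops.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append((a[i - 1], GAP))
            i -= 1
        else:
            ops.append((GAP, b[j - 1]))
            j -= 1
    return ops[::-1]


def learn_costs(duplicate_pairs, alpha=1.0):
    """Turn smoothed frequencies of edit operations seen in known duplicates into costs -log p."""
    counts = Counter()
    alphabet = {GAP}
    for a, b in duplicate_pairs:
        counts.update(op for op in align(a, b) if op[0] != op[1])  # count only real edits
        alphabet.update(a)
        alphabet.update(b)
    total = sum(counts.values())
    num_ops = len(alphabet) ** 2 - len(alphabet)  # all non-identity (x, y) pairs, GAP included

    def cost(x, y):
        if x == y:
            return 0.0  # simplification: matches are free
        return -math.log((counts[(x, y)] + alpha) / (total + alpha * num_ops))

    return cost


def learned_distance(a, b, cost):
    """Edit distance with learned per-operation costs (standard dynamic program)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(a[i - 1], GAP)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + cost(a[i - 1], GAP),
                          d[i][j - 1] + cost(GAP, b[j - 1]),
                          d[i - 1][j - 1] + cost(a[i - 1], b[j - 1]))
    return d[n][m]
```

For instance, after training on the duplicate pairs ("color", "colour"), ("honor", "honour"), and ("favor", "favour"), inserting `u` becomes cheaper than inserting a character never seen as an edit, so "labor" scores closer to "labour" than to "laborx". A full probabilistic treatment would instead sum over all alignments and re-estimate the costs iteratively rather than trusting a single hard alignment.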

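To give a concrete flavor of semi-supervised clustering with an HMRF-style objective, the sketch below assumes that supervision arrives as pairwise must-link and cannot-link constraints and minimizes k-means distortion plus a penalty for every violated constraint. It is a hypothetical, heavily simplified stand-in for the framework described above: it uses a fixed squared Euclidean distance instead of a learned one, a single constant penalty weight `w`, deterministic farthest-first initialization, and ICM-style greedy reassignment. `constrained_kmeans`, `sq_dist`, and their parameters are illustrative names, not an API from the talk.

```python
def sq_dist(x, y):
    """Squared Euclidean distance (a fixed stand-in for a learnable distortion measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def constrained_kmeans(points, k, must_link=(), cannot_link=(), w=1.0, max_iters=20):
    """Minimize sum_i sq_dist(x_i, mu_{l_i}) + w * (number of violated constraints).

    Labels are updated point by point (ICM), then centroids are recomputed.
    """
    # Deterministic farthest-first initialization of the k centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    # Adjacency list of constraints: neighbors[i] holds (j, is_must_link).
    neighbors = {i: [] for i in range(len(points))}
    for i, j in must_link:
        neighbors[i].append((j, True))
        neighbors[j].append((i, True))
    for i, j in cannot_link:
        neighbors[i].append((j, False))
        neighbors[j].append((i, False))

    def local_cost(i, c, labels):
        """Distortion of point i in cluster c plus penalties for the constraints it would violate."""
        violated = sum(1 for j, is_ml in neighbors[i]
                       if (is_ml and labels[j] != c) or (not is_ml and labels[j] == c))
        return sq_dist(points[i], centroids[c]) + w * violated

    # Start from the unconstrained nearest-centroid assignment.
    labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
    for _ in range(max_iters):
        changed = False
        # E-step (ICM): greedily reassign each point given all other current labels.
        for i in range(len(points)):
            best = min(range(k), key=lambda c: local_cost(i, c, labels))
            if best != labels[i]:
                labels[i] = best
                changed = True
        # M-step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if not changed:
            break
    return labels
```

With two well-separated groups and one point lying between them, the in-between point joins the nearer group when no constraints are given, but a sufficiently heavy must-link to a point in the other group pulls it across. In the full framework the distance itself would also be re-estimated from the constraints, which is where the learnable similarity function enters.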
Other Videos By Microsoft Research

2016-09-06	Enabling Internet Malware Investigation and Defense Using Virtualization
2016-09-06	Cohomology in Grothendieck Topologies and Lower Bounds in Boolean Complexity
2016-09-06	Approximate inference techniques for optimal design in self-assembly and automated programming
2016-09-06	Machine Learning Methods for Structured and Collective Classification
2016-09-06	Communication Technology: Interruption and Overload
2016-09-06	ParaEval: Using Paraphrases to Improve Machine Translation and Summarization Evaluations
2016-09-06	Rethinking Processor and System Architecture
2016-09-06	Crashing the Gate: Netroots, Grassroots, and the Rise of People-Powered Politics
2016-09-06	Improving Routing Scalability through Mobile Geographic Hashing in MANETs
2016-09-06	The Semantic Web: Myth and Reality
2016-09-06	Learnable Similarity Functions and Their Applications in Information Integration and Clustering
2016-09-06	Process Extraction in an Abstract Logic of Events [1/2]
2016-09-06	Billions: Selling to the New Chinese Consumer
2016-09-06	Conditional Models for Combining Diverse Knowledge Sources in Information Retrieval
2016-09-06	Scalable Automated Methods for Software Reliability
2016-09-06	Naked Conversations: How Blogs Are Changing the Way Businesses Talk with Customers
2016-09-06	Natural Scene Categorization in Humans and Computers
2016-09-06	Pair Programming Re-Design
2016-09-06	An Interface to Support Multi-faceted Information Seeking and Targeted Relevance Feedback
2016-09-06	Dependable Messaging in Sensor Networks
2016-09-06	Discriminative Graphical Models for Structured Data Prediction

Tags: microsoft research