Distributed Entity Resolution for Computational Social Science

Video Link: https://www.youtube.com/watch?v=W7Xqt4guibc



Duration: 1:00:27


Information about social entities is often scattered across multiple databases. Combining that information into one database can yield enormous benefits for analysis, leading to richer and more reliable conclusions. In most practical applications, however, analysts cannot simply link records across databases using unique identifiers such as social security numbers, either because some databases do not contain them or because they are unavailable due to privacy concerns. In such cases, analysts must use methods from statistical and computational science known as entity resolution (also called record linkage or de-duplication) before proceeding with analysis. Entity resolution is not only a crucial task for social science and industrial applications but also a challenging statistical and computational problem in its own right. In this talk, I describe the past and present challenges of entity resolution. More specifically, I discuss unsupervised Bayesian entity resolution models, which can identify duplicate records in the data while quantifying the uncertainty of the entity resolution process. In addition, one can prove tight theoretical bounds for this class of entity resolution models, which support the proposed approach. Finally, I present a distributed extension of this work that scales to millions of records while crucially incorporating partitions. I will present results on three real data sets from the computational social sciences, along with in-progress work in the field of human rights on El Salvador.
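
For readers new to the term, the following is a minimal sketch of what de-duplication with partitions can look like in practice. It is not the unsupervised Bayesian model described in the talk: it swaps in a simple deterministic rule (string similarity above a threshold) and does not quantify uncertainty. The toy records, the choice of birth year as the partitioning key, the 0.85 threshold, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (record_id, name, birth_year).
RECORDS = [
    (1, "Maria Hernandez", 1961),
    (2, "Maria Hernandes", 1961),
    (3, "Jose Martinez", 1975),
    (4, "Jose Martines", 1975),
    (5, "Ana Lopez", 1961),
]


def block_key(record):
    """Partition (blocking) key: here, simply the birth year (an assumption)."""
    return record[2]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(records, threshold=0.85):
    """Group record ids into entities, comparing only within each partition."""
    # Step 1: split records into partitions so that pairwise comparisons
    # stay cheap and each partition could be handled by a separate worker.
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # Step 2: union-find structure to merge records judged to be duplicates.
    parent = {rec[0]: rec[0] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Step 3: link pairs inside each partition whose names are similar enough.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a[1], b[1]) >= threshold:
                parent[find(b[0])] = find(a[0])

    # Step 4: collect the resulting clusters of record ids.
    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec[0])].append(rec[0])
    return sorted(sorted(ids) for ids in clusters.values())


clusters = resolve(RECORDS)
print(clusters)
```

A hard threshold like this yields a single answer with no measure of confidence, which is the gap that the Bayesian models in the talk are meant to address.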

See more at https://www.microsoft.com/en-us/research/video/distributed-entity-resolution-for-computational-social-science/




Other Videos By Microsoft Research


2019-10-14 Safe and Fair Reinforcement Learning
2019-10-14 Scalable and Robust Multi-Agent Reinforcement Learning
2019-10-14 Structure Visual Understanding and Interaction with Human and Environment
2019-10-14 Improving Doctor-Patient Interaction with ML-Enabled Clinical Note Taking
2019-10-11 HapSense: A Soft Haptic I/O Device with Uninterrupted Dual Functionalities...
2019-10-09 Advanced polarized light microscopy for mapping molecular orientation
2019-10-09 Data science and ML for human well-being with Jina Suh [Podcast]
2019-10-07 Tea: A High-level Language and Runtime System for Automating Statistical Analysis [Python module]
2019-10-07 Discover[i]: Component-based Parameterized Reasoning for Distributed Applications
2019-10-04 Scheduling For Efficient Large-Scale Machine Learning Training
2019-10-03 Distributed Entity Resolution for Computational Social Science
2019-10-03 MMLSpark: empowering AI for Good with Mark Hamilton [Podcast]
2019-10-02 Non-linear Invariants for Control-Command Systems
2019-10-02 Vision-and-Dialog Navigation
2019-10-01 The Future of Mathematics?
2019-09-30 How Not to Prove Your Election Outcome
2019-09-30 The Worst Form Including All Those Others: Canada’s Experiments with Online Voting
2019-09-30 DIFF: A Relational Interface for Large-Scale Data Explanation
2019-09-30 A Calculus for Brain Computation
2019-09-26 Decoding Multisensory Attention from Electroencephalography for Use in a Brain-Computer Interface
2019-09-26 A Short Introduction to DIMACS & DIMACS and MSR-NYC



Tags:
social sciences
social entities
entity resolution
Bayesian
computational social science
microsoft research