A Machine Learning Perspective on Managing Noisy Structured Data

Video Link: https://www.youtube.com/watch?v=PQV5qtS1E1U
Duration: 1:00:28


Modern analytics depend on high-effort tasks like data preparation and data cleaning to produce accurate results. This talk describes recent work on making routine data preparation tasks such as data cleaning dramatically easier. I will first introduce a formal probabilistic framework to describe the quality of structured data and demonstrate how this framework allows us to cast data cleaning as a statistical learning and inference problem. I will then show how this connection allows us to obtain formal guarantees on automated data cleaning and describe how it forms the basis of the HoloClean framework, a state-of-the-art ML-based solution for managing noisy structured data. I will close with additional examples of how a statistical learning view on managing noisy data can lead to new solutions to classical database problems such as the discovery of functional dependencies in structured data.
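As a rough illustration of what casting data cleaning as statistical inference can mean, the toy sketch below estimates the empirical conditional distribution of one column given another and repairs each cell to its most probable value. It is not HoloClean's model or API: the function name `repair_by_conditional_mode`, the column names, and the sample rows are all hypothetical, and a simple frequency count stands in for the learned probabilistic model the talk describes.

```python
from collections import Counter, defaultdict


def repair_by_conditional_mode(rows, key, target):
    """Toy repair (hypothetical helper, not HoloClean's method).

    Estimates P(target | key) from co-occurrence counts and sets each
    row's `target` cell to the most probable value for its `key`.
    Returns the repaired rows and the estimated probability of each repair.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row[target]] += 1

    repaired, confidences = [], []
    for row in rows:
        dist = counts[row[key]]
        best_value, best_count = dist.most_common(1)[0]
        repaired.append({**row, target: best_value})
        confidences.append(best_count / sum(dist.values()))
    return repaired, confidences


# Hypothetical sample data: the third row contains a misspelled city.
rows = [
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Cicago"},
    {"zip": "53703", "city": "Madison"},
]

cleaned, confidences = repair_by_conditional_mode(rows, "zip", "city")
for row, conf in zip(cleaned, confidences):
    print(row, round(conf, 2))
```

The sketch implicitly assumes that zip determines city, which is the kind of functional dependency the abstract mentions. A realistic system would have to learn or discover such constraints, and weigh them against other signals, rather than take them as given.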

See more at https://www.microsoft.com/en-us/research/video/a-machine-learning-perspective-on-managing-noisy-structured-data/

Tags:
AI
data platforms and analytics
data cleaning
probabilistic framework
structured data
HoloClean framework
machine learning
microsoft research
Theodoros Rekatsinas