Directions in ML: Automating Dataset Comparison and Manipulation with Optimal Transport

Published on 2020-11-24 ● Video Link: https://www.youtube.com/watch?v=bKD2ywAFNqk



Duration: 47:53


Machine learning research has traditionally been model-centric, focusing on architectures, parameter optimization, and model transfer. Much less attention has been given to the datasets on which these models are trained, which are often assumed to be fixed or subject to extrinsic and inevitable change. However, successfully applying ML in practice often requires substantial effort on dataset preprocessing and manipulation, such as augmenting, merging, mixing, or reducing datasets.

In this talk, I will present some of our recent work that seeks to formalize and automate these and other flavors of dataset manipulation under a unified approach. First, I will introduce the Optimal Transport Dataset Distance, which provides a fundamental theoretical building block: a formal notion of similarity between labeled datasets. In the second part of the talk, I will discuss how this notion of distance can be used to formulate a general framework for dataset optimization by means of gradient flows in probability space. I will end by presenting various exciting potential applications of this dataset optimization framework.
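
To make the idea of a distance between labeled datasets more concrete, here is a minimal sketch in Python. It is not the method presented in the talk. It assumes a simplified ground cost (squared feature distance plus the squared distance between class means, used as a crude stand-in for a proper distance between class-conditional distributions) and solves the transport problem with plain Sinkhorn iterations. The function names `label_aware_cost`, `sinkhorn_cost`, and `dataset_distance`, and the parameters `reg` and `n_iters`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def label_aware_cost(Xa, ya, Xb, yb):
    """Pairwise cost between labeled samples (an assumed, simplified form).

    cost[i, j] = ||x_i - x_j||^2 + ||mean(class of i) - mean(class of j)||^2
    The label sets of the two datasets do not need to match.
    """
    feat = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    means_a = {c: Xa[ya == c].mean(axis=0) for c in np.unique(ya)}
    means_b = {c: Xb[yb == c].mean(axis=0) for c in np.unique(yb)}
    lab = np.array(
        [[((means_a[i] - means_b[j]) ** 2).sum() for j in yb] for i in ya]
    )
    return feat + lab


def sinkhorn_cost(cost, reg=0.05, n_iters=500):
    """Transport cost of an entropy-regularized plan between uniform measures."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights on dataset A
    b = np.full(m, 1.0 / m)  # uniform weights on dataset B
    scale = cost.max() if cost.max() > 0 else 1.0
    K = np.exp(-cost / (reg * scale))  # Gibbs kernel on the rescaled cost
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum())


def dataset_distance(Xa, ya, Xb, yb, reg=0.05, n_iters=500):
    """Approximate OT cost between two labeled datasets under the assumed cost."""
    return sinkhorn_cost(label_aware_cost(Xa, ya, Xb, yb), reg, n_iters)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)
    X = rng.normal(scale=0.5, size=(40, 2)) + 4.0 * y[:, None]
    print("self:   ", dataset_distance(X, y, X, y))
    print("shifted:", dataset_distance(X, y, X + 3.0, y))
```

Because of the entropic regularization, the value for a dataset compared with itself is small but not exactly zero. A distance of this kind is differentiable in the sample positions, which is what makes it usable as an objective when moving one dataset toward another.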

Learn more about the 2020-2021 Directions in ML: AutoML and Automating Algorithms virtual speaker series: https://aka.ms/diml





Tags:
AutoML
Automating Algorithms
Dataset Comparison
machine learning
David Alvarez-Melis
Microsoft Research
webinar
Optimal Transport
dataset manipulation
dataset optimization