Scheduling For Efficient Large-Scale Machine Learning Training

Subscribers: 344,000
Published on: 2019-10-04
Video Link: https://www.youtube.com/watch?v=_rAkFBE-ItE
Duration: 1:12:33
Views: 1,441
Likes: 31

Over recent years, machine learning techniques have achieved success in many real-world applications. As researchers and practitioners expand machine learning to new application domains and push the boundaries of existing applications, they face critical computational challenges from growing dataset sizes and increasing model complexity and capacity. These challenges demand new software systems that train large models efficiently and let machine learning researchers easily experiment with new ideas.

There are many opportunities to improve training time and support larger models by leveraging the structural properties of machine learning computation in the design of training systems. In this talk, I will present two distributed training systems, Bösen and Orion, which schedule inter-machine network communication and parallel computation to improve training time by reducing inconsistency in parameter state, without requiring heavy programmer effort. Moreover, by scheduling memory usage in TensorFlow, we reduce GPU memory consumption by 87% and enable training models with 10x more parameters on the same hardware.
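To make the communication-scheduling idea concrete, here is a minimal Python sketch of bounded-staleness parameter synchronization, the consistency model that systems like Bösen build on. This is an illustrative assumption on my part, not Bösen's actual API: a worker may read slightly stale parameter values, but never more than a fixed number of logical clock ticks ahead of the slowest peer, which bounds inconsistency while letting communication overlap computation.

    # Sketch only: illustrates bounded-staleness reads, not Bösen's real interface.
    import threading

    class BoundedStalenessStore:
        def __init__(self, num_workers, staleness):
            self.params = {}                 # shared parameter table
            self.clocks = [0] * num_workers  # per-worker logical clocks
            self.staleness = staleness
            self.cond = threading.Condition()

        def clock(self, worker_id):
            """Worker signals that it finished one iteration."""
            with self.cond:
                self.clocks[worker_id] += 1
                self.cond.notify_all()

        def read(self, worker_id, key, default=0.0):
            """Block only if this worker has run too far ahead of the slowest one."""
            with self.cond:
                while self.clocks[worker_id] - min(self.clocks) > self.staleness:
                    self.cond.wait()
                return self.params.get(key, default)

        def update(self, key, delta):
            """Apply an additive gradient update to a parameter."""
            with self.cond:
                self.params[key] = self.params.get(key, 0.0) + delta

With staleness set to 0 this degenerates to bulk-synchronous execution; larger values let fast workers keep computing on slightly stale parameters instead of idling behind stragglers.

The memory-scheduling result can likewise be illustrated, in a toy form, by activation checkpointing with recomputation: keep only periodic checkpoints alive during the forward pass and recompute the intermediate activations when the backward pass needs them, trading extra forward compute for a much smaller peak memory footprint. The function names below are hypothetical, not the talk's TensorFlow implementation.

    # Sketch only: periodic activation checkpointing with recomputation.
    def forward(layers, x, checkpoint_every=4):
        """Run layers, retaining activations only at checkpoint boundaries."""
        checkpoints = [(0, x)]
        for i, layer in enumerate(layers):
            x = layer(x)
            if (i + 1) % checkpoint_every == 0:
                checkpoints.append((i + 1, x))
        return x, checkpoints

    def activations_for_segment(layers, checkpoints, i):
        """Recompute the activation feeding layer i from the nearest checkpoint."""
        start, x = max(c for c in checkpoints if c[0] <= i)
        for layer in layers[start:i]:
            x = layer(x)
        return x

Keeping every fourth activation cuts peak activation memory roughly fourfold at the cost of about one extra forward pass per segment; scheduling which tensors to keep, recompute, or swap is what drives the memory savings the talk reports.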

See more at https://www.microsoft.com/en-us/research/video/scheduling-for-efficient-large-scale-machine-learning-training/




Other Videos By Microsoft Research


2019-10-14Reward Machines: Structuring Reward Function Specifications and Reducing Sample Complexity...
2019-10-14Safe and Fair Reinforcement Learning
2019-10-14Scalable and Robust Multi-Agent Reinforcement Learning
2019-10-14Structure Visual Understanding and Interaction with Human and Environment
2019-10-14Improving Doctor-Patient Interaction with ML-Enabled Clinical Note Taking
2019-10-11HapSense: A Soft Haptic I/O Device with Uninterrupted Dual Functionalities...
2019-10-09Advanced polarized light microscopy for mapping molecular orientation
2019-10-09Data science and ML for human well-being with Jina Suh [Podcast]
2019-10-07Tea: A High-level Language and Runtime System for Automating Statistical Analysis [Python module]
2019-10-07Discover[i]: Component-based Parameterized Reasoning for Distributed Applications
2019-10-04Scheduling For Efficient Large-Scale Machine Learning Training
2019-10-03Distributed Entity Resolution for Computational Social Science
2019-10-03MMLSpark: empowering AI for Good with Mark Hamilton [Podcast]
2019-10-02Non-linear Invariants for Control-Command Systems
2019-10-02Vision-and-Dialog Navigation
2019-10-01The Future of Mathematics?
2019-09-30How Not to Prove Your Election Outcome
2019-09-30The Worst Form Including All Those Others: Canada’s Experiments with Online Voting
2019-09-30DIFF: A Relational Interface for Large-Scale Data Explanation
2019-09-30A Calculus for Brain Computation
2019-09-26Decoding Multisensory Attention from Electroencephalography for Use in a Brain-Computer Interface



Tags:
machine learning
AI
machine learning training
Bösen
Orion
distributed training systems
TensorFlow
microsoft research