Efficient and Scalable Deep Learning
In deep learning, researchers keep achieving higher performance by using larger models. However, two obstacles block the community from building larger models: (1) training larger models is more time-consuming, which slows down model design exploration, and (2) inference with larger models is also slow, which prevents their deployment in computation-constrained applications. In this talk, I will introduce some of our efforts to remove these obstacles. On the training side, we propose TernGrad to reduce the communication bottleneck and scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks to remove redundant neural components for faster inference. At the end, I will very briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work that applies my research to overcome scaling issues in Natural Language Processing.
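To give a flavor of the training-side idea, here is a minimal NumPy sketch of TernGrad-style stochastic gradient ternarization, where each worker quantizes its gradient to three levels before communication. The function name and use of NumPy are illustrative assumptions for this sketch, not the talk's actual implementation.

```python
import numpy as np

def ternarize_gradient(grad, rng=None):
    """Stochastically quantize a gradient tensor to the three levels {-s, 0, +s}.

    Sketch of the TernGrad idea: s is the maximum absolute gradient value,
    and each component keeps its sign with probability |g_i| / s, so the
    quantized gradient is an unbiased estimate of the original gradient.
    """
    if rng is None:
        rng = np.random.default_rng()
    s = np.max(np.abs(grad))
    if s == 0:
        return np.zeros_like(grad)
    keep_prob = np.abs(grad) / s              # probability of sending a non-zero
    mask = rng.random(grad.shape) < keep_prob
    return s * np.sign(grad) * mask           # values in {-s, 0, +s}

# Each worker ternarizes its local gradient before sending it to the server,
# so only the scaler s plus a 2-bit code per component needs to be transmitted,
# which cuts the communication cost of distributed training.
```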
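For the inference-side idea, the sketch below illustrates structured sparsity via a group-Lasso penalty over the output filters of a convolutional layer: whole filters are driven to zero and can then be removed, giving a smaller dense layer that runs faster. Again, the function and tensor layout are assumptions made for illustration only.

```python
import numpy as np

def group_lasso_penalty(conv_weight, lam=1e-4):
    """Group-Lasso regularizer over the output filters of a conv layer.

    conv_weight is assumed to have shape (out_channels, in_channels, kH, kW).
    Penalizing the L2 norm of each filter as a group pushes entire filters
    toward zero, so they can be pruned away for faster dense inference.
    """
    # L2 norm per output filter (one group per filter)
    filter_norms = np.sqrt(
        (conv_weight ** 2).reshape(conv_weight.shape[0], -1).sum(axis=1)
    )
    return lam * filter_norms.sum()

# Added to the training loss, this encourages structured (filter-level) sparsity
# rather than unstructured, element-wise sparsity, which is what makes the
# resulting network faster on standard hardware.
```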
Talk slides: https://www.microsoft.com/en-us/research/uploads/prod/2019/11/Efficient-and-Scalable-Deep-Learning-SLIDES.pdf
See more on this talk at Microsoft Research: https://www.microsoft.com/en-us/research/video/efficient-and-scalable-deep-learning/