Resource-Efficient Redundancy for Large-Scale Data Processing and Storage Systems

Video Link: https://www.youtube.com/watch?v=Vkr2C9Fkba4



Duration: 1:25:33


Large-scale systems are often subject to non-ideal conditions such as failures, stragglers, and load imbalance. These issues adversely affect query latency in data-processing systems, and durability and access latency in storage systems. Redundancy (duplication of data and/or queries) is a common way to impart resilience against such effects. In this talk, I will present two sets of results that take fundamentally new approaches to adding redundancy in data processing and storage systems, blending tools from coding theory and machine learning with systems insights:

(1) A novel learning-and-coding-based resilient computation framework and its application to reducing tail latency in serving neural network models for a variety of tasks such as image classification, speech recognition, and object detection. Our solution is the first to overcome a challenging barrier that restricted existing coding-based resilient computation approaches to a severely limited class of functions.

(2) A new redundancy-configuration approach for large-scale storage systems that exploits reliability heterogeneity in storage devices to achieve significant cost savings. Our solution challenges the widely used static approach to configuring redundancy by proposing a dynamic, data-driven approach that tailors redundancy levels to observed failure rates. Using a production data set, we show an 11-16% reduction in storage space even in highly optimized erasure-coded storage systems, which translates to substantial cost savings in large-scale operations.
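
To illustrate the barrier mentioned in (1), here is a minimal sketch; it is not the framework presented in the talk. It assumes a simple addition-based encoder (the parity query is the sum of two queries) and a subtraction-based decoder, and the names `linear_model`, `nonlinear_model`, `encode`, and `decode` are hypothetical. With a linear function, the output of an unavailable server can be reconstructed exactly from the parity output. With a nonlinear function the same decoder gives the wrong answer, which is the kind of restriction a learning-based approach would need to get past.

```python
# Illustrative sketch only: addition/subtraction coding over two queries.
# All names here are hypothetical and not taken from the talk.

def linear_model(x):
    # A linear function: F(a + b) == F(a) + F(b).
    weights = [2.0, -1.0, 0.5]
    return sum(w * xi for w, xi in zip(weights, x))

def nonlinear_model(x):
    # A ReLU applied on top of the linear function breaks additivity.
    return max(0.0, linear_model(x))

def encode(x1, x2):
    # Parity query: element-wise sum of the two original queries.
    return [a + b for a, b in zip(x1, x2)]

def decode(parity_output, available_output):
    # Reconstruct the missing output by subtracting the one that arrived.
    return parity_output - available_output

x1 = [1.0, 2.0, 3.0]
x2 = [0.0, 4.0, 2.0]
parity = encode(x1, x2)

# Suppose the server handling x2 is slow or has failed.
recovered_linear = decode(linear_model(parity), linear_model(x1))
recovered_nonlinear = decode(nonlinear_model(parity), nonlinear_model(x1))

# Compare each reconstruction against the output we were missing.
linear_ok = abs(recovered_linear - linear_model(x2)) < 1e-9
nonlinear_ok = abs(recovered_nonlinear - nonlinear_model(x2)) < 1e-9
```

Here `linear_ok` holds while `nonlinear_ok` does not, since the ReLU makes the output of the summed query differ from the sum of the individual outputs.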

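The idea in (2) of tailoring redundancy to observed failure rates can be sketched with a toy model; this is not the method from the talk. It assumes independent device failures, a per-repair-window failure probability derived from an annualized failure rate (AFR), a hypothetical list of candidate (k, r) erasure-code schemes, and a hypothetical reliability target. For a given AFR it picks the candidate with the lowest storage overhead, (k + r) / k, that still meets the target.

```python
from math import comb

# Hypothetical candidate schemes: k data chunks + r parity chunks.
CANDIDATES = [(6, 3), (10, 3), (15, 3), (30, 3)]

def overhead(k, r):
    # Raw bytes stored per byte of user data.
    return (k + r) / k

def window_failure_prob(afr, window_days=7):
    # Assumption: failures are spread uniformly over the year.
    return afr * window_days / 365

def stripe_loss_prob(k, r, p):
    # Probability that more than r of the k + r chunks fail in one window,
    # assuming independent failures with probability p each.
    n = k + r
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(r + 1, n + 1))

def pick_scheme(afr, target=1e-9):
    # Lowest-overhead candidate that meets the target, else None.
    p = window_failure_prob(afr)
    ok = [(k, r) for k, r in CANDIDATES if stripe_loss_prob(k, r, p) <= target]
    return min(ok, key=lambda s: overhead(*s)) if ok else None

# A static configuration must be sized for the least reliable devices.
static_choice = pick_scheme(0.08)
# A data-driven configuration can use the failure rate actually observed.
dynamic_choice = pick_scheme(0.01)
```

In this toy setup the more reliable devices admit a wider stripe with lower overhead. The AFR values, candidate schemes, and target are illustrative and unrelated to the results reported in the talk.
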
Talk slides: https://www.microsoft.com/en-us/research/uploads/prod/2019/09/Resource-Efficient-Redundancy-for-Large-Scale-Data-Processing-and-Storage-Systems-SLIDES.pdf

Learn more about this and other talks at Microsoft Research: https://www.microsoft.com/en-us/research/video/resource-efficient-redundancy-for-large-scale-data-processing-and-storage-systems/







Tags:
Large-scale system
data processing
data storage
storage systems
data-processing systems
data redundancy
computation framework
neural network models
image classification
speech recognition
object detection
storage devices
Microsoft Research
MSR
Rashmi Vinayak