Disk Failure: How It Happens And What To Do About It

Video Link: https://www.youtube.com/watch?v=uG2byZbAt2k



Duration: 1:08:11


Disk drive failures continue to be one of the primary causes of data loss and system failures. In most failure scenarios, the disk does not stop working entirely; rather, the failures tend to be partial, with some disk sectors unavailable due to a latent sector error or block corruption. In this talk, I will present the characteristics, storage-stack impact, and tolerance techniques for such partial failures. First, I will present our large-scale studies of two important kinds of partial disk failures: latent sector errors and data corruption. Our studies show that these failures do occur and that a significant percentage of disk drives suffer from them. The studies also identify interesting failure characteristics such as non-independence, spatial locality, and temporal locality, all of which greatly impact the techniques used to protect against these failures. Second, I will briefly discuss our analyses of how these failures affect various components of the storage stack, including file systems, virtual memory systems, and RAID storage systems. Our analyses show that the storage stack components are ineffective, and often inconsistent, in dealing with partial disk failures. Third, I will present a file system architecture that I am building to tolerate such failures. The architecture, based on N-version programming principles, uses multiple different file systems to store and retrieve data in order to provide robust and efficient file service.
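
The N-version idea in the abstract can be illustrated with a small sketch. This is not the speaker's architecture, which the abstract does not detail; it assumes one common N-version strategy, majority voting over the results of independent implementations. The names DictStore and NVersionStore are hypothetical, and in-memory dictionaries stand in for real file systems.

```python
from collections import Counter


class DictStore:
    """Hypothetical stand-in for one independent file system."""

    def __init__(self):
        self.blocks = {}

    def write(self, name, data):
        self.blocks[name] = data

    def read(self, name):
        # A missing key models a latent sector error (unreadable data).
        return self.blocks[name]


class NVersionStore:
    """Writes to every store; reads return the strict-majority answer."""

    def __init__(self, stores):
        if len(stores) < 3:
            raise ValueError("majority voting needs at least three stores")
        self.stores = list(stores)

    def write(self, name, data):
        for store in self.stores:
            store.write(name, data)

    def read(self, name):
        results = []
        for store in self.stores:
            try:
                results.append(store.read(name))
            except (KeyError, OSError):
                # This version failed outright; let the others vote.
                continue
        if not results:
            raise IOError(f"all versions failed to read {name!r}")
        value, votes = Counter(results).most_common(1)[0]
        # Require agreement from more than half of all stores,
        # so silent corruption in a minority cannot win the vote.
        if votes <= len(self.stores) // 2:
            raise IOError(f"no majority agreement for {name!r}")
        return value


if __name__ == "__main__":
    backends = [DictStore(), DictStore(), DictStore()]
    fs = NVersionStore(backends)
    fs.write("report.txt", b"quarterly numbers")
    backends[0].blocks["report.txt"] = b"quarterly nUmbers"  # silent corruption
    print(fs.read("report.txt"))
```

With three stores, the sketch masks either a corrupted block or an unreadable block in one store. If two stores disagree with the third in different ways, no majority exists and the read fails rather than returning possibly wrong data.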







Tags:
microsoft research