Automatic Failure Diagnosis in Large-Scale Systems
Video Link: https://www.youtube.com/watch?v=djBNUufl0Qc
As modern computer systems grow in both size and complexity, so does the need for automatic analysis and computer-aided administration of these systems. With recent gains in computing power and more efficient algorithms, statistical machine learning methods have become increasingly practical for handling the deluge of data these systems generate. In this talk, I present statistical diagnostic platforms for several large-scale systems, focusing on the problem of selecting fault-related components from a long list of potential candidates. Examples include a distributed software monitoring system for automatic debugging and a probing system for detecting failures on clusters of networked computers.
Tags:
microsoft research