Unearthing Concurrency Bugs in Cloud-Scale Distributed Systems

Video Link: https://www.youtube.com/watch?v=UKn5sJkIxYE



Duration: 1:01:52


Users demand 24/7 dependability from cloud services. Failing to deliver it is costly, yet reaching ideal dependability poses complex challenges. Behind cloud computing is a collection of hundreds of complex systems, written in millions of lines of code, that are brittle and prone to failure. In this talk, I discuss one of the unsolved problems in distributed systems: distributed concurrency bugs. These bugs are caused by nondeterministic orderings of distributed events such as message arrivals, crashes, and reboots. I present insights gained from our bug study, which can inform further research on combating such bugs. I also present my effort to advance distributed system model checkers so that they can unearth bugs hidden in these systems. I propose a principle of semantic awareness to tackle the major problem of model checking: state space explosion. I show that leveraging semantic knowledge of the system under test can help a model checker find bugs 2x to 340x faster than the state of the art.
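
To make the idea of ordering-dependent bugs and state space reduction concrete, here is a minimal Python sketch. It is not the checker presented in the talk. The message set, the invariant, and the pruning rule (messages addressed to different nodes are assumed independent, so their interleaving does not matter) are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Hypothetical toy workload: (destination node, payload).
MESSAGES = [("A", "prepare"), ("A", "commit"), ("B", "ping"), ("B", "pong")]

def run(order):
    """Deliver messages in the given order; return True if the invariant holds."""
    prepared = set()
    for dest, payload in order:
        if payload == "prepare":
            prepared.add(dest)
        elif payload == "commit" and dest not in prepared:
            return False  # commit arrived before prepare on this node
    return True

def naive_check(messages):
    """Execute every possible delivery order."""
    explored, bugs = 0, []
    for order in permutations(messages):
        explored += 1
        if not run(order):
            bugs.append(order)
    return explored, bugs

def canonical(order):
    """Project an order onto per-node sequences.

    Assumption (illustrative semantic rule): nodes share no state, so two
    orders with the same per-node sequences behave identically.
    """
    per_node = {}
    for dest, payload in order:
        per_node.setdefault(dest, []).append(payload)
    return tuple(sorted((d, tuple(p)) for d, p in per_node.items()))

def pruned_check(messages):
    """Execute only one representative order per equivalence class.

    The sketch still enumerates all permutations; it only skips executing
    the redundant ones.
    """
    seen, explored, bugs = set(), 0, []
    for order in permutations(messages):
        key = canonical(order)
        if key in seen:
            continue
        seen.add(key)
        explored += 1
        if not run(order):
            bugs.append(order)
    return explored, bugs

if __name__ == "__main__":
    print(naive_check(MESSAGES)[0], "orders executed without pruning")
    print(pruned_check(MESSAGES)[0], "orders executed with pruning")
```

In this toy setup the naive search executes all 24 orders, while the pruned search executes 4 and still reaches the commit-before-prepare violation. The pruning is only sound under the stated independence assumption.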

See more on this video at https://www.microsoft.com/en-us/research/video/unearthing-concurrency-bugs-cloud-scale-distributed-systems/







Tags:
microsoft research