DevOpsDays NYC 2020: Chen Harel - Reliability Scoring: A 3-Part Formula for Promoting Reliable Code

Video Link: https://www.youtube.com/watch?v=E437g4TFbn8



Duration: 26:01


The expression “separate the signal from the noise” comes up early and often in monitoring, but as our systems get louder and louder – more errors, more alerts, more logs to sift through – finding the one anomaly that actually matters to the reliability of your application can be like finding a needle in a stack of needles.

This was a problem my team encountered when monitoring our own systems. We were capturing tons of data about billions of events happening every day, without a quantifiable way of knowing what to fix first or how reliable our releases and applications actually were. So our R&D team set out to devise a formula that not only clearly defines what constitutes an anomaly in our code, but also helps prioritize the issues that actually matter and blocks them from reaching production in the first place.

This session will walk attendees through our open source reliability scoring system, implemented in Grafana and a variety of CI/CD tools, which provides DevOps teams with a formulaic approach to prioritizing anomalies and understanding the stability of their releases – before they go to production.

The formula scores releases for stability and safety based on whether the release:
– Introduces new errors
– Makes existing errors happen at a higher rate (i.e. rate increase)
– Introduces slowdowns into the environment
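
The abstract does not state the formula's weights or scale, so the following is only a minimal sketch of how a three-part score of this kind might be computed. The penalty values, the 100-point scale, and the names `ReleaseStats` and `reliability_score` are assumptions made for illustration, not the speaker's implementation.

```python
from dataclasses import dataclass

# Hypothetical penalty weights; the talk abstract does not specify real values.
PENALTY_NEW_ERROR = 10.0
PENALTY_RATE_INCREASE = 5.0
PENALTY_SLOWDOWN = 3.0


@dataclass
class ReleaseStats:
    new_errors: int      # errors first seen in this release
    rate_increases: int  # existing errors now happening at a higher rate
    slowdowns: int       # slowdowns introduced into the environment


def reliability_score(stats: ReleaseStats) -> float:
    """Return a score from 0 (unstable) to 100 (no anomalies found)."""
    penalty = (
        stats.new_errors * PENALTY_NEW_ERROR
        + stats.rate_increases * PENALTY_RATE_INCREASE
        + stats.slowdowns * PENALTY_SLOWDOWN
    )
    # Clamp at zero so a very noisy release cannot go negative.
    return max(0.0, 100.0 - penalty)


if __name__ == "__main__":
    stats = ReleaseStats(new_errors=2, rate_increases=1, slowdowns=3)
    print(reliability_score(stats))  # 100 - (20 + 5 + 9) = 66.0
```

Weighting new errors most heavily is a design choice in this sketch only; a real deployment would presumably tune the weights per application.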

The talk will walk through the process below and include an example of it being applied in practice:
– Detect all errors and slowdowns
– Classify each event
– Prioritize by severity
– Score the release
– Block release
– Visualize the data
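
As a rough illustration of how the classify, prioritize, score, and block steps could fit together in a CI/CD gate, here is a hedged sketch. It assumes events have already been detected and handed over as records; the `Event` type, the 1.5x rate-increase factor, the severity values, and the blocking threshold are all hypothetical and do not come from the talk. The visualization step (Grafana in the talk) is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

# All constants below are hypothetical, chosen only for illustration.
RATE_INCREASE_FACTOR = 1.5   # how much higher than baseline counts as an increase
SEVERITY = {"new_error": 3, "rate_increase": 2, "slowdown": 1}
BLOCK_THRESHOLD = 5          # block the release at or above this severity total


@dataclass
class Event:
    name: str
    rate: float                     # occurrences per 1,000 calls in the new release
    baseline_rate: Optional[float]  # None if the event was never seen before
    is_slowdown: bool = False


def classify(event: Event) -> Optional[str]:
    """Classify one detected event, or return None if it is not an anomaly."""
    if event.is_slowdown:
        return "slowdown"
    if event.baseline_rate is None:
        return "new_error"
    if event.rate > event.baseline_rate * RATE_INCREASE_FACTOR:
        return "rate_increase"
    return None


def evaluate_release(events: List[Event]) -> dict:
    """Classify events, prioritize by severity, score, and decide on blocking."""
    classified = [(event, classify(event)) for event in events]
    anomalies = [(event, kind) for event, kind in classified if kind is not None]
    # Prioritize: most severe anomalies first.
    anomalies.sort(key=lambda pair: SEVERITY[pair[1]], reverse=True)
    severity_total = sum(SEVERITY[kind] for _, kind in anomalies)
    return {
        "anomalies": [(event.name, kind) for event, kind in anomalies],
        "severity_total": severity_total,
        "blocked": severity_total >= BLOCK_THRESHOLD,
    }
```

In a pipeline, the `blocked` flag could be turned into a non-zero exit code so the build fails before the release is promoted.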



