Here’s a real-life tale of how we used application monitoring, observability and graceful degradation to ship
fast while still catching and fixing mistakes without letting our users down.
In it we look at safe failure states, at complementing metrics with supporting data, and at how we use both to solve real issues.
Observing what happens when your users interact with your software keeps you from disaster, allowing your users to keep working and you to keep shipping.
At Tes we capture what happens when our users interact with our services. We set expectations on outcomes. This means we know when our users can’t reach their goals. It also means we can act fast to fix problems.
In this blog post I’ll show how alerts help with this and how we make this a team-focused activity.
If you want to be confident that your users are able to achieve their goals using your service there’s more to do than monitoring the health of individual microservices.
You need assurance that your microservices are working well together, and when they aren’t, you need the information necessary to fix any problems as soon as you can.
This blog follows one Tes team’s mission to better identify and diagnose problems, enabling them to move fast and ship with confidence.
A friend of mine tells a great story of a team avoiding a lot of grief. All of their system health checks were green, but the live graph of purchases dropped to zero and stayed there. Despite the many positive system indicators, the team were able to see they had a problem and were able to react quickly to find and fix it.
It turned out that the number of user purchases was a key indicator of success.
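To make the idea of an expectation on a user outcome concrete, here is a minimal sketch in Python. It is not the tooling used at Tes or by the team in the story; the names `OutcomeExpectation` and `breached`, the window counts and the thresholds are all hypothetical. It only illustrates that the check looks at what users achieved (purchases per time window), not at whether individual services report themselves healthy.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OutcomeExpectation:
    """Hypothetical expectation on a user outcome, such as purchases per window."""
    name: str
    min_per_window: int  # fewest events we expect to see in each time window


def breached(expectation: OutcomeExpectation,
             counts: Sequence[int],
             consecutive: int = 3) -> bool:
    """Return True when the last `consecutive` windows are all below the expectation.

    `counts` holds the number of outcome events per time window, oldest first.
    Requiring several low windows in a row avoids alerting on a single quiet one.
    """
    if len(counts) < consecutive:
        return False  # not enough data to judge yet
    recent = counts[-consecutive:]
    return all(count < expectation.min_per_window for count in recent)


# Assumed example data: purchases per window, oldest first.
purchases = OutcomeExpectation(name="purchases", min_per_window=1)
flatlined = [12, 9, 0, 0, 0]   # dropped to zero and stayed there
one_quiet = [12, 9, 0, 7, 0]   # a dip that recovered

# Note that service health checks are deliberately not an input here:
# the outcome can flatline while every health check is green.
alert_on_flatline = breached(purchases, flatlined)
alert_on_dip = breached(purchases, one_quiet)
```

In this sketch the flatlined series would raise an alert while the single quiet window would not, which mirrors the story: the signal comes from the outcome graph, whatever the health checks say.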