The Two Metrics You Need

When we were interviewing candidates to be Instacart’s first site reliability engineer, I volunteered to cover monitoring as one of my topics. I’d start by asking, “What metrics should we be monitoring?”

One candidate gave an answer that astounded me. He said,

“There are only two things I care about: errors and latency.”

More specifically:

Both must be measured at the load balancer. Errors include those generated by the application and by the load balancer.

Place alerts on these metrics to detect problems with the health of your site. Alerting on them is significantly more effective than relying on services that monitor a few endpoints (though you should do that as well).
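To make the idea concrete, here is a minimal sketch of what such an alert evaluates over one window of load balancer traffic. The `Request` record, the function names, and the thresholds (1% errors, 1000 ms at the 99th percentile) are my own assumptions for illustration, not values the candidate gave. I also chose p99 as the latency statistic; an average or a different percentile would work the same way.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One request as seen by the load balancer (hypothetical record shape)."""
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time the load balancer waited for a response


def error_rate(requests):
    """Fraction of requests that ended in a 5xx, whether the application
    or the load balancer itself generated it."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile of values; pct is in (0, 100]."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_window(requests, max_error_rate=0.01, max_p99_ms=1000.0):
    """Return alert messages for one window of requests (empty if healthy)."""
    alerts = []
    rate = error_rate(requests)
    p99 = percentile([r.latency_ms for r in requests], 99)
    if rate > max_error_rate:
        alerts.append(f"error rate {rate:.1%} exceeds {max_error_rate:.1%}")
    if p99 > max_p99_ms:
        alerts.append(f"p99 latency {p99:.0f} ms exceeds {max_p99_ms:.0f} ms")
    return alerts
```

Calling `check_window` on each minute of traffic and paging when the list is non-empty is the whole alerting loop. In practice a hosted metrics service does this computation for you.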

Here’s how to get them on a few services.

Amazon ELB

CloudWatch provides both for free.
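As a rough sketch, assuming a Classic Load Balancer and the boto3 SDK, alarms on these metrics could be defined like this. The metric names (`HTTPCode_Backend_5XX`, `HTTPCode_ELB_5XX`, `Latency`) and the `AWS/ELB` namespace are the ones CloudWatch publishes for Classic Load Balancers; the function names, alarm names, thresholds, and the SNS topic are placeholders of mine. Application Load Balancers report under `AWS/ApplicationELB` with different metric names, so adjust accordingly.

```python
def elb_alarm_definitions(load_balancer_name, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm calls.

    Thresholds are illustrative assumptions; tune them to your traffic.
    """
    common = {
        "Namespace": "AWS/ELB",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Period": 60,             # evaluate one-minute data points
        "EvaluationPeriods": 5,   # must breach for five minutes in a row
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
    return [
        # Errors generated by the application behind the load balancer.
        {**common, "AlarmName": f"{load_balancer_name}-backend-5xx",
         "MetricName": "HTTPCode_Backend_5XX", "Statistic": "Sum", "Threshold": 10},
        # Errors generated by the load balancer itself.
        {**common, "AlarmName": f"{load_balancer_name}-elb-5xx",
         "MetricName": "HTTPCode_ELB_5XX", "Statistic": "Sum", "Threshold": 10},
        # Classic ELB reports Latency in seconds.
        {**common, "AlarmName": f"{load_balancer_name}-latency",
         "MetricName": "Latency", "Statistic": "Average", "Threshold": 1.0},
    ]


def create_alarms(definitions):
    """Create the alarms; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the definitions can be built without it

    cloudwatch = boto3.client("cloudwatch")
    for definition in definitions:
        cloudwatch.put_metric_alarm(**definition)
```

Keeping the definitions separate from the API call makes it easy to review the thresholds before anything is created in your account.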

Heroku

Add Librato.

heroku addons:create librato:development

Published April 30, 2015




All code examples are public domain.
Use them however you’d like (licensed under CC0).