If you find yourself in San Francisco walking near Market Street, you should consider stopping at the Jewish Museum. There is a charming Pastrami place right next to it: a sandwich shop with good lemonade. It only takes 3-4 minutes to get your meal, and from there it is no more than a 15-minute walk to the ocean. Very nice! Now, let’s consider another scenario. It is lunchtime, and you are starving. You rush out of your office and run to the Pastrami place close to the Jewish Museum. After a 35-minute wait, you get your sandwich and start eating it, asking yourself: why did it take so long this time? Should I have walked to the next place to get a faster meal?

Something similar can happen to your services as well! And that’s precisely the phenomenon that computer science tries to capture with the concepts of server time (more commonly called service time) and response time. Service time measures how long a server takes to run a specific action. Take the generation of a monthly report as an example: it usually takes 2ms, but what if a lot of customers request the same kind of report at the same time and your system saturates? Very quickly, some of them might get the report only after more than 1 minute, or the operation might time out altogether. The time it takes for a customer to get their report, waiting included, is what is typically called response time.
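
To make the difference concrete, here is a minimal sketch of a single worker that serves requests one at a time, in arrival order. It is a toy model, not how any real server is implemented: the function name `response_times`, the single worker, and the arrival timestamps are all assumptions made for illustration. The 2ms figure is the one from the report example above.

```python
def response_times(arrivals_ms, service_ms):
    """Simulate one worker serving requests in arrival order (FIFO).

    arrivals_ms: arrival timestamp of each request in milliseconds, sorted.
    service_ms: time the worker needs to process a single request.
    Returns the response time (waiting + service) of each request.
    """
    free_at = 0.0  # moment the worker finishes its current request
    result = []
    for arrival in arrivals_ms:
        # A request starts immediately, or waits until the worker is free.
        start = max(float(arrival), free_at)
        free_at = start + service_ms
        result.append(free_at - arrival)
    return result


# Quiet day: one request every 10 ms, each one served in 2 ms.
quiet = response_times([0, 10, 20, 30], service_ms=2)

# Rush hour: the same four requests all arrive at the same instant.
rush = response_times([0, 0, 0, 0], service_ms=2)

print(quiet)  # [2.0, 2.0, 2.0, 2.0]
print(rush)   # [2.0, 4.0, 6.0, 8.0]
```

The service time never changes: it is 2ms in both runs. Only the response time grows, because the requests at the back of the line have to wait for the ones in front of them.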

How can we measure these metrics?

The answer to this question is not easy: it depends on your architecture and system. The starting point is instrumenting your application to determine how much time it takes to produce the report. Stress testing is the other important aspect: generating some load on your application and sampling the average response time will let you estimate the application’s service time. Notice that for this measurement to be valid, the app should NOT saturate during the test! Otherwise, the time requests spend waiting in a queue inflates the numbers.
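
As a rough sketch of what instrumenting can look like, the snippet below wraps a function with a timer and averages the samples. Everything in it is hypothetical: `generate_report` is a stand-in for your real report code, and `timed` is a small helper written for this example, not part of any library. Calls are made one at a time, so no request waits behind another and the average approximates the service time.

```python
import time
from functools import wraps
from statistics import mean


def timed(samples):
    """Decorator that appends each call's duration (in seconds) to `samples`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raises.
                samples.append(time.perf_counter() - start)
        return wrapper
    return decorator


service_samples = []


@timed(service_samples)
def generate_report(month):
    # Hypothetical stand-in for the real report generation.
    return f"report for {month}: {sum(range(10_000))}"


# One request at a time, so nothing ever queues behind another request.
for _ in range(50):
    generate_report("january")

print(f"estimated service time: {mean(service_samples) * 1000:.3f} ms")
```

In a real application you would more likely send these samples to your metrics system than keep them in a list, but the idea is the same.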

If you control the whole chain (from the HTTP app that sends the request to the server), you can trace the request and simulate the behavior of your customers. If you can’t do this, you can consider measuring at the frontend edge, probably a load balancer.

I would rather have questions that can’t be answered than answers that can’t be questioned. Richard Feynman

Why does it matter?

How many nodes do I need to deploy to accommodate x requests per second? When should I consider scaling out my application? How does scaling out affect the customer experience? This is precisely why service time and response time matter! An average response time close to the defined service time is a signal of proper utilization and health of an application, because it indicates that the response latency is under control and the application is far from saturation. In addition, pushing these two signals to their limit is a key way to estimate the correct sizing of the application’s instances and infrastructure.
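
For the first question, a back-of-the-envelope estimate is possible once you know the service time. The sketch below assumes that each node handles one request at a time and that you want to keep every node at or below a target utilization. The 70% target and the name `nodes_needed` are assumptions picked for this example, not recommendations.

```python
import math


def nodes_needed(requests_per_second, service_time_s, target_utilization=0.7):
    """Estimate how many nodes keep utilization at or below the target.

    Assumes each node processes a single request at a time.
    """
    # Seconds of work that arrive every second, across all nodes.
    busy_seconds_per_second = requests_per_second * service_time_s
    return max(1, math.ceil(busy_seconds_per_second / target_utilization))


# 1000 requests per second, 2 ms of service time each.
print(nodes_needed(1000, 0.002))  # 3

# Five times the traffic with the same service time.
print(nodes_needed(5000, 0.002))  # 15
```

Treat the result as a starting point: real traffic is bursty, so a load test is still the way to confirm that response time stays close to service time at that size.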

Btw the Pastrami place exists! You should try it! I will be in SF in 2 weeks. So let me know about other places @gianarb. Picture from GMaps. I will take a better one!