An alert fires indicating that a service's latency has increased. How do you start to understand the problem and get to the source of the latency? Latency is a way to describe work: it is an observation of how long (the duration) an operation takes. In order to observe the latency of an operation there has to be an operation, making the operation foundational to understanding time.

An increase in latency indicates that there is a constraint on the system. System performance is a well-understood and studied field and can usually be modeled using Queuing Theory. Since traffic is what drives the load on a system, it's an intuitive place to start looking. This methodology uses traffic to understand the driver of increased latency in a system:

Increase in the Amount of Work Being Done

The first and most likely cause of latency is an increase in the amount of work being applied to the system. When the rate of work increases, it puts additional load on the system. When the system is at or near capacity, work begins to queue, resulting in longer latencies. If the constrained resource is scalable, then one solution is to scale out.
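This queuing effect can be sketched with the classic M/M/1 model from queuing theory (a single server with random arrivals); the service and arrival rates below are made up for illustration, not taken from any real system:

```python
def mm1_latency(arrival_rate, service_rate):
    """Mean time in system (waiting + service) for an M/M/1 queue, in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("system is unstable: arrivals meet or exceed capacity")
    return 1.0 / (service_rate - arrival_rate)

# Hypothetical system that can process 100 req/s. Note how latency
# stays flat at low load and explodes as arrivals approach capacity.
service_rate = 100.0
for arrival_rate in (50.0, 75.0, 90.0, 99.0):
    latency_ms = mm1_latency(arrival_rate, service_rate) * 1000
    print(f"{arrival_rate:5.1f} req/s -> {latency_ms:6.1f} ms")
```

Real services are rarely a single M/M/1 queue, but the shape of the curve is the point: a 50% traffic increase near capacity costs far more latency than the same increase at low utilization.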

This is a classic up-and-to-the-right growth pattern and indicates that more work is being applied to the system. The graph above shows a 50% increase in requests over a time frame of a couple of seconds. If there is a correlated increase in latency, it suggests that the system is constrained somewhere.

Increase in the Type of Work Being Done

The next most common cause of increased latency is a shift in the type of work being requested of the system. There may or may not be a corresponding change in the overall rate of work in the system. The first graph shows the “baseline” system taking requests @ 100 req/s:

The next graph shows the type of work shifting:

In the graph above the overall rate of requests does not increase, but the distribution of the types of requests has shifted. The system is being asked to perform 30% more slow operations (compared to the first image).
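The effect of a shifting mix can be sketched with a traffic-weighted average; the operation names, shares, and per-operation latencies below are hypothetical, chosen to mirror the graphs (same 100 req/s total, with the slow share growing by 30 points):

```python
def mean_latency(mix, latencies_ms):
    """Traffic-weighted mean latency (ms) for a given request mix."""
    return sum(share * latencies_ms[op] for op, share in mix.items())

# Two hypothetical operation types with very different costs.
latencies_ms = {"fast": 5.0, "slow": 50.0}

baseline = {"fast": 0.80, "slow": 0.20}  # 100 req/s total
shifted  = {"fast": 0.50, "slow": 0.50}  # still 100 req/s total

print(mean_latency(baseline, latencies_ms))  # 14.0
print(mean_latency(shifted, latencies_ms))   # 27.5
```

Overall latency nearly doubles even though the request rate never moved, which is why a rate graph alone can look perfectly healthy during this kind of incident.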

Change in the Amount of Work Being Done in Each Transaction

If the rate of work hasn't increased and the type of work hasn't changed, it's likely that the amount of work being done per transaction has increased. This often manifests as larger payloads (which take longer to parse), more items per request (resulting in longer loops or more queries being performed), more data being scanned in the database, etc.

This is the most ambiguous case because many systems have poor coverage of payload size, database operations, or other metrics that help characterize the amount of work a transaction is performing. It is also the most insidious: because these stats often aren't measured, the change flies under the radar.

As mentioned above, common causes are an increase in payload size (i.e. parsing larger payloads) or an increase in the work being requested (i.e. the number of database queries per transaction, a changed index, or more results being pulled back).
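One way to make this case visible is to record per-transaction work metrics alongside rate and latency. The `WorkStats` helper below is a hypothetical sketch (not a real library); in practice these would be histograms in your metrics system:

```python
import json

class WorkStats:
    """Tracks the *amount* of work per request: payload bytes and query count."""

    def __init__(self):
        self.payload_bytes = []
        self.query_counts = []

    def record(self, payload, queries_run):
        self.payload_bytes.append(len(payload))
        self.query_counts.append(queries_run)

    def summary(self):
        n = len(self.payload_bytes)
        return {
            "avg_payload_bytes": sum(self.payload_bytes) / n,
            "avg_queries": sum(self.query_counts) / n,
        }

stats = WorkStats()
# Two hypothetical requests: a small one and a much larger one.
stats.record(json.dumps({"items": list(range(10))}).encode(), queries_run=2)
stats.record(json.dumps({"items": list(range(500))}).encode(), queries_run=9)
print(stats.summary())
```

With these stats graphed over time, a jump in average payload size or queries per request stands out even when the request rate and mix look flat.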

The graph below shows an example of this in practice:

The rate and type of work haven't changed, but operational latency has. At this point it's helpful to check the latency between the service and each one of its dependencies in order to pinpoint which (if any) is the root of the latency. If none of the dependencies has a corresponding increase in latency, the debug space has been partitioned and it's safe to assume that the latency originates somewhere in the service itself (probably for one of the reasons listed above).
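The dependency check can be sketched as a simple before/during comparison; the dependency names, p99 values (in ms), and growth threshold below are all hypothetical:

```python
def suspects(before_ms, during_ms, threshold=1.5):
    """Return dependencies whose latency grew by more than `threshold`x."""
    return [
        dep for dep in before_ms
        if during_ms[dep] / before_ms[dep] > threshold
    ]

# Hypothetical p99 latencies for calls from the service to each dependency.
before = {"auth": 4.0, "db": 12.0, "cache": 1.0}
during = {"auth": 4.5, "db": 60.0, "cache": 1.1}

culprits = suspects(before, during)
print(culprits or "latency likely originates in the service itself")
```

An empty result is just as informative as a match: it partitions the debug space and points the investigation back inside the service.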

Conclusion

Anecdotally, whenever I respond to latency issues these are the first things I check (in this order), and they almost always uncover the driver of the latency:

Is there an increase in the amount of work?

Is there an increase in a type of work being done?

Has the amount of work being done per transaction changed?
