In a lot of enterprises, performance tests are conducted regularly. As part of these tests, QA teams gather various metrics and publish them in a performance report. The metrics in these reports typically include:

CPU utilization.

Memory utilization.

The response time of key transactions.

The response time of backend systems.

Network bandwidth (in some organizations).

I would like to categorize the above-mentioned metrics as macro metrics. Macro metrics are great; however, they have two main shortcomings:

1. Performance problems are not caught in the test environment. Despite the number of performance tests conducted in the test environment, performance degradation still finds its way to production. In a test environment, degradations tend to be subtle, and the above-mentioned macro metrics don't capture them. These subtle degradations are the ones that, when unnoticed, manifest as major performance problems in production. The micro-level metrics discussed below bring visibility to these degradations.

2. They don't facilitate troubleshooting. Macro metrics, to a major extent, do not help the development team debug and troubleshoot problems. Say macro metrics indicate that CPU consumption is high: there will be no indication of whether CPU consumption increased because of heavy garbage collection activity, a thread looping problem, or some other coding issue. Similarly, if response time degrades, they won't indicate whether the degradation is caused by locks in the application code or a backend connectivity issue.

Macro metrics should be complemented with micro metrics to address the above-mentioned shortcomings. In this article, I have listed 10 micro metrics, which you may consider adding to your performance reports.

Memory-Related Micro Metrics

Here are four memory-related micro metrics:

1. Garbage collection pause times.

2. Object creation/reclamation rate.

3. Garbage collection throughput.

4. Memory consumption by each generation.

Let’s review them in detail:

1. Garbage Collection Pause Times

You should measure garbage collection pause times because when a GC pause occurs, the entire application freezes and no customer activity is processed. Pauses therefore have a direct customer impact, so you should always aim for low pause times.
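GC logs give the precise pause breakdown, but as a sketch, the standard `java.lang.management` API also exposes each collector's cumulative pause statistics (the class and method names below are my own):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseStats {
    // Sum of accumulated collection time (ms) across all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector doesn't report time
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", accumulated time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```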

2. Object Creation/Reclamation Rate

The rate at which objects are created heavily influences CPU utilization. If inefficient data structures or code are used, then more objects will be generated to process the same number of transactions. A high object creation rate translates to frequent garbage collection (GC). Frequent GC translates to increased CPU consumption.

3. Garbage Collection Throughput

Throughput is basically the amount of time your application spends processing customer transactions versus the amount of time it spends on garbage collection activities. You should target high throughput (i.e. the application should spend most of its time processing customer transactions and as little time as possible on garbage collection).
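For example, throughput can be approximated from JVM uptime and accumulated GC time (a sketch with names of my own choosing; GC log analyzers compute this more accurately):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcThroughput {
    // GC throughput = percentage of wall-clock uptime NOT spent in GC.
    static double throughputPercent() {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if not reported
            if (t > 0) gcMs += t;
        }
        return 100.0 * (uptimeMs - gcMs) / uptimeMs;
    }

    public static void main(String[] args) {
        System.out.printf("GC throughput: %.2f%%%n", throughputPercent());
    }
}
```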

4. Memory Consumption by Each Generation

In JVM (Java Virtual Machine), ART (Android Runtime), and other platforms, memory is divided into a few internal regions. You need to know the allocated size and peak utilization size of each region. Under-allocation of internal memory regions will degrade the performance of the application. Over-allocation will increase the bill from your hosting provider.
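As an illustration, on the JVM the `MemoryPoolMXBean` API reports the current and peak usage of each region; pool names (Eden, Survivor, Old Gen, Metaspace, etc.) vary by collector and JVM version:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PoolStats {
    static int poolCount() {
        return ManagementFactory.getMemoryPoolMXBeans().size();
    }

    public static void main(String[] args) {
        // Print current used, peak used, and max size of each memory pool.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage now = pool.getUsage();
            MemoryUsage peak = pool.getPeakUsage();
            System.out.printf("%-30s used=%,d peak=%,d max=%,d%n",
                    pool.getName(), now.getUsed(), peak.getUsed(), now.getMax());
        }
    }
}
```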

GC KPI generated by GCeasy.io.

How to Source Memory-Related Micro Metrics

All the memory-related micro metrics can be captured from the garbage collection logs. Here are instructions on how to turn on garbage collection logging. You can use a free online GC log analyzer tool like GCeasy.io, which will report all of the above memory-related micro metrics in a visual/graphical format.
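For reference, these are the standard JVM flags for enabling GC logging (`app.jar` here is a placeholder for your own application):

```shell
# Java 9 and later (unified logging):
java -Xlog:gc*:file=gc.log:time,uptime -jar app.jar

# Java 8 and earlier:
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -jar app.jar
```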

Thread-Related Micro Metrics

Here are four thread-related micro metrics:

5. Thread states.

6. Thread groups.

7. Daemon vs. non-daemon.

8. Code execution path.

Let’s review the importance of these metrics below:

Thread group metric generated by fastThread.io.

5. Thread States

Threads can be in one of the following states: NEW, BLOCKED, RUNNABLE, WAITING, TIMED_WAITING, or TERMINATED. The thread count in each state should be reported. If threads are in the BLOCKED state for a prolonged period, the application can become unresponsive. If a lot of threads are in the RUNNABLE state, the application's CPU consumption will be high. If application threads spend more time in the WAITING, TIMED_WAITING, or BLOCKED states, response time will degrade.
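A minimal sketch of counting live threads by state with `ThreadMXBean` (the class name is mine; thread dump analyzers report the same breakdown):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStates {
    // Count live threads by Thread.State, as a thread-dump analyzer would.
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info != null) counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```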

6. Thread Groups

A thread group represents a set of threads. Each application has multiple thread groups. You should measure the size of each thread group and report it. An increase in thread group size might indicate a certain type of performance degradation.
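For example, live threads can be counted per group via `Thread.getAllStackTraces()` (a sketch; note that `getThreadGroup()` returns null for terminated threads):

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadGroups {
    // Count live threads per thread group.
    static Map<String, Integer> countByGroup() {
        Map<String, Integer> counts = new HashMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ThreadGroup g = t.getThreadGroup(); // null once a thread has terminated
            counts.merge(g == null ? "(terminated)" : g.getName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByGroup().forEach((group, n) -> System.out.println(group + ": " + n));
    }
}
```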

7. Daemon vs. Non-Daemon

There are two types of threads: daemon and non-daemon (i.e. user) threads. You should report the thread count by type, because the JVM won't terminate while any non-daemon thread is still running.
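A small sketch that splits the live thread count into daemon and non-daemon (user) threads:

```java
public class DaemonCount {
    // Returns { daemonCount, userCount } for all live threads.
    static int[] daemonVsUser() {
        int daemon = 0, user = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isDaemon()) daemon++; else user++;
        }
        return new int[] { daemon, user };
    }

    public static void main(String[] args) {
        int[] c = daemonVsUser();
        System.out.println("daemon=" + c[0] + ", user=" + c[1]);
    }
}
```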

8. Code Execution Path

Your application’s CPU consumption, memory consumption, and response time will differ based on the code execution path. If most of the threads are executing a specific code path, then that particular code path should be studied in detail.
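As a sketch, grouping threads by the top frame of their stack traces highlights the hottest code path (a thread dump analyzer does the same across whole stacks; the class name here is mine):

```java
import java.util.HashMap;
import java.util.Map;

public class HotFrames {
    // Group live threads by the top frame of their stack: threads sharing
    // the same top frame point to the code path worth studying first.
    static Map<String, Integer> countByTopFrame() {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length == 0) continue; // e.g. threads in native code
            counts.merge(stack[0].getClassName() + "." + stack[0].getMethodName(),
                    1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByTopFrame().forEach((frame, n) -> System.out.println(n + " thread(s) at " + frame));
    }
}
```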

How to Source Thread-Related Micro Metrics

Thread activity-related metrics can be sourced from thread dumps. Here are eight different options to capture thread dumps. You can use whichever option is convenient for you. Once you have captured thread dumps, you can upload them to a free online thread dump analysis tool like fastThread.io, which provides all of the above thread-related micro-level metrics.

Network-Related Micro Metrics

Here are two network-related micro metrics:

9. Outbound connections.

10. Inbound connections.

Let’s review the importance of these metrics below:

9. Outbound Connections

In today’s world, you will seldom see enterprise applications that don’t communicate with other applications. Your application’s performance is heavily dependent on the applications with which it communicates. The number of ESTABLISHED connections to each endpoint should be measured. Any variance in the connection count can influence the performance of the application.

10. Inbound Connections

The application can get traffic from multiple channels (web, mobile, API) and multiple protocols: HTTP, HTTPS, JMS, Kafka, etc. You need to measure the number of connections coming from each channel and each protocol, as they also influence the performance of the application.

How to Source Network-Related Micro Metrics

Application Performance Monitoring (APM) tools like New Relic and AppDynamics can report these metrics, or you can configure custom probes in APM tools to report them. If you aren’t using an APM tool, you can also use the ‘netstat’ utility:
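For example, here is a sketch that counts ESTABLISHED connections per remote endpoint from `netstat -an` output (field positions assume Linux-style netstat; adjust for your platform):

```shell
# Given `netstat -an` output on stdin, count ESTABLISHED connections
# per remote host ($5 is the foreign address, $6 the state on Linux).
count_established() {
  awk '$6 == "ESTABLISHED" { split($5, a, ":"); c[a[1]]++ }
       END { for (h in c) print c[h], h }' | sort -rn
}

# In practice: netstat -an | count_established
```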