TL;DR: this post presents JMnemohistosyne, a small utility that computes a per-class histogram of the memory consumed by a section of code.

In my last two articles I needed to know how much memory certain data structures consumed and how that space was distributed among the classes that make them up.

In my specific case, given a piece of code I needed to find out:

How much memory has been consumed by objects that have been instantiated and remain resident in memory?

How many instances of each class exist?

How much memory do they consume?

The code under inspection may also create temporary objects (iterators, streams, helpers, etc.) that end up in the heap, but these should be ignored because they are not part of the objects you want to inspect.

Which of the tools available in the JVM can we use?

To obtain the memory consumption in bytes, the JVM provides a JMX object for memory management, MemoryMXBean. Its getHeapMemoryUsage() method reports the number of bytes in use in the heap. By reading the used space before and after executing our code, we can calculate the memory consumed by its execution:
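A minimal sketch of that before/after reading, using only the standard java.lang.management API. The class name HeapDelta and the list of strings used as the "code under inspection" are illustrative assumptions, not part of the original post.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

class HeapDelta {

    // Bytes currently in use in the heap, as reported by the JMX memory bean.
    static long usedHeap() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long before = usedHeap();

        // Hypothetical code under inspection: keep 100,000 strings alive.
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            data.add("value-" + i);
        }

        long after = usedHeap();
        System.out.println("Consumed bytes: " + (after - before));
        System.out.println("Kept elements: " + data.size());
    }
}
```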

This code has a problem: the getHeapMemoryUsage method returns the used heap space, with all kinds of objects: the living and the unreferenced, which need to be released by the garbage collector.

Therefore it is necessary to request a garbage collection with System.gc() before each call to getHeapMemoryUsage(), so that only the space occupied by living objects is counted.

A call to System.gc() does not guarantee that a garbage collection runs at that moment; it is only a hint to the JVM that we would like one. To keep the explanation simple, I will not go into how to work around that.
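The idea of collecting before each reading could be sketched as follows. The class RetainedHeap and its measure method are hypothetical names for this illustration, and because System.gc() is only a hint, the number it returns is an approximation.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

class RetainedHeap {

    // Holds the result of the section so it stays reachable during the second reading.
    private static volatile Object keepAlive;

    // Requests a collection and then reads the used heap. System.gc() is only
    // a hint, so the value may still include some garbage.
    static long usedHeapAfterGc() {
        System.gc();
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Runs the section and returns the growth of the used heap, in bytes.
    // Temporary objects created by the section are unreachable by the time of
    // the second reading, so a collection can discard them.
    static long measure(Supplier<?> section) {
        long before = usedHeapAfterGc();
        keepAlive = section.get();
        long after = usedHeapAfterGc();
        keepAlive = null; // release the result once both readings are taken
        return after - before;
    }
}
```

For example, RetainedHeap.measure(() -> new byte[16 * 1024 * 1024]) should report a value close to 16 MB when the collector honors the request.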

With this, we get the difference in memory consumed before and after the code we want to inspect, but we have no information about which classes are involved or how that consumption is distributed among them.

Memory Histogram

The JDK lets you obtain a histogram of the memory consumed by a running program with the jmap or jcmd commands:

$ jmap -histo:live PID

$ jcmd PID GC.class_histogram

where PID is the process identifier of the running application.

Both commands are equivalent and produce the same output. For a “hello world” program it looks like this:

 num     #instances         #bytes  class name
----------------------------------------------
   1:           512         367920  [B
   2:          2754         252224  [C
   3:           857          99896  java.lang.Class
   4:          2736          65664  java.lang.String
   5:           792          43872  [Ljava.lang.Object;
   6:           675          41600  [I
   7:           454          18160  java.util.LinkedHashMap$Entry
   8:           471          15072  java.util.Hashtable$Entry
   9:           458          14656  java.util.HashMap$Node
  10:           127          14552  [Ljava.util.HashMap$Node;
  11:           341          10912  sun.misc.FDBigInteger
  12:           173           6920  java.lang.ref.Finalizer
  13:           201           6432  java.util.concurrent.ConcurrentHashMap$Node
  14:           256           6144  java.lang.Long
  15:            85           6120  java.lang.reflect.Field
  16:           126           6048  java.util.HashMap
  17:            69           5520  java.lang.reflect.Constructor
  18:            80           5120  java.net.URL
  19:           160           5120  sun.security.util.ObjectIdentifier
  20:           118           4720  java.lang.ref.SoftReference
 ...            ...            ...  ...

In this way we can get the number of instances of each class, and the total memory space used by the objects of each class.

In the jmap command, the :live modifier asks the JVM to count only living objects, which forces a garbage collection first. With jcmd, the GC.class_histogram command runs a garbage collection implicitly.

This is very useful to get an idea of the status of your application at any time, and look for very high and anomalous values to find a memory leak.

By calculating the difference between two histograms taken at two different moments we can obtain the increase (or decrease) of instances and memory in that period of time.

Because the commands are run from the command line, we have no control over the exact moment at which each histogram is taken.

To solve this, I chose to invoke the command from code, using the exec method of the Runtime object.
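A minimal sketch of such an invocation, assuming jcmd is on the PATH. The class JcmdRunner and its method names are hypothetical, and this is not the library's actual code. Reading the PID from the runtime name relies on the HotSpot convention "pid@hostname", which is not guaranteed on every JVM.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;

class JcmdRunner {

    // HotSpot reports the runtime name as "pid@hostname". This is a convention,
    // not a guarantee; on Java 9+ ProcessHandle.current().pid() is the safer choice.
    static String currentPid() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        return at > 0 ? name.substring(0, at) : name;
    }

    // Runs "jcmd <pid> GC.class_histogram" against this JVM and returns its standard output.
    static String classHistogram() throws IOException, InterruptedException {
        String[] command = {"jcmd", currentPid(), "GC.class_histogram"};
        Process process = Runtime.getRuntime().exec(command);
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("jcmd exited with code " + exitCode);
        }
        return output.toString();
    }
}
```

Calling JcmdRunner.classHistogram() immediately before and after the code of interest gives the two snapshots to compare.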

So, if we parse the standard output of the command and calculate the difference between the values associated with each class we can get the needed information.
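The parse-and-subtract step might look like the sketch below. HistogramDiff is a hypothetical name, and the code only assumes the line layout shown in the sample output above (rank, instances, bytes, class name).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HistogramDiff {

    // Parses lines such as "   4:   2736   65664  java.lang.String" into
    // className -> {instances, bytes}. Header, separator and total lines do
    // not start with "<number>:" and are skipped.
    static Map<String, long[]> parse(String output) {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 4 || !parts[0].matches("\\d+:")) {
                continue;
            }
            long instances = Long.parseLong(parts[1]);
            long bytes = Long.parseLong(parts[2]);
            result.put(parts[3], new long[] {instances, bytes});
        }
        return result;
    }

    // Returns after - before per class, omitting classes whose figures did not change.
    static Map<String, long[]> diff(Map<String, long[]> before, Map<String, long[]> after) {
        Map<String, long[]> delta = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> entry : after.entrySet()) {
            long[] old = before.getOrDefault(entry.getKey(), new long[] {0, 0});
            long instances = entry.getValue()[0] - old[0];
            long bytes = entry.getValue()[1] - old[1];
            if (instances != 0 || bytes != 0) {
                delta.put(entry.getKey(), new long[] {instances, bytes});
            }
        }
        // Classes that disappeared completely show up as negative values.
        for (Map.Entry<String, long[]> entry : before.entrySet()) {
            if (!after.containsKey(entry.getKey())) {
                delta.put(entry.getKey(), new long[] {-entry.getValue()[0], -entry.getValue()[1]});
            }
        }
        return delta;
    }
}
```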

How can we do this in a simple way? I didn’t find one, so I had to write some code that parses the output and calculates the difference.

I have cleaned up that small piece of code and published it in Maven Central so that you can use it when you have a similar requirement: https://github.com/jerolba/jmnemohistosyne

Dependency

If you use Maven, all you have to do is add this dependency:

<dependency>
    <groupId>com.jerolba</groupId>
    <artifactId>jmnemohistosyne</artifactId>
    <version>0.2.3</version>
</dependency>

And if you use Gradle:

implementation 'com.jerolba:jmnemohistosyne:0.2.3'

The library is about 11 KB in size and has no transitive dependencies.

Simple histogram

To get a histogram of all objects in memory, create a Histogramer object and call its createHistogram() method, which returns a MemoryHistogram:

The MemoryHistogram class is an iterable collection of HistogramEntry objects, each containing the class name, the number of instances and the total size of all instances.

You can also search for a particular class by passing its full name to the get method.

Histogram of newly created objects

We can measure the number of living objects instantiated within a section of code using a lambda that contains it:

You can put as much code as you want inside the lambda and call whatever you need from it, but make sure that every instance you want counted is referenced either by an object that already existed outside the lambda or by the object that the lambda returns (the lambda implements Supplier<?>). Otherwise, when the garbage collection is forced, those instances will not appear in the histogram and will be lost in memory, like tears in rain.

In the following example only one instance of the ArrayList class, another instance of Object[] and 10,000 of String will appear, but there will be no trace of the classes that compose HashMap :
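The code section described here can be reconstructed, as an assumption based only on the description above, in the sketch below. It shows just the Supplier that would be handed to the library, so the library call itself is left out. The names RetainedExample and SECTION are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class RetainedExample {

    // A code section in the shape the text describes: the returned list is the
    // only object that stays reachable once the lambda finishes.
    static final Supplier<List<String>> SECTION = () -> {
        // Temporary structure: the HashMap and its internal nodes become
        // garbage as soon as the lambda returns.
        Map<Integer, String> temp = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            temp.put(i, "value-" + i);
        }
        // Copies the 10,000 String references into one ArrayList,
        // which is backed by a single Object[].
        return new ArrayList<>(temp.values());
    };
}
```

Only the ArrayList, its backing array and the strings are reachable from the returned value, which is why nothing belonging to HashMap survives the forced collection.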

The MemoryHistogram collection can be filtered with the filter method, which accepts a variable number of arguments. Each argument can be:

The full name of a class (including the package): java.util.HashMap

The full name of a class with a * at the end, meaning "starts with": java.util.HashMap*

An instance of Class: HashMap.class

A regular expression to apply to the full class name: Pattern.compile(".*List")

All the arguments are applied to the histogram and combined with OR, and the result is a new MemoryHistogram.
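To make the matching rules concrete, here is a standalone sketch of how such a filter could decide whether a class name is kept. It is not the library's implementation: FilterSketch and matches are hypothetical names, and requiring the regular expression to match the whole name is an assumption.

```java
import java.util.regex.Pattern;

class FilterSketch {

    // Returns true when the class name satisfies at least one filter (OR semantics).
    static boolean matches(String className, Object... filters) {
        for (Object filter : filters) {
            if (filter instanceof Class) {
                // An instance of Class: compare against its full name.
                if (((Class<?>) filter).getName().equals(className)) {
                    return true;
                }
            } else if (filter instanceof Pattern) {
                // Assumption: the pattern must match the complete class name.
                if (((Pattern) filter).matcher(className).matches()) {
                    return true;
                }
            } else if (filter instanceof String) {
                String text = (String) filter;
                if (text.endsWith("*")) {
                    // Trailing *: prefix match.
                    String prefix = text.substring(0, text.length() - 1);
                    if (className.startsWith(prefix)) {
                        return true;
                    }
                } else if (text.equals(className)) {
                    // Plain string: exact full name.
                    return true;
                }
            }
        }
        return false;
    }
}
```

With these rules, "java.util.HashMap*" keeps java.util.HashMap$Node, while the plain "java.util.HashMap" does not.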

A complete example would be:

Because the number of classes present in the JVM can be very high, you can get the Top N of the most memory-consuming classes:

System.out.println(diff.getTop(20));
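Conceptually, a top-N selection sorts the entries by consumed bytes in descending order and keeps the first N. The sketch below shows that idea on a plain map of class name to bytes. TopN is a hypothetical helper written for this explanation and says nothing about how getTop is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TopN {

    // Returns the n class names with the largest byte totals, biggest first.
    static List<String> top(Map<String, Long> bytesByClass, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesByClass.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        int limit = Math.max(0, Math.min(n, entries.size()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> entry : entries.subList(0, limit)) {
            names.add(entry.getKey());
        }
        return names;
    }
}
```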

Recommendations for use

Do not use in production!!

To be able to do all the measurements the JVM does a complete garbage collection, and it has to go through the whole heap counting instances of classes and taking note of their size.

This is a very expensive operation and can completely block your process for several seconds (a time proportional to consumed memory).

If you can’t run it in a unit/integration test, don’t go beyond a pre-production environment.

Do not execute in concurrency

If you are in a concurrent environment such as a web server, code from other requests may be running alongside yours, and it can “pollute” the memory and give you unexpected results.

Make sure there’s nothing else running and that only the code you’re interested in is running.

The JVM itself runs concurrently

Even if it seems that you are running a single thread with no concurrency, the JVM is doing its own work in the background, and since much of the JVM code is written in Java (even more if you enable Graal), its classes will appear in the histogram.

In my experience, what appears most are classes related to the classloader (the classes you measure have to be loaded at some point), so if you need precise instance counts and sizes, I recommend executing your code once before taking any sample.

This way, everything needed to load and execute your code will already be in place before you measure.
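A warm-up of this kind can be as simple as the following sketch, where the section is executed a few times with its result discarded before the run that will actually be measured. WarmUp and warmUpThenRun are hypothetical names, not part of the library.

```java
import java.util.function.Supplier;

class WarmUp {

    // Executes the section warmUpRuns times, discarding the results, so that
    // class loading and static initialization happen before the measured run.
    // The value returned comes from one final execution.
    static <T> T warmUpThenRun(Supplier<T> section, int warmUpRuns) {
        for (int i = 0; i < warmUpRuns; i++) {
            section.get(); // result intentionally dropped
        }
        return section.get();
    }
}
```

In practice you would take the first histogram sample after the warm-up loop and before the final execution.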

Before accepting as valid the first result you get, be critical if any value doesn’t seem proper, and try to run your code in a different context.

It needs access to the JDK

The jcmd command is one of the utilities shipped with the JDK, so you need a JDK installed (a JRE is not enough) and its executables must be accessible in the path.
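If you want to fail early with a clear message, one option is to check for the executable before measuring. The sketch below looks for a tool in the directories of the PATH environment variable. ToolLocator is a hypothetical helper, and trying a ".exe" suffix is an assumption made to cover Windows.

```java
import java.io.File;
import java.util.Optional;

class ToolLocator {

    // Looks for an executable file with the given name in each PATH directory.
    static Optional<File> findOnPath(String tool) {
        String path = System.getenv("PATH");
        if (path == null) {
            return Optional.empty();
        }
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            // Try the bare name and, for Windows, the name with an .exe suffix.
            for (String name : new String[] {tool, tool + ".exe"}) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }
}
```

A call such as ToolLocator.findOnPath("jcmd").isPresent() then tells you whether the measurement can run at all.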

Currently it is tested in the CI system against Oracle JDK 8, OpenJDK 8 and OpenJDK 11. If you find a version that doesn’t work, PRs are accepted :)