This post describes how to enable Class Data Sharing (CDS) for a Java app and what the benefits of doing so are.

I recently saw a talk about CDS by Volker Simonis, which was the inspiration for this blog post.


What is Class Data Sharing?

Class Data Sharing is a JVM feature that allows multiple JVMs to share loaded classes (and some other data) via shared memory.

A basic form of CDS has been available in the Sun JVM since Java 1.5. It was limited to system classes and the serial GC, so it was not widely applicable. In Oracle JDK 9 it finally became useful, with support for other GC algorithms and for application classes. It remained a commercial feature until JDK 10, where it was open-sourced.

CDS creates a memory-mapped file that caches the JVM's internal representation of the specified classes. This file (and the memory it is mapped into) can therefore be shared between JVMs, and loading a class from its internal representation is faster than loading it from a regular jar file.

It is worth noting that IBM's J9 JVM (now Eclipse OpenJ9) has had a similar feature for a long time.

In this blog post we’ll investigate how to prepare such a CDS archive and look at the potential benefits. We’ll use the Elasticsearch Docker image to conduct the experiment.

Prepare CDS

All of the code presented here is posted on GitHub.

Intro — we need OpenJDK 10

Let’s use the Elasticsearch Docker image to test the impact of class data sharing.

First, we need to repackage it with OpenJDK 10.

```dockerfile
# Dockerfile-openjdk10
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
COPY pkg/ /app/pkg
RUN cd /app/pkg && \
    tar -zxf openjdk-10.0.1_linux-x64_bin.tar.gz && \
    mkdir -p /opt/java && \
    mv jdk-10.0.1 /opt/java
ENV JAVA_HOME /opt/java/jdk-10.0.1
```

Let’s create a Makefile; we’ll add more to it as we go:

```makefile
# Makefile
.PHONY: help
help:
	@ echo "# Elasticsearch and Class Data Sharing (CDS) experiment"
	@ echo
	@ echo "build-jdk10 - repackage elasticsearch-oss with openjdk10"
	@ echo
	@ echo "clean-cache - nuke the class list and cache"
	@ echo "generate-class-list - list classes used by elasticsearch"
	@ echo "dump-class-cache - dump the classes from the list to cache file"
	@ echo "build-cds - repackage the class cache in the elasticsearch docker container"
	@ echo
	@ echo "run-{cds/nocds} - run elasticsearch with/without class data sharing"
	@ echo

pkg/openjdk-10.0.1_linux-x64_bin.tar.gz:
	@ wget -P pkg https://download.java.net/java/GA/jdk10/10.0.1/fb4372174a714e6b8c52526dc134031e/10/openjdk-10.0.1_linux-x64_bin.tar.gz

pkg: pkg/openjdk-10.0.1_linux-x64_bin.tar.gz

.PHONY: build-jdk10
build-jdk10: pkg
	@ docker build -f Dockerfile-openjdk10 -t "ikupczynski/elasticsearch-oss:6.2.4-openjdk10" .
```

And now we can get our repackaged image:

make build-jdk10

Create a list of classes used by Elasticsearch

We’re going to start Elasticsearch, make it respond to a simple request, and have the JVM log every class it loads to a file with -XX:DumpLoadedClassList=/app/cache/elasticsearch_appcds.cls. We need to explicitly enable support for application classes with -XX:+UseAppCDS; otherwise we’d be limited to JDK classes only.

```makefile
CWD := $(shell pwd)

.PHONY: clean-cache
clean-cache:
	@ rm -rf cache/elasticsearch_appcds*
	@ echo "Nuked cache"

cache/elasticsearch_appcds.cls:
	@ touch cache/elasticsearch_appcds.cls
	@ export ES_JAVA_OPTS="-XX:+UseAppCDS \
		-XX:DumpLoadedClassList=/app/cache/elasticsearch_appcds.cls" && \
	docker run \
		-d --name generate-class-list \
		-p 9200:9200 -p 9300:9300 \
		-e "discovery.type=single-node" \
		-v $(CWD)/cache:/app/cache \
		--env ES_JAVA_OPTS \
		-it "ikupczynski/elasticsearch-oss:6.2.4-openjdk10"
	@ echo "Waiting until elasticsearch starts"
	@ bin/wait-on-elastic
	@ docker rm -f generate-class-list
	@ echo "Class list generated. Number of classes: "
	@ wc -l cache/elasticsearch_appcds.cls
```

We start Elasticsearch in the background (docker run -d) and wait until it starts responding to requests, using the following script (bin/wait-on-elastic):

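```shell
#!/usr/bin/env bash
# bin/wait-on-elastic: poll until Elasticsearch answers on localhost:9200
until curl --output /dev/null --silent --head --fail localhost:9200; do
  printf '.'
  sleep 1
done
```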
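The polling loop above never gives up, so if Elasticsearch fails to start (for example because of a bad JVM option), make hangs forever. A minimal sketch of a variant with an upper bound, assuming a POSIX-ish shell with `local`; the helper name `wait_until` is hypothetical and not part of the original repo:

```shell
# Hypothetical helper (not in the original setup): run "$@" until it succeeds,
# retrying once per second, and give up after max_attempts tries.
wait_until() {
  local max_attempts="$1" attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    printf '.'   # progress indicator, like the original script
    sleep 1
  done
}
```

Used as `wait_until 120 curl --output /dev/null --silent --head --fail localhost:9200`, it waits for roughly two minutes and then returns a non-zero status, which also makes the surrounding make target fail instead of hanging.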

If we run make cache/elasticsearch_appcds.cls, we get a list of approximately 7k classes used by Elasticsearch:

```shell
$ make cache/elasticsearch_appcds.cls
e2afbb966e1ad64d1296ada7c67a1038301834f8419aafa419ebbf74c7cd499b
Waiting until elasticsearch starts
...........generate-class-list
Class list generated. Number of classes:
7111 cache/elasticsearch_appcds.cls

$ head -n 5 cache/elasticsearch_appcds.cls
java/lang/Object
java/lang/String
java/io/Serializable
java/lang/Comparable
java/lang/CharSequence

$ tail -n 5 cache/elasticsearch_appcds.cls
org/elasticsearch/common/io/Streams$FlushOnCloseOutputStream
org/elasticsearch/Version$DeclaredVersionsHolder
com/fasterxml/jackson/core/io/JsonStringEncoder
com/fasterxml/jackson/core/util/ByteArrayBuilder
org/elasticsearch/common/bytes/PagedBytesReference$1
```

Note that the list is generated while your app is running, so if your app loads some classes only later on (e.g. when handling a particular kind of request), you should exercise that code path while the list is being recorded. In our case we just wait until Elasticsearch can respond to a simple REST request. This is not perfect, but good enough for our experiment.
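One way to exercise more code paths is to send a few extra requests after bin/wait-on-elastic returns and before the container is removed. A minimal sketch, assuming `curl` is available; the helper `warm_up` is hypothetical, and the endpoints in the usage line are ordinary read-only Elasticsearch REST endpoints chosen for illustration, not ones the original experiment used:

```shell
# Hypothetical helper (not in the original setup): issue GET requests against
# base_url + each path so the classes behind those handlers get loaded (and
# therefore recorded in the class list). Fails if any request fails.
warm_up() {
  local base_url="$1" path failures=0
  shift
  for path in "$@"; do
    if ! curl --output /dev/null --silent --fail "${base_url}${path}"; then
      echo "request failed: ${base_url}${path}" >&2
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

For example, `warm_up localhost:9200 / /_cluster/health /_cat/indices /_nodes/stats` could be called between the wait and the `docker rm -f generate-class-list` step. Which endpoints are worth hitting depends entirely on how the app is used in production.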

JVM error

Normally, the next step would be to take the elasticsearch_appcds.cls class list and use it to populate the cache file. However, I hit some limit (either a JVM bug or, more likely, some memory limit or Docker/JVM interplay) and needed to trim the class list from 7111 to 6218 entries, i.e. by roughly an eighth. See error.org for more details.

I’ll leave the root-cause investigation for later. For now, let’s just trim the class list.

```makefile
# Workaround for the JVM error
cache/elasticsearch_appcds.cls-thin: cache/elasticsearch_appcds.cls
	@ head -n 6218 cache/elasticsearch_appcds.cls > cache/elasticsearch_appcds.cls-thin
	@ echo "Filtered the class list. Number of classes: "
	@ wc -l cache/elasticsearch_appcds.cls-thin

generate-class-list: cache/elasticsearch_appcds.cls-thin
```
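Because `head -n 6218` keeps the first entries, the workaround drops the classes that were loaded last. To see which packages end up outside the archive, here is a small sketch that counts class-list entries per package prefix; `package_histogram` is a hypothetical helper, and it only assumes the one-class-per-line, slash-separated format shown in the head/tail output above:

```shell
# Hypothetical helper (not in the original setup): read a class list on stdin
# (e.g. "java/lang/Object") and print "<count> <package prefix>" lines, where
# the prefix keeps the first `depth` path segments. Sorted by count, then name.
package_histogram() {
  local depth="${1:-2}"
  awk -F/ -v depth="$depth" '
    NF > 1 {
      prefix = $1
      for (i = 2; i <= depth && i < NF; i++) prefix = prefix "/" $i
      count[prefix]++
    }
    END { for (p in count) printf "%d %s\n", count[p], p }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}
```

For instance, `tail -n +6219 cache/elasticsearch_appcds.cls | package_histogram 2` would summarise the entries removed by the workaround.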

Prepopulate the class cache

Now that we have the class list, let’s use it to pre-populate the class cache. Unlike the previous step, here the JVM won’t run the app: with -Xshare:dump it goes through the class list, loads each of the classes, dumps them to the archive file, and then exits.

```makefile
cache/elasticsearch_appcds.jsa: generate-class-list
	@ touch cache/elasticsearch_appcds.jsa
	@ export ES_JAVA_OPTS="-Xshare:dump \
		-XX:+UseAppCDS \
		-XX:SharedClassListFile=/app/cache/elasticsearch_appcds.cls-thin \
		-XX:+UnlockDiagnosticVMOptions \
		-XX:SharedArchiveFile=/app/cache/elasticsearch_appcds.jsa" && \
	docker run \
		--rm --name dump-class-cache \
		-e "discovery.type=single-node" \
		-v $(CWD)/cache:/app/cache \
		--env ES_JAVA_OPTS \
		-it "ikupczynski/elasticsearch-oss:6.2.4-openjdk10"

dump-class-cache: generate-class-list cache/elasticsearch_appcds.jsa
```

Let’s try it out.

```shell
$ make dump-class-cache
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
Allocated temporary class space: 1073741824 bytes at 0x00000008c0000000
Allocated shared space: 3221225472 bytes at 0x0000000800000000
Loading classes to share ...
(...)
Loading classes to share: done.
Rewriting and linking classes ...
Preload Warning: Removed error class: org.apache.logging.log4j.core.async.AsyncLoggerContext
Rewriting and linking classes: done
Number of classes 7211
    instance classes = 7087
    obj array classes = 116
    type array classes = 8
Updating ConstMethods ... done.
Removing unshareable information ... done.
Scanning all metaspace objects ...
Allocating RW objects ...
Allocating RO objects ...
Relocating embedded pointers ...
Relocating external roots ...
Dumping symbol table ...
Relocating SystemDictionary::_well_known_klasses[] ...
(...)
Removing java_mirror ... done.
mc space: 18184 [ 0.0% of total] out of 20480 bytes [ 88.8% used] at 0x0000000800000000
rw space: 18630320 [ 22.6% of total] out of 18632704 bytes [100.0% used] at 0x0000000800005000
ro space: 31856600 [ 38.6% of total] out of 31858688 bytes [100.0% used] at 0x00000008011ca000
md space: 6160 [ 0.0% of total] out of 8192 bytes [ 75.2% used] at 0x000000080302c000
od space: 31914104 [ 38.7% of total] out of 31916032 bytes [100.0% used] at 0x000000080302e000
total : 82425368 [100.0% of total] out of 82436096 bytes [100.0% used]
```

The resulting file contains about 80 MB of pre-loaded classes:

```shell
$ ls -lah cache/elasticsearch_appcds.jsa
-r--r--r-- 1 igor staff 79M May 20 20:52 cache/elasticsearch_appcds.jsa
```
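As a quick sanity check, this size agrees with the end of the dump log: the five regions (mc, rw, ro, md, od) reserve 82,436,096 bytes in total, which works out to

$$
\frac{82\,436\,096\ \text{bytes}}{1024^2\ \text{bytes/MiB}} \approx 78.6\ \text{MiB},
$$

displayed by `ls -h` as 79M. Treat the match as approximate, since the on-disk layout may add a header or alignment padding on top of the region sizes.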

Package it with the container

We have the class cache, now we can package it with the container.

Here is the dockerfile:

```dockerfile
# Dockerfile-cds
FROM ikupczynski/elasticsearch-oss:6.2.4-openjdk10
COPY cache/ /app/cache
```

As you can see, it is quite simple: it just copies the cache into the image.

And the complementary Makefile addition:

```makefile
CDS_IMAGE = ikupczynski/elasticsearch-oss:6.2.4-cds

.PHONY: build-cds
build-cds: dump-class-cache
	@ docker build -f Dockerfile-cds -t $(CDS_IMAGE) .
```

I’ve built and pushed the image to Docker Hub, so you can carry on from here with:

docker pull ikupczynski/elasticsearch-oss:6.2.4-cds

Convenience targets to run elasticsearch

Let’s define some make targets to run Elasticsearch both with and without CDS.

```makefile
RUN_NO_CDS = export ES_JAVA_OPTS="-Xshare:off \
        -Xlog:class+load:file=/app/logs/classload-nocds.log " && \
    docker run \
        -e "discovery.type=single-node" \
        -v $(CWD)/cache:/app/cache \
        -v $(CWD)/logs:/app/logs \
        --env ES_JAVA_OPTS \
        -it

.PHONY: clean-nocds-logs
clean-nocds-logs:
	@ rm -f logs/classload-nocds.log*

.PHONY: run-nocds
run-nocds: clean-nocds-logs
	@ $(RUN_NO_CDS) --rm $(CDS_IMAGE)

# `time-nocds` is given mostly for illustration; it is not accurate
.PHONY: time-nocds
time-nocds:
	@ $(RUN_NO_CDS) -p 9200:9200 --name run-no-cds -d $(CDS_IMAGE)
	@ echo "Timing the wait on elastic"
	@ time bin/wait-on-elastic
	@ docker rm -f run-no-cds

RUN_CDS = export ES_JAVA_OPTS="-Xshare:on \
        -Xlog:class+load:file=/app/logs/classload-cds.log \
        -XX:+UseAppCDS \
        -XX:SharedClassListFile=/app/cache/elasticsearch_appcds.cls-thin \
        -XX:+UnlockDiagnosticVMOptions \
        -XX:SharedArchiveFile=/app/cache/elasticsearch_appcds.jsa" && \
    docker run \
        -e "discovery.type=single-node" \
        -v $(CWD)/cache:/app/cache \
        -v $(CWD)/logs:/app/logs \
        --env ES_JAVA_OPTS \
        -it

.PHONY: clean-cds-logs
clean-cds-logs:
	@ rm -f logs/classload-cds.log*

.PHONY: run-cds
run-cds: clean-cds-logs
	@ $(RUN_CDS) --rm $(CDS_IMAGE)

# `time-cds` is given mostly for illustration; it is not accurate
.PHONY: time-cds
time-cds:
	@ $(RUN_CDS) -p 9200:9200 --name run-cds -d $(CDS_IMAGE)
	@ echo "Timing the wait on elastic"
	@ time bin/wait-on-elastic
	@ docker rm -f run-cds
```

Now we can run elasticsearch with or without CDS:

```shell
make run-cds
make run-nocds
```

Experiment

Let’s see how the classes are loaded

We can run both versions in the terminal, and then inspect logs:

```shell
$ make time-nocds
# ...
$ make time-cds
# ...
$ grep 'org.elasticsearch.bootstrap.Bootstrap ' logs/*
logs/classload-cds.log:[1.370s][info][class,load] org.elasticsearch.bootstrap.Bootstrap source: shared objects file
logs/classload-nocds.log:[2.696s][info][class,load] org.elasticsearch.bootstrap.Bootstrap source: file:/usr/share/elasticsearch/lib/elasticsearch-6.2.4.jar
```

As we can see, with class data sharing we’ve loaded the main class from the shared objects file, i.e. the class cache we created. Without CDS, the class is loaded from the jar. All as expected.

We also see that the CDS version loaded the main class over a second earlier (at 1.370 s vs 2.696 s of JVM uptime), and this difference was fairly consistent across the various runs I did while writing this post. I wouldn’t trust it too much, though, as it wasn’t a rigorous benchmark: I ran it on a development laptop with many variables to account for. If you are in the business of (re-)starting Java apps often, generate a class cache and compare the startup times for your own use case and app.
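To compare over several runs rather than a single one, the uptime at which the main class was loaded can be pulled out of each class-load log and summarised by its median. A minimal sketch with two hypothetical helpers, `class_load_time` and `median`; it assumes the default `[uptime][level][tags]` line prefix shown in the grep output above, and it measures only when the main class was loaded, which is a proxy for startup rather than time-to-first-request:

```shell
# Hypothetical helper: print the JVM uptime (in seconds) at which a class was
# loaded, read from a -Xlog:class+load log. Defaults to the Elasticsearch main class.
class_load_time() {
  local log_file="$1" class_name="${2:-org.elasticsearch.bootstrap.Bootstrap}"
  awk -v cls="$class_name" '
    $2 == cls && $3 == "source:" {
      t = $1
      sub(/^\[/, "", t)      # "[1.370s][info][class,load]" -> "1.370s][info][class,load]"
      sub(/s\].*$/, "", t)   # -> "1.370"
      print t
      exit
    }
  ' "$log_file"
}

# Hypothetical helper: median of the numbers on stdin (one per line).
median() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2 == 1) print v[(NR + 1) / 2]
      else printf "%.3f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

For example: `for i in 1 2 3 4 5; do make time-cds > /dev/null; class_load_time logs/classload-cds.log; done | median`, and the same with `time-nocds` and `logs/classload-nocds.log`.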

Let’s also see how many classes were actually loaded from the cache:

```shell
$ grep -c 'shared objects file' logs/classload-cds.log
2028
```
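The grep above counts only archive hits. To see where the remaining classes came from, and to put the CDS and non-CDS runs side by side, a sketch of a hypothetical `class_sources` helper that tallies log lines by their `source:` field. It assumes the `... <class> source: <where>` format shown earlier; anything that is not the shared archive, a `jrt:` module URL, or a `file:` URL is counted as `other`:

```shell
# Hypothetical helper: count -Xlog:class+load entries by where each class came
# from. Reads the given log files (or stdin) and prints "<kind> <count>" lines.
class_sources() {
  awk '
    {
      i = index($0, " source: ")
      if (i == 0) next
      src = substr($0, i + 9)          # text after " source: "
      if (src == "shared objects file") kind = "shared"
      else if (src ~ /^jrt:/)          kind = "jrt"
      else if (src ~ /^file:/)         kind = "file"
      else                             kind = "other"
      count[kind]++
    }
    END { for (k in count) printf "%s %d\n", k, count[k] }
  ' "$@" | LC_ALL=C sort
}
```

Running `class_sources logs/classload-cds.log` and `class_sources logs/classload-nocds.log` shows how the load sources shift between the two runs.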

Memory usage

Let’s start with no class data sharing:

```shell
$ make run-nocds   # 4 times, in different terminals
# (...)
$ docker stats
CONTAINER ID   NAME                CPU %   MEM USAGE / LIMIT     MEM %    NET I/O       BLOCK I/O        PIDS
371268a8263b   vigilant_neumann    0.91%   1.168GiB / 7.787GiB   15.00%   858B / 0B     0B / 254kB       28
5137554b5e86   pensive_kepler      0.16%   1.164GiB / 7.787GiB   14.95%   858B / 0B     0B / 254kB       28
9fd0a493d7a2   vibrant_albattani   0.18%   1.174GiB / 7.787GiB   15.08%   858B / 0B     0B / 254kB       28
5cfb3f1d7a8e   vibrant_shtern      0.20%   1.17GiB / 7.787GiB    15.03%   1.04kB / 0B   8.19kB / 254kB   28
```

Reported memory usage: 1.164, 1.168, 1.170, 1.174 GiB

And with class data sharing

```shell
$ make run-cds   # 4 times
# (...)
$ docker stats
CONTAINER ID   NAME                       CPU %   MEM USAGE / LIMIT     MEM %    NET I/O     BLOCK I/O        PIDS
4e29db4f3dca   inspiring_volhard          0.97%   1.154GiB / 7.787GiB   14.82%   718B / 0B   8.19kB / 254kB   28
2841355336be   awesome_shockley           0.31%   1.153GiB / 7.787GiB   14.81%   788B / 0B   0B / 254kB       28
1fe95a004665   naughty_goldberg           0.39%   1.171GiB / 7.787GiB   15.04%   788B / 0B   0B / 254kB       28
513955f2d550   compassionate_montalcini   0.27%   1.152GiB / 7.787GiB   14.80%   968B / 0B   0B / 254kB       28
```

Reported memory usage: 1.152, 1.153, 1.154, 1.171 GiB

Again, this is not really a benchmark, but we can see an improvement of roughly 5 to 20 MiB per container.
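Comparing these readings by eye is error-prone, so here is a sketch of a hypothetical `mean_mem_mib` helper that averages the used-memory half of the MEM USAGE column and converts it to MiB. It assumes the values carry binary suffixes (KiB/MiB/GiB, or plain bytes) as in the tables above:

```shell
# Hypothetical helper: read docker stats MEM USAGE values such as
# "1.168GiB / 7.787GiB" on stdin and print the mean usage in MiB.
mean_mem_mib() {
  awk '
    {
      v = $1                                   # "used" part, e.g. "1.168GiB"
      if      (v ~ /GiB$/)        { sub(/GiB$/, "", v); v = v * 1024 }
      else if (v ~ /MiB$/)        { sub(/MiB$/, "", v); v = v + 0 }
      else if (v ~ /KiB$/)        { sub(/KiB$/, "", v); v = v / 1024 }
      else if (v ~ /^[0-9.]+B$/)  { sub(/B$/, "", v);   v = v / 1048576 }
      else next                                # skip anything unrecognised
      total += v; n++
    }
    END { if (n == 0) exit 1; printf "%.1f\n", total / n }
  '
}
```

With all four containers of one kind running, `docker stats --no-stream --format '{{.MemUsage}}' | mean_mem_mib` prints the average. Fed the four readings from each table above, it gives 1197.1 MiB without CDS and 1185.3 MiB with CDS, a difference of roughly 12 MiB.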

Recap

To create a class data archive, we need to:

1. Run the application while dumping the class list:

   ```shell
   -XX:+UseAppCDS -XX:DumpLoadedClassList=/class-list.cls
   ```

2. Generate the archive file from this list:

   ```shell
   -Xshare:dump \
   -XX:+UseAppCDS \
   -XX:SharedClassListFile=/class-list.cls \
   -XX:+UnlockDiagnosticVMOptions \
   -XX:SharedArchiveFile=/archive.jsa
   ```

3. Run the app with the archive:

   ```shell
   -Xshare:on \
   -XX:+UseAppCDS \
   -XX:SharedClassListFile=/class-list.cls \
   -XX:+UnlockDiagnosticVMOptions \
   -XX:SharedArchiveFile=/archive.jsa
   ```
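The Makefile spells out the dump-time and run-time flags in two separate places, which makes it easy for them to drift apart. As a sketch, the three phases of the recap can come from a single hypothetical helper, `cds_opts` (not part of the original setup), whose output is meant to be exported as ES_JAVA_OPTS. It emits exactly the JDK 10 flags listed in the recap and nothing more:

```shell
# Hypothetical helper: print the JVM options for one CDS phase.
#   cds_opts list|dump|run CLASS_LIST ARCHIVE
cds_opts() {
  local phase="$1" class_list="$2" archive="$3"
  case "$phase" in
    list)  # phase 1: record the classes the app loads
      printf '%s' "-XX:+UseAppCDS -XX:DumpLoadedClassList=${class_list}" ;;
    dump)  # phase 2: build the archive from the class list
      printf '%s' "-Xshare:dump -XX:+UseAppCDS -XX:SharedClassListFile=${class_list} -XX:+UnlockDiagnosticVMOptions -XX:SharedArchiveFile=${archive}" ;;
    run)   # phase 3: run the app using the archive
      printf '%s' "-Xshare:on -XX:+UseAppCDS -XX:SharedClassListFile=${class_list} -XX:+UnlockDiagnosticVMOptions -XX:SharedArchiveFile=${archive}" ;;
    *)
      echo "usage: cds_opts list|dump|run CLASS_LIST ARCHIVE" >&2
      return 2 ;;
  esac
}
```

For example: `export ES_JAVA_OPTS="$(cds_opts run /app/cache/elasticsearch_appcds.cls-thin /app/cache/elasticsearch_appcds.jsa)"`.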

The archive’s internal representation may be architecture-dependent, so it should be generated in an environment close to the target.

As a result, a few dozen megabytes’ worth of Java classes can be shared across JVMs, and we shave about a second off the startup time. In the case of Elasticsearch, these are not huge savings. On the other hand, if you run multiple copies of your app, or it is a script where a lower startup time greatly improves the UX, you should consider using Class Data Sharing. Incidentally, if you run a serverless platform, both of these conditions apply.

Conclusions

Class data sharing allows for shorter startup times and lower memory usage (when running multiple instances). Volker Simonis reports a 30% startup time improvement for JRuby. If you need to start your app often (e.g. scripts) or run multiple copies of it (orchestration, serverless), and your app runs on OpenJDK 10 (required for application class sharing), give it a try.

Ideas for future posts