Get a stream of high-volume data points about your running app.

February 27, 2020

In this article, I discuss a new feature arriving with Java 14. This feature, referred to as JFR Event Streaming (JEP 349), is the latest iteration of a mature set of profiling and monitoring technologies that have a long history.

The original Java Flight Recorder (JFR) and Java Mission Control (JMC) tools were obtained by Oracle as part of the acquisition of BEA Systems back in 2008. The two components work together. JFR is a low-overhead, event-based profiling engine with a high-performance back end for writing events in a binary format, whereas JMC is a GUI tool for examining a data file created by JFR from the telemetry of a single JVM.

The tools were originally part of the tooling offering for BEA’s JRockit JVM and were moved to the commercial version of the Oracle JDK as part of the process of merging JRockit with Java HotSpot VM. After the release of JDK 9, Oracle changed the release model of Java and announced that JFR and JMC would become open source tools.

JFR was contributed to OpenJDK and was delivered in JDK 11 as JEP 328. JMC was spun out into a standalone open source project and exists today as a separate download.

The arrival of Java 14 introduces a new feature to JFR: the ability for JFR to produce a continuous stream of events. This change also provides an API to enable events to be handled immediately, rather than by parsing a file after the fact. This change makes JFR Event Streaming a great foundation on which to build monitoring and observability tools.

One issue, however, is that because JFR and JMC only recently became open source tools, many Java developers are not aware of their considerable capabilities. So before I get into the new Java 14 features, let’s explore JMC and JFR from the beginning.

Introducing JFR

Because JFR first became available as open source as part of OpenJDK 11, you’ll need to be running that version (or a more recent one) or be an Oracle customer running the commercial JDK.

There are various ways to create a JFR recording, but I will look at two in particular: the use of command-line arguments when starting up a JVM and the use of jcmd .

First, let’s see what command-line switches you need to start JFR. The key switch is this:

-XX:StartFlightRecording:<options>

This switch enables JFR recording at process start time. Until Java 14, JFR produced a file of profiling data. This could be presented as either a one-off dump file or a continuous ring buffer. A large number of individual command-line options control what data is being captured.

In addition, JFR can capture more than a hundred different possible metrics. Most of these are very low impact, but some do incur noticeable overhead. Managing the configuration of all these metrics individually is a huge task. Instead, to simplify the process, JFR uses profiling configuration files. These files are simple XML files that contain the configuration for each metric and determine whether the metric should be captured. The standard JDK download contains two basic files: default.jfc and profile.jfc .

The default level of recording, controlled by default.jfc , is designed to be extremely low overhead and to be usable by basically every production Java process. The profile.jfc configuration contains more detailed information but this, of course, comes at a higher runtime cost.

Beyond the two supplied files, it is possible to create a custom configuration file that contains just the data points you want. The JMC tool (described in the next section) has a template manager that enables easy creation of these files.

Other options that can be passed on the command line include the name of the file in which to store the recorded data and how much data to keep (in terms of the age of the data points). For example, an overall JFR command line might look like this:

-XX:StartFlightRecording:disk=true,filename=/sandbox/service.jfr,maxage=12h,settings=profile

This would create a rolling buffer of 12 hours duration containing the in-depth profiling information. There is no stipulation on how big this file could get.

Note: When JFR was a part of the commercial build, it was unlocked with the -XX:+UnlockCommercialFeatures switch. However, Oracle JDK 11 and later versions emit a warning when this option is used. The warning is issued because all the commercial features have been open sourced and because the flag was never part of OpenJDK, it does not make sense to continue to use it. In OpenJDK builds, using the commercial features flag results in an error.

One of the great features of JFR is that it does not need to be configured when the JVM starts up. Instead, JFR can be activated from the command line by using the jcmd program to control a running JVM from the UNIX command line:

$ jcmd <pid> JFR.start name=Recording1 settings=default $ jcmd <pid> JFR.dump filename=recording.jfr $ jcmd <pid> JFR.stop

Not only that, but it is possible to attach to JFR after the application has already started. You can see that JMC (see next section) provides the capability of dynamically controlling JFR within JVMs running on the local machine.

No matter how JFR is activated, the result is the same: a single file per profiling run per JVM. The file contains a lot of binary data and is not human-readable, so you need some sort of tool to extract and visualize the data, such as JMC.

Introducing JMC 7

Use the JMC graphical tool to display the data contained in JFR output files. It is started from the jmc command. JMC used to be bundled with the Oracle JDK download, but it is now available separately.

Figure 1 shows the startup screen for JMC. After loading the file, JMC performs some automated analysis on it to identify any obvious problems present in the recorded run.

Figure 1. The JMC startup screen

Note: To profile, JFR must be enabled on the target application. As well as using a previously created file, JMC provides a tab on the left of the top left panel marked JVM Browser for attaching dynamically to local applications.

One of the first screens that you’ll encounter in JMC is the overview telemetry screen (Figure 2), which shows a high-level dashboard of the overall health of the JVM.

Figure 2. Initial telemetry dashboard

The major subsystems of the JVM all have dedicated screens to enable deep analysis. For example, garbage collection has an overview screen (Figure 3) to show the garbage collection events over the lifetime of the JFR file. The Longest Pause display allows you to see where any anomalously long garbage collection events have occurred over the timeline.

Figure 3. Garbage collection overview

In the detailed profile configuration, it is also possible to see the individual events where new allocation buffers (TLABs) are handed out to application threads (Figure 4). This gives you a much more accurate view of resources within the process.

Figure 4. TLAB allocations

TLAB allocations let you easily see which threads are allocating the most memory; in Figure 4, it’s a thread that is consuming data from Apache Kafka topics.

The other major subsystem of the JVM is the JIT compiler, and as Figure 5 shows, JMC allows you to dig into the details of how the compiler is working.

Figure 5. Watching the compiler

A key resource is the available memory in the JIT compiler’s code cache. A key resource is the available memory in the JIT compiler’s code cache, which stores the compiled version of methods. For processes that have many compiled methods, this area of memory can be exhausted, causing the process to not reach peak performance. Figure 6 shows the data related to this cache.

Figure 6. Monitoring the code cache

JMC also includes a method-level profiler (Figure 7), which works in a very similar way to those found in VisualVM or commercial tools such as JProfiler or YourKit.

Figure 7. Profiling pane

One of the more advanced screens within JMC is the VM Operations view (Figure 8), which shows some of the internal operations the JVM performs and how long they take. This is not a view that you would expect to need for every analysis, but it would be very useful for detecting certain types of problems.

Figure 8. Viewing JVM operations

For example, one of the key data points visualized here is bias revocation time. Most programs thrive with the JVM’s biased locking technology (which is broadly the idea that the first thread to lock an object is likely to be the only one ever to lock it). However, some workloads (such as Apache Cassandra) are architected in such a way that they are negatively affected by biased locking. The VM Operations view would allow you to spot such a case very easily.

Parsing a JFR File Programmatically

Using JMC to examine a file from a single JVM is an important use case for profiling and developing applications. However, in service-based and enterprise environments, the model of using a GUI to investigate files one by one is not always the most convenient approach.

Instead, you might want to build tools that can parse the raw data captured by JFR and provide more convenient ways to access and visualize it. Fortunately, it’s very simple to get started with the programmatic API for JFR events.

The key classes are jdk.jfr.consumer.RecordingFile and jdk.jfr.consumer.RecordedEvent . The former is made up of instances of the latter, and it is easy to simply step through the events in a simple piece of Java code, such as this:

var recordingFile = new RecordingFile(Paths.get(fileName)); while (recordingFile.hasMoreEvents()) { var event = recordingFile.readEvent(); if (event != null) { var details = decodeEvent(event); if (details == null) { // Log a failure to recognize details } else { // Process details System.out.println(details); } } }

Each event may carry slightly different data, so you need to have a decoding step that takes into account what type of data is being recorded. For simplicity in this example, I’m choosing to take in a RecordedEvent and convert it to a simple map of strings:

public Map<String, String> decodeEvent(final RecordedEvent e) { for (var ent : mappers.entrySet()) { if (ent.getKey().test(e)) { return ent.getValue().apply(e); } } return null; }

The real magic happens in the decoders, which have a slightly complex signature:

private static Predicate<RecordedEvent> testMaker(String s) { return e -> e.getEventType().getName().startsWith(s); } private static final Map<Predicate<RecordedEvent>, Function<RecordedEvent, Map<String, String>>> mappers = Map.of(testMaker("jdk.ObjectAllocationInNewTLAB"), ev -> Map.of("start", ""+ ev.getStartTime(), "end", ""+ ev.getEndTime(), "thread", ev.getThread("eventThread").getJavaName(), "class", ev.getClass("objectClass").getName(), "size", ""+ ev.getLong("allocationSize"), "tlab", ""+ ev.getLong("tlabSize") ));

The mappers are a collection of pairs. Each pair consists of a predicate and a function. The predicate is used to test incoming events. If it returns TRUE , the associated function is applied to the event and transforms the event into a map of strings.

This means that the decodeEvent() method loops over the mappers and tries to find a predicate that matches the incoming event. As soon as it does, the corresponding function is called to decode the event.

Each version of Java has a slightly different set of events that JFR supplies. In Java 11, there are more than 120 different possible types. This presents a challenge for supporting all of them because each one will need a slightly different decoder. However, supporting a subset of metrics of particular interest is very feasible.

Java 14 JFR Event Streaming

With Java 14, a new usage mode for JFR becomes available, which is JFR Event Streaming. This API provides a way for programs to receive callbacks when JFR events occur and respond to them immediately.

One obvious and important way that developers might make use of this is via a Java agent. This is a special JAR file that uses the Instrumentation API. In this API, a class declares a special static premain() method to register itself as an instrumentation tool. An example Java agent might look like this:

public class AgentMain implements Runnable { public static void premain(String agentArgs, Instrumentation inst) { try { Logger.getLogger("AgentMain").log( Level.INFO, "Attaching JFR Monitor"); new Thread(new AgentMain()).start(); } catch (Throwable t) { Logger.getLogger("AgentMain").log( Level.SEVERE,"Unable to attach JFR Monitor", t); } } public void run() { var sender = new JfrStreamEventSender(); try (var rs = new RecordingStream()) { rs.enable("jdk.CPULoad") .withPeriod(Duration.ofSeconds(1)); rs.enable("jdk.JavaMonitorEnter") .withThreshold(Duration.ofMillis(10)); rs.onEvent("jdk.CPULoad", sender); rs.onEvent("jdk.JavaMonitorEnter", sender); rs.start(); } } }

The premain() method is called and a new instance of AgentMain is created and run in a new thread. The recording stream is configured and when it is started, it never returns (which is why it’s created on a separate instrumentation thread).

In the recording stream, a large number of events may be generated. Fortunately, the JFR API provides some basic filtering to reduce the number of events that the callbacks are expected to process:

Enabled : Determines whether the event be recorded at all

: Determines whether the event be recorded at all Threshold : Specifies the duration below which an event is not recorded

: Specifies the duration below which an event is not recorded Stack trace : Determines whether the stack trace from the Event.commit() method should be recorded

: Determines whether the stack trace from the method should be recorded Period : Specifies the interval at which the event is emitted, if it is periodic

Note that further filtering capabilities must be implemented by the application programmer. For example, a Java program that is allocating several GB per second (which is a high allocation, but entirely possible for large applications) can easily generate hundreds of thousands of events.

The example Java agent starts a recording stream with each event being sent to an object called a JfrStreamEventSender , which looks like this:

public final class JfrStreamEventSender implements Consumer<RecordedEvent> { private static final String SERVER_URL = "http://127.0.0.1:8080/events"; @Override public void accept(RecordedEvent event) { try { var payload = JFRFileProcessor.decodeEvent(event); String json = new ObjectMapper().writeValueAsString(payload); var client = HttpClient.newHttpClient(); var request = HttpRequest.newBuilder() .uri(URI.create(SERVER_URL)) .timeout(Duration.ofSeconds(30)) .header("Content-Type", "application/json") .POST(HttpRequest.BodyPublishers.ofString(json)) .build(); HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString()); Logger.getLogger("JfrStreamEventSender").log( Level.INFO, "Server response code: " + response.statusCode() + ", body: " + response.body()); } catch (IOException | InterruptedException e) { Logger.getLogger("JfrStreamEventSender").log( Level.SEVERE, "Unable to send JFR event to server", e); } } }

This stateless class simply takes in the event, decodes it in the same way shown in the file processing case and then uses the Jackson library to encode it as JSON. The Java 11 HTTP client is then used to send the metric payload to a service that can process it, which in this example is at port 8080 of localhost .

There is, of course, much more to do to take this simple example code and build a production-grade monitoring tool from it. Building at scale requires creating HTTP endpoints capable of handling a large number of incoming data points. Problems such as uniquely identifying JVMs and ensuring that inbound data is correctly attributed must also be handled.

A data analysis pipeline and medium-term storage system also need to be deployed, as well as front-end visualization and display pages. It is essential that all of these systems are operated at the highest reliability levels, because they are monitoring production systems.

Despite these challenges, JFR Event Streaming represents a significant step forward in the technologies you can use to monitor and observe JVMs running applications. There is keen interest in the new APIs and it seems very likely that shortly after the release of Java 14, support for the streaming events will be added to open source tracing and monitoring libraries. The benefits of the JFR approach to monitoring and profiling are coming to a JVM near you soon.