Let's look at one amazing tool for gathering information about what's going on in your JVM — the Java Flight Recorder.

Let's look at one amazing tool for gathering information about what's going on in your JVM. The tool is called the Java Flight Recorder (JFR), and in today's article, I'm going to use it to dig into how sockets behave in your Java application.

A lot of people use HTTP APIs to communicate between their services (whether that's a good solution deserves its own article), and in the majority of cases, it works fine. However, we can get into situations where performance really matters, and especially for library writers, JFR can be a very useful tool to see how I/O works in their code.


This article is not an introduction to Flight Recorder; it's more of a practical guide. If you want to learn more about this technology, you can start with JEP 328: Flight Recorder.

Introduction to Our Netty Application

As an example, I decided to use a great project called Netty that simplifies the JDK NIO implementation and provides a lot of tricks we can use to optimize our application. Let's dive into what we are going to use for our experiments.

Feel free to check out my jfr-sockets project on GitHub and try everything out with me. The project actually contains two applications based on TCP. The first one, in the package simple, is just a dumb TCP client/server where the client pushes the string my-message (exactly 10 bytes) every second. The second one is a Netty-based client/server application where the server pushes the same string every second, but over WebSocket. I deliberately chose these two types of applications, and we'll look at the details of why.

You may also notice a String constant called CONFIG. I'm using a trick where the file with the proper configuration is created this way, instead of changing a regular JFR configuration file. If you want to see the configuration files that ship with the JDK, open the folder ${JAVA_HOME}/lib/jfr. You'll very likely spot two files, default.jfc and profile.jfc. If you open them, you can see the XML configuration of JFR events. This configuration says when an event is supposed to be emitted. The default configuration has a smaller overhead (~1%) and is useful for continuous profiling, whereas the profile configuration is more aggressive and emits more events with lower thresholds (~2% overhead).

We needed to create our own settings because, by default, there is a 20 ms threshold for reading and writing, but we want to see every operation.

Notice that I'm using a new feature available since Java 14, JEP 349: JFR Event Streaming, which means that if you want to try it out, you need to download it from https://jdk.java.net/14/.
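To make this concrete, here is a minimal standalone sketch of my own (not code from the repository) that uses the JDK 14 RecordingStream API to subscribe to socket events with a zero threshold, so every read and write is reported:

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class SocketEventStream {
    public static void main(String[] args) {
        // JEP 349: consume JFR events in-process, as they are emitted
        try (RecordingStream rs = new RecordingStream()) {
            // zero threshold: emit an event for every operation, not only slow ones
            rs.enable("jdk.SocketRead").withThreshold(Duration.ofMillis(0)).withStackTrace();
            rs.enable("jdk.SocketWrite").withThreshold(Duration.ofMillis(0)).withStackTrace();
            rs.onEvent("jdk.SocketRead", System.out::println);
            rs.onEvent("jdk.SocketWrite", System.out::println);
            rs.start(); // blocks the current thread and dispatches events
        }
    }
}
```

The same thresholds can also be baked into a .jfc settings file, which is essentially what the CONFIG constant in the project does.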

Run a Simple Blocking Application

The very first example is the class pbouda.jfr.sockets.simple.Start. If you try it out, you can see output similar to the one below. As we mentioned earlier, the application writes the UTF string my-message (10 bytes), and if you look inside the methods we got in the stackTrace field, you can see that the 10-byte string is encoded into a 12-byte message.

Since TCP is based on streams, we need a well-known delimiter between two messages, or we need to encode the size of the message somehow. The JDK implementation of writeUTF and readUTF is based on the second option: the JDK uses the first two bytes as an unsigned short, which determines the number of bytes in the message itself.
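We can verify this encoding without any sockets at all. The following snippet (a standalone illustration, not part of the demo project) writes the same string into a byte array:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class WriteUtfDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // writeUTF prepends a 2-byte unsigned-short length before the payload
        new DataOutputStream(out).writeUTF("my-message");

        byte[] encoded = out.toByteArray();
        System.out.println(encoded.length);           // 12
        System.out.println(Arrays.toString(encoded)); // [0, 10, 109, 121, 45, ...]
    }
}
```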

Java







[0, 10, 109, 121, 45, 109, 101, 115, 115, 97, 103, 101]









According to the jdk.SocketWrite event, we can see that we wrote a 12-byte array (2 bytes for the length, 10 bytes for the payload).

In terms of jdk.SocketRead, we can notice two events belonging to DataInputStream#readUnsignedShort. As we mentioned above, first we need to read the size of the payload (an unsigned short). This operation emitted two events because it calls SocketInputStream#read twice to read the short value (one byte per call). The very last event already contains the payload and comes from DataInputStream#readFully.

Java







jdk.SocketWrite {
  startTime = 20:03:57.651
  duration = 0.185 ms
  host = "localhost"
  address = "127.0.0.1"
  port = 52960
  bytesWritten = 12 bytes
  eventThread = "Thread-0" (javaThreadId = 18)
  stackTrace = [
    java.net.Socket$SocketOutputStream.write(byte[], int, int) line: 65
    java.io.DataOutputStream.write(byte[], int, int) line: 106
    java.io.DataOutputStream.writeUTF(String, DataOutput) line: 397
    java.io.DataOutputStream.writeUTF(String) line: 325
    pbouda.jfr.sockets.simple.Start$ClientHandler.run() line: 67
    ...
  ]
}

jdk.SocketRead {
  startTime = 20:03:56.558
  duration = 1.09 s
  host = "localhost"
  address = "127.0.0.1"
  port = 5000
  timeout = 0 s
  bytesRead = 1 byte
  endOfStream = false
  eventThread = "main" (javaThreadId = 1)
  stackTrace = [
    java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
    java.net.Socket$SocketInputStream.read() line: 977
    java.io.DataInputStream.readUnsignedShort() line: 341
    java.io.DataInputStream.readUTF(DataInput) line: 593
    java.io.DataInputStream.readUTF() line: 568
    ...
  ]
}

jdk.SocketRead {
  startTime = 20:03:57.651
  duration = 0.0617 ms
  host = "localhost"
  address = "127.0.0.1"
  port = 5000
  timeout = 0 s
  bytesRead = 1 byte
  endOfStream = false
  eventThread = "main" (javaThreadId = 1)
  stackTrace = [
    java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
    java.net.Socket$SocketInputStream.read() line: 977
    java.io.DataInputStream.readUnsignedShort() line: 342
    java.io.DataInputStream.readUTF(DataInput) line: 593
    java.io.DataInputStream.readUTF() line: 568
    ...
  ]
}

jdk.SocketRead {
  startTime = 20:03:57.652
  duration = 0.0699 ms
  host = "localhost"
  address = "127.0.0.1"
  port = 5000
  timeout = 0 s
  bytesRead = 10 bytes
  endOfStream = false
  eventThread = "main" (javaThreadId = 1)
  stackTrace = [
    java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
    java.io.DataInputStream.readFully(byte[], int, int) line: 199
    java.io.DataInputStream.readUTF(DataInput) line: 613
    java.io.DataInputStream.readUTF() line: 568
    pbouda.jfr.sockets.simple.Start.main(String[]) line: 49
    ...
  ]
}

my-message
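The reading sequence above can be reproduced in isolation without any sockets. This standalone sketch feeds the 12 encoded bytes back through a DataInputStream, mirroring the length-prefix-then-payload steps that produced the events:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ReadUtfDemo {
    public static void main(String[] args) throws IOException {
        // encode the message exactly as the client does
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new DataOutputStream(out).writeUTF("my-message");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(out.toByteArray()));
        int length = in.readUnsignedShort(); // the 2-byte prefix; over a socket, two 1-byte reads
        byte[] payload = new byte[length];
        in.readFully(payload);               // the remaining 10 payload bytes

        System.out.println(length);                // 10
        System.out.println(new String(payload));   // my-message
    }
}
```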









If you want to see how those events are represented in JDK, then here is a clue:

Java







jdk.jfr.events.SocketReadEvent
jdk.jfr.events.SocketWriteEvent









Don't hesitate to navigate to the places where those events are emitted. JFR events are usually cached and reused later on. In this case, we can see that the object is stored in a ThreadLocal variable. The fields are always reset, the event is emitted, and then the object can be reused by the same thread. Everything is done for the sake of being GC-friendly.

What Interesting Stuff Can We See?

The socket events are very rich, and we can see a lot of interesting information. The majority of the fields are self-descriptive; just a note about address/hostname/port: we can very easily see which other applications we are communicating with. Aren't we calling any suspicious endpoint?

However, my favorite field is duration. We can see big differences between the SocketRead events in the snippet above. Why did the very first event take more than 1 second, and what does it actually mean?

We already know that the client has a 1-second delay between messages. However, the duration field is incredibly valuable because now we can see that something is blocking our thread (of course, we can see it even from the stack trace because InputStream is blocking by nature). What the thread actually does is wait until any data arrives in the Receive Socket Buffer so that it can be copied into an internal temporary DirectBuffer. We can notice that the other operations were much faster because the bytes were already in the socket buffer and we just read the exact number of bytes we needed.

Why Do I See Such a Long Duration?

If you use Java Mission Control to visualize your JFR events, then you have very likely noticed that the Socket I/O section contains incredibly long durations and Total I/O Time. This is the result of having some blocking TCP connections opened and waiting for data. In the case below, it's an RMI TCP Connection, but you can come across a very similar problem if you have, let's say, RabbitMQ and don't consume any messages.

Where Are the Events Emitted From My Endpoint?

You may also be surprised: why don't I see any other traffic? Let's say we have an application with an HTTP endpoint that is used very often. The reason is that you probably run with the default JFR configuration, which says: only operations longer than 20 ms will emit an event. That means you can celebrate, because your application very likely doesn't suffer from any serious (I know it's relative; 20 ms is not a small latency) socket I/O problems.

Run Netty-Based WebSocket Application

Let's proceed with the second application. As we already mentioned, it's a WebSocket application where the server broadcasts the messages to all connected clients. For this experiment, we're going to connect only one client and see what messages the client receives.

If you know the Netty project, you have probably noticed that Netty contains several implementations of SocketChannel. Let's start with the JDK-based NIO socket channel.

Run a class that automatically starts the client and server: pbouda.jfr.sockets.netty.Start.

First, we can see four events belonging to establishing a WebSocket connection. Notice that we already have a meaningful eventThread field that can help us a lot when we start to investigate any problems.

Java







jdk.SocketWrite {
  startTime = 21:21:27.135
  duration = 0.677 ms
  host = ""
  address = "127.0.0.1"
  port = 8080
  bytesWritten = 292 bytes
  eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
    io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
    io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
    ...
  ]
}

jdk.SocketRead {
  startTime = 21:21:27.149
  duration = 0.0479 ms
  host = ""
  address = "127.0.0.1"
  port = 41244
  timeout = 0 s
  bytesRead = 292 bytes
  endOfStream = false
  eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
    io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
    io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
    io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
    io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
    ...
  ]
}

jdk.SocketWrite {
  startTime = 21:21:27.177
  duration = 0.102 ms
  host = ""
  address = "127.0.0.1"
  port = 41244
  bytesWritten = 129 bytes
  eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
    io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
    io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
    ...
  ]
}

jdk.SocketRead {
  startTime = 21:21:27.183
  duration = 0.0394 ms
  host = ""
  address = "127.0.0.1"
  port = 8080
  timeout = 0 s
  bytesRead = 129 bytes
  endOfStream = false
  eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
    io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
    io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
    io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
    io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
    ...
  ]
}









Let's see the events related to our payload (it's not so easy to figure out which events belong to which log record, but it doesn't matter much right now).

We can notice that we very often have one event for reading and one for writing, both containing 12 bytes (it can also differ: we can read just a part of the message, and then the WebSocket support in Netty needs to wait for another read/buffer to be able to parse the whole WebSocket frame).

Someone may be curious why we transported 12 bytes in the case of a 10-byte message. There is an overhead of 2 bytes for messages smaller than 126 bytes; it's related to the WebSocket framing mechanism.
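Those numbers come from the WebSocket frame header defined in RFC 6455. A tiny helper of my own (hypothetical, for illustration only, not part of the project) computes the header overhead of an unmasked server-to-client data frame by payload size:

```java
public class FrameOverhead {
    /**
     * Header bytes of an unmasked WebSocket data frame (RFC 6455):
     * 2 bytes for payloads under 126 bytes, 2 + 2 bytes for payloads
     * up to 65535 bytes, and 2 + 8 bytes for anything larger.
     * Client-to-server frames add 4 masking-key bytes on top of this.
     */
    static int headerBytes(long payloadLength) {
        if (payloadLength < 126) return 2;
        if (payloadLength <= 65535) return 4;
        return 10;
    }

    public static void main(String[] args) {
        // our 10-byte message travels as 10 + 2 = 12 bytes on the wire
        System.out.println(headerBytes(10)); // 2
    }
}
```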

Someone may also be surprised that we didn't experience any durations longer than 1 second for SocketRead events, as we did in the first example. The reason is that NIO is configured in non-blocking mode, which means it does not wait for incoming data (it just copies what is currently in the socket buffer into a JDK DirectBuffer, passes it to Netty, and doesn't care about the number of bytes), and the current thread just tries to do something else instead (it can read from a different channel if we have multiple connections on one event loop). So, we can see that our application is very likely healthy in terms of blocking threads; blocked threads can become a problem if you want to get past the C10K problem (handling thousands of connections in parallel).

Java







Broadcaster-Server 2020-01-14 21:21:29,181 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)

jdk.SocketWrite {
  startTime = 21:21:29.454
  duration = 0.0644 ms
  host = ""
  address = "127.0.0.1"
  port = 41244
  bytesWritten = 12 bytes
  eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
    io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
    io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
    ...
  ]
}

jdk.SocketRead {
  startTime = 21:21:29.454
  duration = 0.0197 ms
  host = ""
  address = "127.0.0.1"
  port = 8080
  timeout = 0 s
  bytesRead = 12 bytes
  endOfStream = false
  eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
    io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
    io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
    io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
    io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
    ...
  ]
}
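The non-blocking behavior described above can be demonstrated with plain JDK NIO and no Netty at all. In this standalone sketch (my own illustration, not project code), a read on an empty socket buffer returns immediately instead of parking the thread:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingRead {
    public static void main(String[] args) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0));
             SocketChannel client = SocketChannel.open(server.getLocalAddress())) {

            client.configureBlocking(false);
            ByteBuffer buffer = ByteBuffer.allocate(64);

            // nothing was sent yet: a blocking read would park the thread here,
            // a non-blocking read returns 0 immediately and the thread can move on
            int bytesRead = client.read(buffer);
            System.out.println(bytesRead); // 0
        }
    }
}
```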









Show Me Some Possible Optimizations With Netty!

Netty allows us to drive the communication belonging to one connection very effectively. I prepared one example showing that we can very easily decrease the number of syscalls by batching messages in Netty. Simply put, we decided to sacrifice the latency of a single message to make our server application much more efficient by removing syscalls. Let's say we want to flush not every message, but only every fifth message to the given client.

We need to do two tiny changes:

Java







pbouda.jfr.sockets.netty.server.SlowConsumerDisconnectHandler
- we need to comment out flushing of every message and use a simple write instead
- the write method does not automatically write data into the socket, it waits for a flush: context.writeAndFlush(obj) -> context.write(obj)









Java







pbouda.jfr.sockets.netty.Start#main
- uncomment the section at the end of the method `Flush a bulk of 5 messages`
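For illustration, the batching idea can also be expressed as a dedicated outbound handler. This is a simplified sketch of my own, not the code from the repository:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

// Flushes once per five written messages, so the batched bytes leave
// the application in a single syscall instead of five.
public class BatchingFlushHandler extends ChannelOutboundHandlerAdapter {
    private static final int BATCH_SIZE = 5;
    private int pending;

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        ctx.write(msg, promise); // buffered in the ChannelOutboundBuffer, no syscall yet
        if (++pending >= BATCH_SIZE) {
            pending = 0;
            ctx.flush();         // one write syscall for the whole batch
        }
    }
}
```

A production version would also flush on channel shutdown or after a timeout, so the last messages of a slow stream don't get stranded in the outbound buffer.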









Now, we can start the Netty app again. We can see that we received five messages in a batch and transmitted 60 bytes in one syscall/invocation instead of five 12-byte writes.

Java







Broadcaster-Server 2020-01-14 22:12:00,937 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,937 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,938 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,938 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,939 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)

jdk.SocketWrite {
  startTime = 22:12:01.603
  duration = 2.23 ms
  host = ""
  address = "127.0.0.1"
  port = 42556
  bytesWritten = 60 bytes
  eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.write(ByteBuffer[], int, int) line: 167
    io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 420
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
    io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
    io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
    ...
  ]
}

jdk.SocketRead {
  startTime = 22:12:01.605
  duration = 0.0757 ms
  host = ""
  address = "127.0.0.1"
  port = 8080
  timeout = 0 s
  bytesRead = 60 bytes
  endOfStream = false
  eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
  stackTrace = [
    sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
    io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
    io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
    io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
    io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
    ...
  ]
}









I Use Netty and Don't See Any Events! Why?

There are plenty of reasons, but one of them is a bit tricky. If you are running on Linux and want to be very efficient, you can enable the epoll transport. It's a non-blocking native transport that can be used only on Linux. The big advantage of this transport is that if we have an application handling tens of thousands of connections in parallel, then, in the case of JDK NIO, we have to iterate over all connections to check whether there is any data to read, which is O(N). Simply put, epoll is able to get the information about readable connections in O(1).

The problem is that the epoll support is fully developed by the Netty team and does not emit any JFR events. If you want to try it out, then:

Java







pbouda.jfr.sockets.netty.server.Server
- uncomment EpollEventLoopGroup instead of NioEventLoopGroup
- uncomment EpollServerSocketChannel instead of NioServerSocketChannel









Summary

JFR events can be a very powerful tool for finding weaknesses in our applications. It's worth studying what JFR offers and being ready to use it when we run into a problem. In general, it means:

SocketRead events: We are waiting for incoming data; our Receive Socket Buffer is very likely empty. In the case of a non-blocking transport, we don't receive any events because of the non-blocking nature of the implementation. In the case of a blocking transport, we can receive events, but it can be some kind of connection that is just idle and waiting for data.

SocketWrite events: The situation is just reversed. We can block in the case of a blocking transport if the Send Socket Buffer is full. In the case of a non-blocking transport, we can do a partial write (write just the bytes that fit into the Send Socket Buffer), and the rest of the bytes need to be cached by our application.

If you want to try more events (even other types), then check out the jfr-playground repo. Otherwise, thank you for reading my article, and please leave comments below. If you would like to be notified about new posts, then start following me on Twitter!

Further Reading

Using Java Flight Recorder With OpenJDK 11

Java 11 Features: Java Flight Recorder