I was recently looking into the new features of the latest release (1.7.0_40) of Oracle’s Java SE when I stumbled upon a new class, sun.misc.IoTrace.

The source of sun.misc.IoTrace can be viewed here. It’s basically an “empty” class which does nothing; if we want to gather IO trace statistics, we need to redefine it.

I’ve copied the interesting parts from its javadoc here:

Utility class used to identify trace points for I/O calls. To use this class, a diagnostic tool must redefine this class with a version that contains calls to the diagnostic tool. This implementation will then receive callbacks when file and socket operations are performed. The reason for requiring a redefine of the class is to avoid any overhead caused by the instrumentation. Only blocking I/O operations are identified with this facility.

Warning:

These methods are called from sensitive points in the I/O subsystem. Great care must be taken to not interfere with ongoing operations or cause deadlocks. In particular:

Implementations must not throw exceptions since this will cause disruptions to the I/O operations.

Implementations must not do I/O operations since this will lead to an endless loop.

Since the hooks may be called while holding low-level locks in the I/O subsystem, implementations must be careful with synchronization or interaction with other threads to avoid deadlocks in the VM.

Calls to IoTrace were added to the following classes:

java.io.FileOutputStream

java.io.FileInputStream

java.io.RandomAccessFile

java.net.SocketInputStream

java.net.SocketOutputStream

Here is an example of the modifications (from FileOutputStream.java):

322,329c305
<     Object traceContext = IoTrace.fileWriteBegin(path);
<     int bytesWritten = 0;
<     try {
<         writeBytes(b, 0, b.length, append);
<         bytesWritten = b.length;
<     } finally {
<         IoTrace.fileWriteEnd(traceContext, bytesWritten);
<     }
---
>     writeBytes(b, 0, b.length, append);

The OpenJDK source comes in handy when figuring out how to redefine the IoTrace class.
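To give a feel for what a redefined IoTrace can look like, here is a minimal sketch of a replacement class. Only the file-write pair used in the diff above is spelled out; the real class has matching begin/end hooks for file reads and socket reads/writes, and the exact signatures should be taken from the OpenJDK source rather than from this sketch. Remember that class redefinition may only change method bodies, so the replacement has to keep the original methods exactly as declared.

package sun.misc;

// Minimal replacement for the "empty" sun.misc.IoTrace: the begin hook returns
// a context object that the JDK hands back to the matching end hook together
// with the number of bytes transferred.
public final class IoTrace {

    private IoTrace() {
    }

    public static Object fileWriteBegin(String path) {
        // whatever is returned here is passed back to fileWriteEnd()
        return new Object[] { path, System.nanoTime() };
    }

    public static void fileWriteEnd(Object context, long bytesWritten) {
        Object[] ctx = (Object[]) context;
        long durationUs = (System.nanoTime() - (Long) ctx[1]) / 1000;
        // record (path, durationUs, bytesWritten) somewhere cheap and lock-free;
        // per the warnings above: no exceptions, no I/O, no risky locking here
    }

    // fileReadBegin/fileReadEnd and the socket hooks follow the same pattern
}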

Based on this I wrote a “naive” agent (source), which collects the IO trace statistics, stores them in StringBuffers, and uses a background thread to periodically write them out to disk. (I know this is not the correct way, so I’ve mounted a tmpfs for the output, which should keep it from interfering with the disks being traced.)
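For completeness, here is a rough sketch of what the premain of such an agent can look like. The class name IoTraceAgent matches the manifest shown below, but the resource name used for the replacement class bytes is only an illustration of the approach, not necessarily how my agent packages it.

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;

public class IoTraceAgent {

    // Called before main() because of the Premain-Class entry in the manifest.
    public static void premain(String args, Instrumentation inst) throws Exception {
        // Bytecode of the instrumented IoTrace, shipped inside the agent jar
        // (stored under a non-.class name so it does not shadow anything on the classpath).
        byte[] newBytes = readResource("/IoTrace.class.bin");

        // Swap the "empty" bootstrap class for the instrumented version;
        // this is why the manifest needs Can-Redefine-Classes: true.
        inst.redefineClasses(
                new ClassDefinition(Class.forName("sun.misc.IoTrace"), newBytes));
    }

    private static byte[] readResource(String name) throws Exception {
        try (InputStream in = IoTraceAgent.class.getResourceAsStream(name)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}

The Boot-Class-Path entry in the manifest matters because the redefined IoTrace runs inside bootstrap classes such as FileOutputStream, so anything the hooks call in the agent must also be visible to the bootstrap class loader.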

Unfortunately, the agent hangs under higher loads (a high number of threads) in “interpreted only” (-Xint) and normal mode, although it runs smoothly when started in “compiled” mode (-Xcomp).

I’m not sure whether this is a problem with the agent implementation, because it doesn’t violate the warnings above (except for using AtomicReference, which I don’t think can cause this problem). If anyone has an idea what could cause this behaviour, feel free to share it with me.

You need to add these lines to MANIFEST.MF when creating the agent jar:

Premain-Class: IoTraceAgent
Can-Redefine-Classes: true
Boot-Class-Path: iotraceagent.jar
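With that manifest in place, the agent jar can be built the usual way, for example (the listed files are just an illustration matching the sketches above):

jar cfm iotraceagent.jar MANIFEST.MF IoTraceAgent.class IoTrace.class.bin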

You need to add this option to the java command line if you want to load the agent:

-javaagent:<path>/iotraceagent.jar

It will generate output similar to this (startup of a Jetty server):

2614 T1 R 11321 1312 /opt/local/jdk1.7.0_40/jre/lib/zi/Europe/Budapest
4396 T1 R 48 2090 /opt/local/jdk1.7.0_40/jre/lib/meta-index
4457 T1 R 24 0 /opt/local/jdk1.7.0_40/jre/lib/meta-index
4954 T1 R 6189 4200 /opt/local/jdk1.7.0_40/jre/lib/currency.data
6165 T1 R 135 2277 <removed>/jetty-distribution-8.1.5.v20120716/start.ini
6165 T1 R 74 0 <removed>/jetty-distribution-8.1.5.v20120716/start.ini
7601 T1 R 15 8192 /opt/local/jdk1.7.0_40/jre/lib/security/java.security
7621 T1 R 32 8192 /opt/local/jdk1.7.0_40/jre/lib/security/java.security
7622 T1 R 11 1223 /opt/local/jdk1.7.0_40/jre/lib/security/java.security
7622 T1 R 7 0 /opt/local/jdk1.7.0_40/jre/lib/security/java.security
14487 T1 R 227 3637 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty.xml
17395 T1 R 138 0 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty.xml
19060 T1 R 83 1078 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-annotations.xml
19230 T1 R 90 0 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-annotations.xml
19295 T1 R 64 2049 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-deploy.xml
19297 T1 R 83 0 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-deploy.xml
19359 T1 R 163 1451 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-webapps.xml
19362 T1 R 65 0 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-webapps.xml
19412 T1 R 145 1190 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-contexts.xml
19415 T1 R 56 0 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-contexts.xml
19436 T1 R 50 1044 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-testrealm.xml
19439 T1 R 55 0 <removed>/jetty-distribution-8.1.5.v20120716/etc/jetty-testrealm.xml
21246 T1 R 34 8192 <removed>/jetty-distribution-8.1.5.v20120716/webapps/spdy.war

And similar output for the socket IO trace (MySQL communication):

46296 T37 R 1420 90 0 /192.168.0.1:3306
46311 T1 R 4629 90 0 /192.168.0.1:3306
46523 T1 R 32 11 0 /192.168.0.1:3306
46508 T37 R 19175 4 0 /192.168.0.1:3306
46527 T37 R 4 7 0 /192.168.0.1:3306
46669 T1 R 12177 4 0 /192.168.0.1:3306
46682 T1 R 17 709 0 /192.168.0.1:3306
46678 T37 R 14013 4 0 /192.168.0.1:3306
46693 T37 R 4 709 0 /192.168.0.1:3306
47170 T1 R 9564 4 0 /192.168.0.1:3306
47180 T1 R 4 85 0 /192.168.0.1:3306
47202 T37 R 9761 4 0 /192.168.0.1:3306
47212 T37 R 4 85 0 /192.168.0.1:3306
47231 T37 R 11117 4 0 /192.168.0.1:3306
47242 T37 R 13 1352 0 /192.168.0.1:3306
47243 T37 R 2 1356 0 /192.168.0.1:3306
47243 T37 R 835 1 0 /192.168.0.1:3306
47244 T37 R 10 1355 0 /192.168.0.1:3306
47244 T37 R 1 1356 0 /192.168.0.1:3306
47244 T37 R 5 1356 0 /192.168.0.1:3306
47245 T37 R 1 1086 0 /192.168.0.1:3306
47239 T1 R 10512 4 0 /192.168.0.1:3306
47249 T1 R 5 1352 0 /192.168.0.1:3306
47249 T1 R 175 1 0 /192.168.0.1:3306
47249 T1 R 2 1355 0 /192.168.0.1:3306

The columns are (a small parsing sketch follows this list):

time elapsed since agent start (milliseconds)

thread id

type of operation

duration (microseconds)

bytes read or written

timeout (only for sockets)

resource (filename, socket address)
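As a small illustration of the format, here is a sketch that parses single lines of the output above. It assumes the whitespace-separated layout shown in the examples (six fields for file events, seven for socket events because of the extra timeout column) and that resource names contain no whitespace; the class is a made-up helper, not part of the agent itself.

public class TraceLineParser {

    // Parse one line of the agent output into its columns.
    public static void parse(String line) {
        String[] f = line.trim().split("\\s+");
        boolean socket  = (f.length == 7);        // socket events carry a timeout column
        long elapsedMs  = Long.parseLong(f[0]);   // ms since agent start
        String threadId = f[1];                   // e.g. "T37"
        String op       = f[2];                   // e.g. "R" for read
        long durationUs = Long.parseLong(f[3]);   // microseconds
        long bytes      = Long.parseLong(f[4]);   // bytes read or written
        long timeout    = socket ? Long.parseLong(f[5]) : -1;
        String resource = f[socket ? 6 : 5];      // file name or socket address

        System.out.printf("%6d ms %-4s %s %8d us %8d bytes timeout=%d  %s%n",
                elapsedMs, threadId, op, durationUs, bytes, timeout, resource);
    }

    public static void main(String[] args) {
        parse("4954 T1 R 6189 4200 /opt/local/jdk1.7.0_40/jre/lib/currency.data");
        parse("46296 T37 R 1420 90 0 /192.168.0.1:3306");
    }
}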

Some final notes:

I’ve only tested the agent with some “demo”-level applications

The agent needs to be enhanced to handle high IO loads

I still need to solve the JVM hanging problem

I’m planning to create a tool to post-process and visualize the data

If you could use an iotracer tool for your Java applications, check back later

If you are interested in collaborating to create a high-quality iotracer tool for Java, feel free to contact me

Update (09.25.):

I’ve added some preliminary visualisation because raw data is boring:

Web application in jetty, events displayed by file-groups, minutes scale:



Web application in jetty, events displayed by file-groups, seconds scale:



Web application in jetty, events displayed by threads:

