To blog Previous post | Next post

Classloading and locking

This post got inspiration from Pierre-Hugues Charbonneau’s article on how loading non-existing classes can severely hurt your performance. This brought back memories from a debugging session where the same problem surfaced a bit differently.

The application at hand was Yet Another Webapp ™, carrying out whatever mundane tasks were assigned to it. Until it no longer did not. Symptoms included end users complaining about slow application and outright timeouts. Log files I got access revealed nothing special besides confirming the fact that the user experience was severely degraded.

The bug fell into the particularly nasty category – being among the bugs which you are sure you can reproduce, but not at will. The situation seemed to arise somewhat randomly about once or twice a week. So I asked Operations guys responsible for this application to run a diagnostic script whenever this situation arises again. That script was nothing very fancy – mainly collecting several thread dumps along with some extra OS level metrics. And prayed to the mighty Debug god for a help.

This time the praying seemed to have worked. In a couple of days I was staring at a thread dump filled with traces similar to the following:

"Thread-96" #106 prio=5 os_prio=31 tid=0x00007fc9b283b800 nid=0x11b03 waiting for monitor entry [0x0000000130133000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.ClassLoader.loadClass(ClassLoader.java:404) - waiting to lock (a java.lang.Object) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at com.sun.org.apache.xerces.internal.utils.ObjectFactory.findProviderClass(ObjectFactory.java:209) at com.sun.org.apache.xerces.internal.utils.ObjectFactory.newInstance(ObjectFactory.java:157) at com.sun.org.apache.xerces.internal.utils.ObjectFactory.newInstance(ObjectFactory.java:143) at com.sun.org.apache.xerces.internal.impl.dv.DTDDVFactory.getInstance(DTDDVFactory.java:64) at com.sun.org.apache.xerces.internal.impl.dv.DTDDVFactory.getInstance(DTDDVFactory.java:49) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.(XML11Configuration.java:578) at com.sun.org.apache.xerces.internal.parsers.XIncludeAwareParserConfiguration.(XIncludeAwareParserConfiguration.java:130) at com.sun.org.apache.xerces.internal.parsers.XIncludeAwareParserConfiguration.(XIncludeAwareParserConfiguration.java:91) at com.sun.org.apache.xerces.internal.parsers.DOMParser.(DOMParser.java:144) at com.sun.org.apache.xerces.internal.parsers.DOMParser.(DOMParser.java:128) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.(DocumentBuilderImpl.java:138) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(DocumentBuilderFactoryImpl.java:76) at ClassloadingLock$Worker.run(ClassloadingLock.java:10)

Vast majority of the application threads were locked and waiting behind a single monitor to be released. What made the case interesting was that the monitor was synchronizing access to a classloading. This was something one does not see very often, especially on an application which was initialized and warmed up already days prior to this lock escalation event.

Did you know that 16% of Java applications face degraded user experience due to lock contention? Don’t blame the locks – detect them with Plumbr instead.

Digging into the code revealed the problem. Around once a week an cron job launched bunch of housekeeping threads. Those threads were supposed to extract and transform the data to be loaded to the data warehouse. The transformation rules were described in XML loaded by those threads using the javax.xml.parsers.DocumentBuilder. For whatever reason, each thread loaded the XML document itself each time when a transformation was due.

As seen from the thread dump, doing so results in loading XML provider class each time the transformation was about to be carried out:

at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at com.sun.org.apache.xerces.internal.utils.ObjectFactory.findProviderClass(ObjectFactory.java:209)

And now, if we check to the source code of the java.lang.ClassLoader.loadClass() itself, we can indeed see the synchronization before the check whether the class is already loaded or should we look to the filesystem in order to load it from the class file. Now, when several threads are accessing this synchronized block at the same time, we have managed to build the baseline for lock escalation:

protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException { synchronized (getClassLoadingLock(name)) { // First, check if the class has already been loaded Class c = findLoadedClass(name); // ... cut for brevity ... }

When trying to recreate the problem, I built the following small test case for you to better understand the locking issue at hand. The example is launching 100 threads which do nothing but load the javax.xml.parsers.DocumentBuilder class:

package eu.plumbr.demo; import javax.xml.parsers.DocumentBuilder; import javax.xml.parsers.DocumentBuilderFactory; import javax.xml.parsers.ParserConfigurationException; public class ClassloadingLock { static class Worker extends Thread { @Override public void run() { while (true) { try { DocumentBuilder b = DocumentBuilderFactory.newInstance() .newDocumentBuilder(); } catch (Exception e) {// Do not do this at home, log all // exceptions } } } } public static void main(String[] args) throws Exception { for (int i = 0; i < 100; i++) { new Worker().start(); } } }

When you do a thread dump while the code above is running or, visualize the situation with the jvisualvm, you will indeed see the threads being constantly waiting behind the monitor.

Moral of the story? We do not think finding the causes for performance issues should be that hard. The tools we create will hopefully make the skills I needed to troubleshoot this case irrelevant, as you will have a lot more convenient way to monitor and troubleshoot your JVMs.