Caching is hard in various ways. Whenever you’re caching things, you have to at least think of:

Memory consumption

Invalidation

In this article, I want to show a flaw that often sneaks into custom cache implementations, making them inefficient for some execution paths. I’ve encountered this flaw in Eclipse, recently.

What did Eclipse do wrong?

I periodically profile Eclipse using Java Mission Control (JMC) when I discover a performance issue in the compiler (and I’ve discovered a few).

Just recently, I’ve found a new regression that must have been introduced with the new Java 9 module support in Eclipse 4.7.1a:

Using Eclipse 4.7.1a (with module support)? Vote for a timely fix of this significant (and easy to fix) performance regression: https://t.co/cyw2xvzy5q — Lukas Eder (@lukaseder) December 13, 2017

Luckily, the issue has already been fixed for 4.7.2 (https://bugs.eclipse.org/bugs/show_bug.cgi?id=526209). What happened?

In that profiling session, I’ve found an awful lot of accesses to java.util.zip.ZipFile whenever I used the “content assist” feature (auto completion). This was the top stack trace in the profiler:

int java.util.zip.ZipFile$Source.hashN(byte[], int, int) void java.util.zip.ZipFile$Source.initCEN(int) void java.util.zip.ZipFile$Source.(ZipFile$Source$Key, boolean) ZipFile$Source java.util.zip.ZipFile$Source.get(File, boolean) void java.util.zip.ZipFile.(File, int, Charset) void java.util.zip.ZipFile.(File, int) void java.util.zip.ZipFile.(File) ZipFile org.eclipse.jdt.internal.core.JavaModelManager.getZipFile(IPath, boolean) ZipFile org.eclipse.jdt.internal.core.JavaModelManager.getZipFile(IPath) ZipFile org.eclipse.jdt.internal.core.JarPackageFragmentRoot.getJar() byte[] org.eclipse.jdt.internal.core.AbstractClassFile.getClassFileContent(JarPackageFragmentRoot, String) IBinaryModule org.eclipse.jdt.internal.core.ModularClassFile.getJarBinaryModuleInfo() IBinaryModule org.eclipse.jdt.internal.core.ModularClassFile.getBinaryModuleInfo() boolean org.eclipse.jdt.internal.core.ModularClassFile.buildStructure(...) void org.eclipse.jdt.internal.core.Openable.generateInfos(Object, HashMap, IProgressMonitor) Object org.eclipse.jdt.internal.core.JavaElement.openWhenClosed(Object, boolean, IProgressMonitor) Object org.eclipse.jdt.internal.core.JavaElement.getElementInfo(IProgressMonitor) Object org.eclipse.jdt.internal.core.JavaElement.getElementInfo() boolean org.eclipse.jdt.internal.core.JavaElement.exists() boolean org.eclipse.jdt.internal.core.Openable.exists() IModuleDescription org.eclipse.jdt.internal.core.PackageFragmentRoot.getModuleDescription() IModuleDescription org.eclipse.jdt.internal.core.NameLookup.getModuleDescription(IPackageFragmentRoot, Map, Function) ...

In fact, the profiling session doesn’t show the exact number of accesses, but the number of stack trace samples that contained the specific method(s) which corresponds to the time spent inside of a method, not the number of calls (which is less relevant). Clearly, accessing zip files shouldn’t be the thing that Eclipse should be doing most of the time, when auto completing my code. So, why did it do it anyway?

It turns out, the problem was in the method getModuleDescription(), which can be summarised as follows:

static IModuleDescription getModuleDescription( IPackageFragmentRoot root, Map<IPackageFragmentRoot,IModuleDescription> cache, Function<IPackageFragmentRoot,IClasspathEntry> rootToEntry ) { IModuleDescription module = cache.get(root); if (module != null) return module; ... // Expensive call to open a Zip File in these calls: if (root.getKind() == IPackageFragmentRoot.K_SOURCE) module = root.getJavaProject().getModuleDescription(); else module = root.getModuleDescription(); if (module == null) { ... } if (module != null) cache.put(root, module); return module; }

The ZipFile access is hidden inside the getModuleDescription() call. A debugger revealed that the JDK’s rt.jar file was opened quite a few times to look for a module-info.class file. Can you spot the mistake in the code?

The method gets an external cache that may already contain the method’s result. But the method may also return null in case there is no module description. Which there isn’t. jOOQ has not yet been modularised, and most libraries on which jOOQ depends haven’t been modularised either, nor has the JDK been modularised using which jOOQ is currently built (JDK 8). So, this method always returns null for non-modular stuff.

But if it returns null, it won’t put anything in the cache:

if (module != null) cache.put(root, module); return module; }

… which means the next time it is called, there’s a cache miss:

IModuleDescription module = cache.get(root); if (module != null) return module;

… and the expensive logic involving the ZipFile call is invoked again. In other words, it is invoked all the time (for us).

Caching optional values

This is an important thing to always remember, and it is not easy to remember. Why? Because the developer who implemented this cache implemented it for the “happy path” (from the perspective of someone working with modules). They probably tried their code with a modular project, in case of which the cache worked perfectly. But they didn’t check if the code still works for everyone else. And in fact, it does work. The logic isn’t wrong. It’s just not optimal.

The solution to these things is simple. If the value null encodes a cache miss, we need another “ PSEUDO_NULL ” to encode the actual null value, or in this case something like NO_MODULE . So, the method can be rewritten as:

static IModuleDescription getModuleDescription( IPackageFragmentRoot root, Map<IPackageFragmentRoot,IModuleDescription> cache, Function<IPackageFragmentRoot,IClasspathEntry> rootToEntry ) { IModuleDescription module = cache.get(root); // Decode encoded NO_MODULE value: if (module == NO_MODULE) return null; if (module != null) return module; module = ... if (module != null) cache.put(root, module); // Encode null value: else cache.put(root, NO_MODULE); return module; }

… where this NO_MODULE can be a simple java.lang.Object if you don’t care about generics, or a dummy IModuleDescription in our case:

static final IModuleDescription NO_MODULE = new IModuleDescription() { ... };

Since it will be a singleton instance, we can use identity comparisons in our method.

Conclusion

When caching method results, always check if null is a valid result for the method. If it is, and if your cache is a simple Map , then you have to encode the null value with some sort of NO_MODULE value for the cache to work properly. Otherwise, you won’t be able to distinguish Map.get(key) == null for the cases:

Cache miss and Map returns null

Cache hit and the value is null

Update after some useful reddit / DZone comments

As /u/RayFowler pointed out on this article’s reddit discussion, the concept illustrated here is called “negative caching”

Something that is often forgotten when performing negative caching is the fact that exceptions are also a result, as pointed out by /u/zombifai in the same reddit discussion. The fix in Eclipse correctly took this into account as can be seen here: https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=addfd789e17dbb99af0304912ef45e4ae72c0605