It's as simple as the title says: I dare say Hibernate's second-level (L2) cache is fundamentally broken. At least with a clustered cache, which the official docs present as a supported configuration.

There are two strong reasons for clustering, that is, spreading your load over multiple servers: scaling out and availability.

What does availability mean? Among other things, zero downtime on updates. You take one server down, update it, start it up, and then continue with the next. With a reasonable load balancer, proxy, middleware or what have you, the clients won't notice a thing and will be seamlessly redirected to whichever server is up at the moment.

Now, what if during an update you change the definition of an entity? Add or remove a field? You have old servers with the old definition and updated servers with the new definition, all sharing the same clustered cache. Java serialization has serialVersionUID for exactly this situation: if you serialize an object and then deserialize it on another node with a different class version, it fails with an InvalidClassException instead of silently mangling data.
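As a refresher, here is a minimal sketch of how serialVersionUID pins a class version (the class name is mine, purely for illustration):

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Sketch: serialVersionUID pins a class version for Java serialization.
public class VersionSketch {
    // Old definition of the entity, explicitly versioned.
    static class UserV1 implements Serializable {
        private static final long serialVersionUID = 1L;
        int id; String password; String email; String login;
    }

    public static void main(String[] args) {
        // This is the UID that gets written into the serialized stream.
        long uid = ObjectStreamClass.lookup(UserV1.class).getSerialVersionUID();
        System.out.println(uid); // prints 1
        // A node whose UserV1 declares a different serialVersionUID would
        // reject this stream with java.io.InvalidClassException on readObject().
    }
}
```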

Since Hibernate openly advertises clustered caching, one would expect it to handle this case gracefully. Unfortunately, it does not.

What Hibernate puts in the cache is a plain old array of the values of individual fields. That array is submitted to the clustered cache. Then a node running a different version loads the array from the cache and tries to populate an entity with a different definition from it, simply copying field by field by numeric index.

When that happens, you’re screwed.

Suppose you have this entity definition:

class User {
    int id;
    String password;
    String email;
    String login;
}

… and you’re updating to this:

class User {
    int id;
    String password;
    Timestamp passwordExpires;
    String email;
    String login;
}

The best thing that can happen is an outage. For instance, the 3rd field used to be a String. You added a Timestamp field before it, so in the new definition the 3rd field is a Timestamp. Hibernate on the new nodes fails on load() with a ClassCastException from String to Timestamp, because the cache still holds entries written by the old definition.
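That failure can be simulated outside Hibernate with a hypothetical positional copy (class and method names here are mine, not Hibernate's real internals):

```java
import java.sql.Timestamp;

// Hypothetical simulation of a cache-entry round trip between nodes.
public class CacheEntrySketch {
    // Old node disassembles User(id, password, email, login) into a flat array.
    static Object[] disassembleOld(int id, String password, String email, String login) {
        return new Object[] { id, password, email, login };
    }

    // New node expects User(id, password, passwordExpires, email, login)
    // and copies values back purely by index.
    static void assembleNew(Object[] state) {
        Integer id = (Integer) state[0];
        String password = (String) state[1];
        Timestamp passwordExpires = (Timestamp) state[2]; // the old email slot holds a String
    }

    public static void main(String[] args) {
        Object[] cached = disassembleOld(4, "secret", "a@b.c", "alice");
        try {
            assembleNew(cached);
        } catch (ClassCastException e) {
            System.out.println("boom: " + e); // String cannot be cast to Timestamp
        }
    }
}
```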

The much worse case is data corruption. Suppose you need a User for the following transaction:

User user = session.load(User.class, 4);
user.setPasswordExpires(...);
session.update(user);

Let's say that email was null. load() does not throw a ClassCastException, because null is a perfectly valid Timestamp. But the cached entry only has 4 fields, so login is not restored and remains null. When you call update(), you're doomed. In this made-up example it would hopefully fail on a DB not-null constraint. In real life, though, it can silently save corrupted data to the database and guarantee hours of very interesting debugging and restoring from backups, if not physical damage caused by your application's misbehavior.
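The silent-corruption path can be sketched with a small hypothetical simulation (all names are illustrative, not Hibernate internals):

```java
import java.sql.Timestamp;

// Sketch: a 4-element cache entry from the old User definition is copied
// positionally into the new 5-field definition.
public class CorruptionSketch {
    static class NewUser {
        int id;
        String password;
        Timestamp passwordExpires;
        String email;
        String login;
    }

    static NewUser assemble(Object[] state) {
        NewUser u = new NewUser();
        u.id = (Integer) state[0];
        u.password = (String) state[1];
        u.passwordExpires = (Timestamp) state[2];   // null casts fine: no exception
        u.email = (String) state[3];                // actually the old login value!
        if (state.length > 4) {
            u.login = (String) state[4];            // the old entry has no 5th slot,
        }                                           // so login silently stays null
        return u;
    }

    public static void main(String[] args) {
        // Old-definition entry: { id, password, email (null), login }
        Object[] cachedOld = { 4, "secret", null, "alice" };
        NewUser u = assemble(cachedOld);
        System.out.println(u.email + " / " + u.login); // prints: alice / null
    }
}
```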

There's this old piece of music called "Careful with that Axe, Eugene". If you don't know it, don't bother googling. Don't even get me started on YouTube; it needs a proper sound setup and dynamics. So, here's how it goes. A seemingly boring, monotonous bass softly plays "bing, bang, bing, bang" (or D, D, D, D, as Wikipedia says). Nothing happens for a few minutes except equally soft, ambient keyboard tones. And so on for one minute, another, then another. Then, out of the blue, an air-shattering scream. The first time in my life I heard it was at the Australians' concert, and it literally made me jump with a shot of adrenaline and panic.

That's the experience with a clustered cache and Hibernate. It's very robust and stable. Boring. Unnoticeable. Until one day it makes you scream hard and tear your hair out.

Handle with care. Beware. Be prepared.

A little disclaimer: I don't know whether this exact example reproduces the problem; it's merely a made-up illustration. The fields might be ordered by name, and Hibernate may refuse to restore from an array that has fewer fields than the current definition. But I have witnessed both issues in real life, and they caused much pain and cost time and money.