
Introduction

Previously, I described the second-level cache entry structure that Hibernate uses for storing entities. Besides entities, Hibernate can also cache entity associations, and this article unravels the inner workings of collection caching.

Domain model

For the upcoming tests, we are going to use the following entity model:

A Repository has a collection of Commit entities:

    @org.hibernate.annotations.Cache(
        usage = CacheConcurrencyStrategy.READ_WRITE
    )
    @OneToMany(
        mappedBy = "repository",
        cascade = CascadeType.ALL,
        orphanRemoval = true
    )
    private List<Commit> commits = new ArrayList<>();

Each Commit entity has a collection of Change embeddable elements.

    @ElementCollection
    @CollectionTable(
        name="commit_change",
        joinColumns = @JoinColumn(name="commit_id")
    )
    @org.hibernate.annotations.Cache(
        usage = CacheConcurrencyStrategy.READ_WRITE
    )
    @OrderColumn(name = "index_id")
    private List<Change> changes = new ArrayList<>();

And we’ll now insert some test data:

    doInTransaction(session -> {
        Repository repository = new Repository("Hibernate-Master-Class");
        session.persist(repository);

        Commit commit1 = new Commit();
        commit1.getChanges().add(
            new Change("README.txt", "0a1,5...")
        );
        commit1.getChanges().add(
            new Change("web.xml", "17c17...")
        );

        Commit commit2 = new Commit();
        commit2.getChanges().add(
            new Change("README.txt", "0b2,5...")
        );

        repository.addCommit(commit1);
        repository.addCommit(commit2);
        session.persist(commit1);
    });
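The test data calls repository.addCommit(...), a helper the mapping snippets do not show. A minimal sketch of what such helpers might look like, assuming they only keep both sides of the bidirectional association in sync. The JPA annotations are omitted so the snippet compiles on its own, and the method bodies are an assumption rather than the article's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, annotation-free stand-ins for the article's entities.
class Commit {
    Repository repository;   // the side referenced by mappedBy = "repository"
    boolean review;
}

class Repository {
    final String name;
    private final List<Commit> commits = new ArrayList<>();

    Repository(String name) {
        this.name = name;
    }

    List<Commit> getCommits() {
        return commits;
    }

    // Add the child and point it back to its parent.
    void addCommit(Commit commit) {
        commits.add(commit);
        commit.repository = this;
    }

    // Remove the child and clear its back-reference.
    void removeCommit(Commit commit) {
        commits.remove(commit);
        commit.repository = null;
    }
}
```

Routing every change through the parent-side collection matters later in the article: it is the collection wrapper that lets Hibernate notice the change and invalidate the cached collection.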

Read-through caching

The Collection cache employs a read-through synchronization strategy:

    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertFalse(commit.getChanges().isEmpty());
        }
    });

and collections are cached upon being accessed for the first time:

    select collection0_.id as id1_0_0_,
           collection0_.name as name2_0_0_
    from Repository collection0_
    where collection0_.id=1

    select commits0_.repository_id as reposito3_0_0_,
           commits0_.id as id1_1_0_,
           commits0_.id as id1_1_1_,
           commits0_.repository_id as reposito3_1_1_,
           commits0_.review as review2_1_1_
    from commit commits0_
    where commits0_.r

    select changes0_.commit_id as commit_i1_1_0_,
           changes0_.diff as diff2_2_0_,
           changes0_.path as path3_2_0_,
           changes0_.index_id as index_id4_0_
    from commit_change changes0_
    where changes0_.commit_id=1

    select changes0_.commit_id as commit_i1_1_0_,
           changes0_.diff as diff2_2_0_,
           changes0_.path as path3_2_0_,
           changes0_.index_id as index_id4_0_
    from commit_change changes0_
    where changes0_.commit_id=2

After the Repository and its associated Commits get cached, loading the Repository and traversing the Commit and Change collections will not hit the database, since all entities and their associations are served from the second-level cache:

    LOGGER.info("Load collections from cache");
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertEquals(2, repository.getCommits().size());
    });

There’s no SQL SELECT statement executed when running the previous test case:

    CollectionCacheTest - Load collections from cache
    JdbcTransaction - committed JDBC Connection
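To make the read-through idea concrete, here is a small plain-Java model of it. This is not Hibernate code: ReadThroughCache, its loader function and the load counter are hypothetical names introduced only for illustration. A miss falls through to the loader (the "database") and stores the result, so a second read is served from memory:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified read-through cache: a miss delegates to the loader and caches the result.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;
    private int loadCount;   // how many times the "database" was hit

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            // Cache miss: load from the backing store, then remember the value.
            value = loader.apply(key);
            loadCount++;
            entries.put(key, value);
        }
        return value;
    }

    int getLoadCount() {
        return loadCount;
    }
}

class ReadThroughDemo {
    public static void main(String[] args) {
        ReadThroughCache<Long, String> cache =
            new ReadThroughCache<>(id -> "commits-of-repository-" + id);
        cache.get(1L);   // miss, loader is called
        cache.get(1L);   // hit, loader is skipped
        System.out.println(cache.getLoadCount());   // prints 1
    }
}
```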

Collection cache entry structure

For entity collections, Hibernate only stores the entity identifiers, therefore requiring that entities be cached as well:

    key = {org.hibernate.cache.spi.CacheKey@3981}
        key = {java.lang.Long@3597} "1"
        type = {org.hibernate.type.LongType@3598}
        entityOrRoleName = {java.lang.String@3599} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Repository.commits"
        tenantId = null
        hashCode = 31
    value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3982}
        value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3986} "CollectionCacheEntry[1,2]"
        version = null
        timestamp = 5858841154416640

The CollectionCacheEntry stores the Commit identifiers associated with a given Repository entity.

Because element types don’t have identifiers, Hibernate stores their dehydrated state instead. The Change embeddable is cached as follows:

    key = {org.hibernate.cache.spi.CacheKey@3970} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes#1"
        key = {java.lang.Long@3974} "1"
        type = {org.hibernate.type.LongType@3975}
        entityOrRoleName = {java.lang.String@3976} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes"
        tenantId = null
        hashCode = 31
    value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3971}
        value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3978}
            state = {java.io.Serializable[2]@3980}
                0 = {java.lang.Object[2]@3981}
                    0 = {java.lang.String@3985} "0a1,5..."
                    1 = {java.lang.String@3986} "README.txt"
                1 = {java.lang.Object[2]@3982}
                    0 = {java.lang.String@3983} "17c17..."
                    1 = {java.lang.String@3984} "web.xml"
        version = null
        timestamp = 5858843026345984
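The two dumps differ only in what the state array holds. The following sketch mimics that difference in plain Java. CollectionEntrySketch and its two factory methods are hypothetical names, and this is only a model of the layout seen in the debugger, not the way Hibernate builds a CollectionCacheEntry:

```java
import java.io.Serializable;
import java.util.Arrays;

class CollectionEntrySketch {

    // Entity collection: only the element identifiers are kept,
    // so the entities themselves must be resolvable elsewhere.
    static Serializable[] forEntityCollection(long... ids) {
        Serializable[] state = new Serializable[ids.length];
        for (int i = 0; i < ids.length; i++) {
            state[i] = ids[i];   // boxed to Long
        }
        return state;
    }

    // Element collection: each embeddable is flattened into its property
    // values, here {diff, path} in the order shown in the dump above.
    static Serializable[] forElementCollection(String[][] changes) {
        Serializable[] state = new Serializable[changes.length];
        for (int i = 0; i < changes.length; i++) {
            state[i] = new Object[] { changes[i][0], changes[i][1] };
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(forEntityCollection(1L, 2L)));   // prints [1, 2]
        Serializable[] changes = forElementCollection(new String[][] {
            { "0a1,5...", "README.txt" },
            { "17c17...", "web.xml" }
        });
        System.out.println(Arrays.deepToString(changes));
    }
}
```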

Collection Cache consistency model

Consistency is the biggest concern when employing caching, so we need to understand how the Hibernate Collection Cache handles entity state changes.

The CollectionUpdateAction is responsible for all Collection modifications and whenever the collection changes, the associated cache entry is evicted:

    protected final void evict() throws CacheException {
        if ( persister.hasCache() ) {
            final CacheKey ck = session.generateCacheKey(
                key,
                persister.getKeyType(),
                persister.getRole()
            );
            persister.getCacheAccessStrategy().remove( ck );
        }
    }

This behavior is also documented by the CollectionRegionAccessStrategy specification:

For cached collection data, all modification actions actually just invalidate the entry(s).

Based on the current concurrency strategy, the Collection Cache entry is evicted:

before the current transaction is committed, for CacheConcurrencyStrategy.NONSTRICT_READ_WRITE

right after the current transaction is committed, for CacheConcurrencyStrategy.READ_WRITE

exactly when the current transaction is committed, for CacheConcurrencyStrategy.TRANSACTIONAL

Adding new Collection entries

The following test case adds a new Commit entity to our Repository:

    LOGGER.info("Adding invalidates Collection Cache");
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertEquals(2, repository.getCommits().size());

        Commit commit = new Commit();
        commit.getChanges().add(
            new Change("Main.java", "0b3,17...")
        );
        repository.addCommit(commit);
    });
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertEquals(3, repository.getCommits().size());
    });

Running this test generates the following output:

    --Adding invalidates Collection Cache

    insert into commit (id, repository_id, review)
    values (default, 1, false)

    insert into commit_change (commit_id, index_id, diff, path)
    values (3, 0, '0b3,17...', 'Main.java')

    --committed JDBC Connection

    select commits0_.repository_id as reposito3_0_0_,
           commits0_.id as id1_1_0_,
           commits0_.id as id11_1_1_,
           commits0_.repository_id as reposito3_1_1_,
           commits0_.review as review2_1_1_
    from commit commits0_
    where commits0_.repository_id=1

    --committed JDBC Connection

After a new Commit entity is persisted, the Repository.commits collection cache is cleared, and the associated Commit entities are fetched from the database the next time the collection is accessed.

Removing existing Collection entries

Removing a Collection element follows the same pattern:

    LOGGER.info("Removing invalidates Collection Cache");
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertEquals(2, repository.getCommits().size());

        Commit removable = repository.getCommits().get(0);
        repository.removeCommit(removable);
    });
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertEquals(1, repository.getCommits().size());
    });

The following output gets generated:

    --Removing invalidates Collection Cache

    delete from commit_change where commit_id=1

    delete from commit where id=1

    --committed JDBC Connection

    select commits0_.repository_id as reposito3_0_0_,
           commits0_.id as id1_1_0_,
           commits0_.id as id1_1_1_,
           commits0_.repository_id as reposito3_1_1_,
           commits0_.review as review2_1_1_
    from commit commits0_
    where commits0_.repository_id=1

    --committed JDBC Connection

The Collection Cache is evicted once its structure gets changed.
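The invalidate-instead-of-update behavior seen in both tests can be modeled in a few lines of plain Java. InvalidatingCollectionCache, its in-memory "database" map and the selects counter are hypothetical names used only for this sketch, which ignores transactions and the concurrency strategies discussed earlier:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvalidatingCollectionCache {
    // Stand-in for the commit table: owner id -> child ids.
    private final Map<Long, List<Long>> database = new HashMap<>();
    // Stand-in for the collection cache region.
    private final Map<Long, List<Long>> cache = new HashMap<>();
    int selects;   // number of simulated SELECT statements

    List<Long> load(long ownerId) {
        List<Long> cached = cache.get(ownerId);
        if (cached != null) {
            return cached;
        }
        selects++;
        List<Long> loaded = new ArrayList<>(
            database.getOrDefault(ownerId, new ArrayList<>()));
        cache.put(ownerId, loaded);
        return loaded;
    }

    void addChild(long ownerId, long childId) {
        database.computeIfAbsent(ownerId, k -> new ArrayList<>()).add(childId);
        cache.remove(ownerId);   // evict the entry rather than patching it
    }

    void removeChild(long ownerId, long childId) {
        List<Long> children = database.get(ownerId);
        if (children != null) {
            children.remove(Long.valueOf(childId));
        }
        cache.remove(ownerId);   // same eviction on removal
    }

    public static void main(String[] args) {
        InvalidatingCollectionCache c = new InvalidatingCollectionCache();
        c.addChild(1L, 1L);
        c.addChild(1L, 2L);
        c.load(1L);
        c.load(1L);
        System.out.println(c.selects);           // prints 1
        c.addChild(1L, 3L);
        System.out.println(c.load(1L).size());   // prints 3
        System.out.println(c.selects);           // prints 2
    }
}
```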

Removing Collection elements directly

Hibernate can ensure cache consistency, as long as it’s aware of all changes the target cached collection undergoes. Hibernate uses its own Collection types (e.g. PersistentBag, PersistentSet) to allow lazy-loading or detect dirty state.

If an internal Collection element is deleted without updating the Collection state, Hibernate won’t be able to invalidate the currently cached Collection entry:

    LOGGER.info("Removing Child causes inconsistencies");
    doInTransaction(session -> {
        Commit commit = (Commit)
            session.get(Commit.class, 1L);
        session.delete(commit);
    });
    try {
        doInTransaction(session -> {
            Repository repository = (Repository)
                session.get(Repository.class, 1L);
            assertEquals(1, repository.getCommits().size());
        });
    } catch (ObjectNotFoundException e) {
        LOGGER.warn("Object not found", e);
    }

    --Removing Child causes inconsistencies

    delete from commit_change where commit_id=1

    delete from commit where id=1

    --committed JDBC Connection

    select collection0_.id as id1_1_0_,
           collection0_.repository_id as reposito3_1_0_,
           collection0_.review as review2_1_0_
    from commit collection0_
    where collection0_.id=1

    --No row with the given identifier exists:
    -- [CollectionCacheTest$Commit#1]

    --rolled JDBC Connection

When the Commit entity was deleted, Hibernate didn’t know it had to update all the associated Collection Caches. The next time we load the Commit collection, Hibernate will realize some entities don’t exist anymore and it will throw an exception.
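The failure mode is easier to see in a stripped-down model. In this sketch the collection cache holds only identifiers, as described in the cache entry section, while the row is deleted behind its back. StaleCollectionCacheSketch and its members are hypothetical names, and NoSuchElementException merely stands in for Hibernate's ObjectNotFoundException:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class StaleCollectionCacheSketch {
    // Stand-in for the commit table: id -> row.
    final Map<Long, String> commitTable = new HashMap<>();
    // Stand-in for the collection cache: repository id -> cached commit ids.
    final Map<Long, List<Long>> collectionCache = new HashMap<>();

    // Deletes the row directly; the cached collection is never told about it.
    void deleteCommitDirectly(long commitId) {
        commitTable.remove(commitId);
    }

    // Resolves every cached identifier and fails on the first missing row.
    List<String> loadCommits(long repositoryId) {
        List<String> result = new ArrayList<>();
        for (Long id : collectionCache.get(repositoryId)) {
            String row = commitTable.get(id);
            if (row == null) {
                throw new NoSuchElementException(
                    "No row with the given identifier exists: Commit#" + id);
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        StaleCollectionCacheSketch sketch = new StaleCollectionCacheSketch();
        sketch.commitTable.put(1L, "commit-1");
        sketch.commitTable.put(2L, "commit-2");
        sketch.collectionCache.put(1L, Arrays.asList(1L, 2L));

        sketch.deleteCommitDirectly(1L);
        try {
            sketch.loadCommits(1L);
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Removing the Commit through the parent collection instead would give the cache a chance to evict the stale entry.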

Updating Collection elements using HQL

Hibernate can maintain cache consistency when executing bulk updates through HQL:

    LOGGER.info("Updating Child entities using HQL");
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertFalse(commit.review);
        }
    });
    doInTransaction(session -> {
        session.createQuery(
            "update Commit c " +
            "set c.review = true ")
        .executeUpdate();
    });
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertTrue(commit.review);
        }
    });

Running this test case generates the following SQL:

    --Updating Child entities using HQL

    --committed JDBC Connection

    update commit set review=true

    --committed JDBC Connection

    select commits0_.repository_id as reposito3_0_0_,
           commits0_.id as id1_1_0_,
           commits0_.id as id1_1_1_,
           commits0_.repository_id as reposito3_1_1_,
           commits0_.review as review2_1_1_
    from commit commits0_
    where commits0_.repository_id=1

    --committed JDBC Connection

The first transaction doesn’t require hitting the database, only relying on the second-level cache. The HQL UPDATE clears the Collection Cache, so Hibernate will have to reload it from the database when the collection is accessed afterward.

Updating Collection elements using SQL

Hibernate can also invalidate cache entries for bulk SQL UPDATE statements:

    LOGGER.info("Updating Child entities using SQL");
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertFalse(commit.review);
        }
    });
    doInTransaction(session -> {
        session.createSQLQuery(
            "update Commit c " +
            "set c.review = true ")
        .addSynchronizedEntityClass(Commit.class)
        .executeUpdate();
    });
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertTrue(commit.review);
        }
    });

This generates the following output:

    --Updating Child entities using SQL

    --committed JDBC Connection

    update commit set review=true

    --committed JDBC Connection

    select commits0_.repository_id as reposito3_0_0_,
           commits0_.id as id1_1_0_,
           commits0_.id as id1_1_1_,
           commits0_.repository_id as reposito3_1_1_,
           commits0_.review as review2_1_1_
    from commit commits0_
    where commits0_.repository_id=1

    --committed JDBC Connection

The BulkOperationCleanupAction is responsible for cleaning up the second-level cache on bulk DML statements. While Hibernate can detect the affected cache regions when executing an HQL statement, for native queries you need to tell Hibernate which regions the statement should invalidate. If you don’t specify any such region, Hibernate will clear all second-level cache regions.
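The rule described above (invalidate the declared regions, or everything when none is declared) can be pictured with a small plain-Java model. BulkCleanupSketch and afterBulkStatement are hypothetical names, and the sketch says nothing about how BulkOperationCleanupAction is actually implemented:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class BulkCleanupSketch {
    // Region name -> cached entries of that region.
    private final Map<String, Map<Object, Object>> regions = new HashMap<>();

    void put(String region, Object key, Object value) {
        regions.computeIfAbsent(region, r -> new HashMap<>()).put(key, value);
    }

    int size(String region) {
        return regions.getOrDefault(region, Collections.emptyMap()).size();
    }

    // An empty set means the affected tables are unknown,
    // so the only safe choice is to clear every region.
    void afterBulkStatement(Set<String> affectedRegions) {
        if (affectedRegions.isEmpty()) {
            regions.values().forEach(Map::clear);
        } else {
            for (String region : affectedRegions) {
                Map<Object, Object> entries = regions.get(region);
                if (entries != null) {
                    entries.clear();
                }
            }
        }
    }

    public static void main(String[] args) {
        BulkCleanupSketch cache = new BulkCleanupSketch();
        cache.put("Commit", 1L, "commit-1");
        cache.put("Repository", 1L, "repository-1");

        // Declared target: only the Commit region is invalidated.
        cache.afterBulkStatement(Collections.singleton("Commit"));
        System.out.println(cache.size("Commit") + " " + cache.size("Repository"));   // prints 0 1

        // No declared target: everything is invalidated.
        cache.afterBulkStatement(Collections.emptySet());
        System.out.println(cache.size("Repository"));   // prints 0
    }
}
```

In the test above, addSynchronizedEntityClass(Commit.class) plays the role of the declared target.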

Conclusion

The Collection Cache is a very useful feature, complementing the second-level entity cache. This way, we can store an entire entity graph, reducing the database querying workload in read-mostly applications. As with AUTO flushing, Hibernate cannot introspect the affected table spaces when executing native queries. To avoid consistency issues (when using AUTO flushing) or cache misses (second-level cache), whenever we need to run a native query, we have to explicitly declare the targeted tables, so Hibernate can take the appropriate actions (e.g. flushing or invalidating cache regions).

Code available on GitHub.
