If you ever have the luxury of designing a brand new Java application there are many, new, exciting and unfamiliar technologies to choose from. All the flavours of NoSQL stores; Data Grids; PAAS and IAAS; JEE7; REST; WebSockets; an alphabet soup of opportunity combined with many programming frameworks both on the server side and client side adds up to a tyranny of choice.

However, if like me, you have to architect large scale, server-side, Java applications that support many thousands of users then there are a number of requirements that remain constant. The application you design must be high-performance, highly available, scalable and reliable.

It doesn’t matter how fancy your new lovingly crafted Javascript Web2.0 user interface is, if it is slow or simply not available nobody is going to use it. In this article (originally published in JAX Magazine) I will try and demystify one of your choices, the Java Data Grid and show how this technology can meet those constant non-functional requirements while at the same time taking advantage of the latest trends in hardware.

Latency: The Performance Killer

When building large scale Java applications the most likely cause of performance problems in your application is latency. Latency is defined as the time delay between requesting an operation, like retrieving some data to process, and the operation occurring. Typical causes of latency in a distributed Java application are;

IO latency pulling data from disk

IO latency pulling data across the network

Resource contention for example a distributed lock

Garbage Collection pauses

For example typical ping times across a network range from; 57µs on a local machine; 300 µs on a local LAN segment through to 100ms from London to New York. When these ping times are combined with typical network data transfer rates; 25MB – 30MB/s for 1Gb Ethernet; 250MB/s – 350MB/s for 10Gb Ethernet a careful trade-off between operation frequency and data granularity must be made to achieve acceptable performance. Ie. if you have 100MB of data to process the decision between making 100 calls across the network each retrieving 1MB, or 1 call retrieving the full 100MB will depend on the network topology. Network latency is normally the cause of the developer cry, “It was fast on my machine!”

Latency due to disk IO is also a problem, a typical SSD when combined with a SATA 3.0 interface can only deliver data at a sustained data rate of 500-600MB/s so if you have Gigabytes of data to process disk latency will impact your application performance.

The hardware component with the lowest latency is memory, typical main memory bandwidth, ignoring Cache hits, is around 3-5 GB/s and scales with the number of CPUs. If you have 2 processors you will get 10GB/s and with 4 CPUs 20GB/s etc. John McCalpin at Virginia maintains a memory benchmark called STREAM (http://www.cs.virginia.edu/stream/) which measures the memory throughput of many computers with some achieving TB/s with large numbers of CPUs.

In conclusion:

Memory is FAST: And therefore, for high performance, you should process data in memory.

Network is SLOW: Therefore for high performance minimise network data transfer.

The question then becomes is it feasible to process many Gigabytes of data in memory? With the costs of memory dropping it is now possible to buy single servers with 1TB of memory for only a few £30K – £40K and the latest SPARC servers are shipping supporting 32TB of RAM so Big Memory is here.

The other fundamental shift in hardware at the moment is the processing power of single hardware threads is starting to reach a plateau with manufactures moving more into providing CPUs with many cores and many hardware threads. This trend forces us to design our Java applications in a fashion that can utilise the large number of hardware threads appearing in modern chips.

Parallel is the Future: For maximum performance and scalability you must support many hardware threads.

Figure 1: Java Data Grids

The key benefits of the partitioned key space in a Data Grid when compared to fully replicated clustered Cache are that the more JVMs you add the more data you can store and access times for individual keys are independent of the number of JVMs in the grid.

For example, if we have 20 JVM nodes in our Grid each with 4GB of free heap available for the storage of objects then we can store, when taking into account duplicates, 40GB of Java objects. If we add a further 20 JVM nodes then we can store 80GB. Access times are constant to read/write objects as the grid will go directly to the JVM which owns the primary key space for the object we require.

JSR 107 defines a standards based API to data grids which is very similar to the java.util.Map API as shown

Listing 1

Many Data Grids also make use of Java NIO to store Java objects “off heap” in Java NIO buffers. This has the advantage that we can increase the memory available for storage without increasing the latency from garbage collection pause times.

Parallel Processing on the Grid

The problem arises when we store many 10s of GB of Java objects across the Grid in many JVMs and then want to run some processing across the data set. For example, we may store objects representing hotels and their availability on dates. What happens when we want to run a query like “find all the hotels in Paris with availability on Valentines day 2015”? If we follow the simple Map API approach we would need to run code like that shown in Listing 2.

Listing 2

However the problem with this approach, when accessing a Data Grid, is that the objects are distributed according to their keys across a large number of JVMs and every “get” call needs to serialize the object over the network to the requesting JVM. Using the listing above this could pull 10s of GB of data over the network which as we saw earlier is slow.

Thankfully most Java Data Grid products allow you to turn the processing on its head and instead of pulling the data over to the code to process they send the code to each of the Grid JVMs hosting the data and execute it in parallel in the local JVMs. As typically the code is very small in size only a few KB of data needs to be sent across the network.

Processing is run in parallel across all the JVMs making use of all the CPU cores in parallel. Example code, which runs the Paris query across the Grid, for Oracle Coherence, a popular Data Grid product is shown in Listing 3 and 4. Listing 3 shows the code for a Coherence EntryProcessor which is the code that will be serialized across all the nodes in the data grid.

This EntryProcessor will check each hotel as before to see if there is availability for Valentine’s day but unlike in Listing 2 it will do so in each JVM on local in-memory data. JSR107 also has the concept of an EntryProcessor so the approach is common to all Data Grid products.

Listing 3

Listing 4 shows the Oracle Coherence code needed to send this processor across the Data Grid to execute in parallel in all the grid JVMs.

Listing 4

Processing data using EntryProcessors as shown in Listings 3 and 4 will result in much greater performance on a Data Grid than access via the simple Cache API. As only a small amount of data will be sent across the network and all CPU cores across all the JVMs will be used to process the search.

Fast Data: Parallel Processing on the Grid

As we’ve seen, using a Data Grid in your next application will enable you to store large volumes of Java objects in memory for high performance access in a highly available fashion. This will also give you large scale parallel processing capabilities that utilise all the CPU cores in the Grid to crunch through processing Java objects in parallel. Take a look at Data Grids next time you have a latency problem or you have the luxury of designing a brand new Java application.