Using JDK 7's Fork/Join Framework



Java 7, which is due to be released within a matter of weeks, has many new features. In fact, it contains more big new features than the previous Java SE release, mainly because it has been so long since Java SE 6 was released. Some of the planned features even had to be deferred to JDK 8. Here's a summary of what's new:

JSR-292: Support for dynamically typed languages through the new invokedynamic bytecode instruction. Languages like Ruby and Groovy can now run on the JVM with performance closer to that of native Java code

JSR-334: Also called Project Coin, this is a set of small Java language enhancements, such as strings in switch statements, the diamond operator, multi-catch, and try-with-resources

Improved class loading

JSR-166: The new Fork/Join framework for enhanced concurrency support

Unicode 6.0 and other internationalization improvements

JSR-203: NIO.2, which includes better file system integration, better asynchronous support, multicast, and so on

Windows Vista IPv6 support

SDP, SCTP, and TLS 1.2 support

JDBC 4.1

Swing enhancements, the Nimbus look-and-feel, enhanced platform window support, and a new sound synthesizer

Updated XML and Web Services stack

Improved system and JVM reporting, including MBean enhancements

What got deferred to JDK 8? Here's a summary list:

Modular support for the JVM (Project Jigsaw)

Enhanced Java annotations

Java Closures (Project Lambda)

JSR-296: Swing Application Framework to eliminate boilerplate code

For a complete list of enhancements and new features, with full details, see the JDK 7 features page on the OpenJDK site. For now, let's look at the new Fork/Join framework and how it helps with Java concurrency.

What Is Fork/Join?

Fork/Join adds a new ExecutorService implementation, ForkJoinPool, that allows you to more easily break up processing to be executed concurrently, and recursively, with little effort on your part. It's based on the work of Doug Lea of SUNY Oswego, a thought leader on Java concurrency. Fork/Join deals with the threading hassles; you just indicate to the framework which portions of the work can be broken apart and handled recursively. It employs a divide-and-conquer algorithm that works like this in pseudocode (adapted from Doug Lea's paper on the subject):

    Result doWork(Work work) {
        if (work is small) {
            process the work
        } else {
            split up work
            invoke framework to solve both parts
        }
    }
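As a rough illustration, the pseudocode above maps onto a RecursiveAction along these lines. This is a minimal sketch, not code from Doug Lea's paper: the SquareAction class, its THRESHOLD of 4, and the choice of squaring array elements as "the work" are all hypothetical. The invokeAll() call is the same one the XML sample later in this article uses.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example: squares every element of an array in place, following
// the "if the work is small, process it; otherwise split it" pattern.
class SquareAction extends RecursiveAction {
    // Hypothetical cutoff: ranges of this size or smaller are processed directly
    static final int THRESHOLD = 4;

    private final long[] data;
    private final int start; // inclusive
    private final int end;   // exclusive

    SquareAction(long[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // The work is small: process it directly
            for (int i = start; i < end; i++) {
                data[i] = data[i] * data[i];
            }
        } else {
            // Split up the work and let the framework solve both parts
            int mid = (start + end) >>> 1;
            invokeAll(new SquareAction(data, start, mid),
                      new SquareAction(data, mid, end));
        }
    }

    static void squareAll(long[] data) {
        new ForkJoinPool().invoke(new SquareAction(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        squareAll(values);
        System.out.println(Arrays.toString(values));
        // prints [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
    }
}
```

Because the two halves touch disjoint index ranges, no locking is needed; invokeAll() returns only after both subtasks have completed.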

It's your job to determine how much work a task should process directly before splitting it up further. If the pieces are too fine-grained, the overhead of creating and scheduling Fork/Join tasks can hurt performance; if the threshold is just right, the gains from parallelism outweigh that overhead. For instance, the sample application we'll examine looks for XML files to process in a set of directories. If there are more files than a set threshold, the code uses the Fork/Join framework to recursively break down the workload across multiple threads. Since XML file processing involves a combination of I/O and CPU work, it's a reasonable candidate for Fork/Join, although the ForkJoinTask documentation advises that subdividable tasks should avoid blocking I/O, so the framework is best suited to CPU-bound work.
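One way to see the granularity trade-off is to count how many leaf tasks a split-in-half strategy produces for a fixed amount of work. The sketch below is hypothetical (the LeafCounter class and the workload size of 1,024 items are illustrative only): it does no real processing, it just tallies how many pieces would be handled directly at each threshold.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: counts the leaf tasks created when a workload of 'size'
// items is split in half until each piece is at or below 'threshold'.
class LeafCounter extends RecursiveAction {
    private final int size;
    private final int threshold;
    private final AtomicInteger leaves;

    LeafCounter(int size, int threshold, AtomicInteger leaves) {
        this.size = size;
        this.threshold = threshold;
        this.leaves = leaves;
    }

    @Override
    protected void compute() {
        if (size <= threshold) {
            leaves.incrementAndGet(); // this piece would be processed directly
        } else {
            int half = size / 2;
            invokeAll(new LeafCounter(half, threshold, leaves),
                      new LeafCounter(size - half, threshold, leaves));
        }
    }

    static int countLeaves(int size, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        AtomicInteger leaves = new AtomicInteger();
        new ForkJoinPool().invoke(new LeafCounter(size, threshold, leaves));
        return leaves.get();
    }

    public static void main(String[] args) {
        int[] thresholds = {1, 16, 256, 1024};
        for (int t : thresholds) {
            System.out.println("threshold " + t + " -> " + countLeaves(1024, t) + " leaf tasks");
        }
        // prints 1024, 64, 4 and 1 leaf tasks respectively
    }
}
```

A threshold of 1 creates a task per item, so scheduling overhead dominates; a threshold equal to the whole workload creates a single task and gives up parallelism entirely. The sweet spot lies somewhere in between and usually has to be found by measurement.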

The framework manages the worker threads for you; by default, a ForkJoinPool uses one worker thread per available processor. It also employs a second algorithm called work stealing: each worker thread keeps its own queue of tasks, and idle threads can steal tasks from busy threads' queues, which spreads the load around without spawning new threads. The same type of algorithm is often used in garbage collectors that use parallel worker threads to walk the heap.
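To observe these behaviors, ForkJoinPool exposes getParallelism() and getStealCount(). The sketch below is a hypothetical demonstration that uses a deliberately naive recursive Fibonacci task (with an arbitrary cutoff of 10) simply because it generates many small subtasks; it is not part of the XML sample. Each fork() pushes a subtask onto the current worker's queue, where an idle worker may steal it.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class StealDemo {
    // Naive recursive Fibonacci, used only because it creates many small tasks
    static class Fib extends RecursiveTask<Long> {
        final int n;

        Fib(int n) {
            this.n = n;
        }

        @Override
        protected Long compute() {
            if (n <= 10) {
                return sequential(n); // small enough: compute directly
            }
            Fib left = new Fib(n - 1);
            left.fork();                            // queue this half; an idle worker may steal it
            long right = new Fib(n - 2).compute();  // keep working on the other half here
            return left.join() + right;             // wait for (or run) the forked half
        }

        static long sequential(int n) {
            return n < 2 ? n : sequential(n - 1) + sequential(n - 2);
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the processor count
        long result = pool.invoke(new Fib(30));
        System.out.println("fib(30) = " + result); // prints fib(30) = 832040
        System.out.println("parallelism = " + pool.getParallelism()
                + ", processors = " + Runtime.getRuntime().availableProcessors());
        // The steal count varies from run to run and may be 0 on a single-core machine
        System.out.println("steals = " + pool.getStealCount());
        pool.shutdown();
    }
}
```

Forking one half and computing the other in the current thread is a common idiom: it keeps the current worker busy instead of having it wait while both halves sit in a queue.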

Java 7 Fork/Join Processing Example

Let's explore a sample application that checks a set of work directories for new XML files. As the files are processed, they're moved out of the work directories and into a special "processed" directory. This sample is loosely based on a news processing system I worked on years ago, where news articles were written to the appropriate directories as they were published. A worker process then periodically checked the directories, processed the files, and made them available on a website.
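Moving a finished file into the "processed" directory is a natural fit for the NIO.2 API that also ships with Java 7. The sketch below is an assumption about how that step could look; the sample's own parseXMLFiles() method elides these details, and the moveToProcessed() helper and directory names here are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ProcessedMover {
    // Hypothetical helper: moves a processed file into processedDir (creating the
    // directory if needed) and returns the file's new location.
    static Path moveToProcessed(Path file, Path processedDir) throws IOException {
        Files.createDirectories(processedDir);
        Path target = processedDir.resolve(file.getFileName());
        // REPLACE_EXISTING lets a re-published article overwrite its older copy
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("xml-work");
        Path workDir = Files.createDirectories(root.resolve("news"));
        Path article = Files.write(workDir.resolve("article1.xml"),
                "<article/>".getBytes(StandardCharsets.UTF_8));
        Path moved = moveToProcessed(article, root.resolve("processed"));
        System.out.println(Files.exists(article) + " " + Files.exists(moved)); // prints "false true"
    }
}
```

Moving rather than copying means a file disappears from the work directory once handled, so the next periodic scan won't pick it up again.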

The code below is the complete Fork/Join XML processing application (minus the actual XML processing details). The main class, XMLProcessingForkJoin, periodically starts parsing the files within a directory. It uses the ProcessXMLFiles class, which extends the Fork/Join framework's java.util.concurrent.RecursiveAction base class, to recursively split up and process all the files in the source directory.

    import java.io.File;
    import java.io.FileFilter;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    public class XMLProcessingForkJoin {

        class ProcessXMLFiles extends RecursiveAction {
            static final int FILE_COUNT_THRESHOLD = 2;
            String sourceDirPath;
            String targetDirPath;
            File[] xmlFiles = null;

            public ProcessXMLFiles(String sourceDirPath, String targetDirPath, File[] xmlFiles) {
                this.sourceDirPath = sourceDirPath;
                this.targetDirPath = targetDirPath;
                this.xmlFiles = xmlFiles;
            }

            @Override
            protected void compute() {
                try {
                    // Make sure the directory has been scanned (XML files only)
                    if ( xmlFiles == null ) {
                        File sourceDir = new File(sourceDirPath);
                        File[] found = sourceDir.listFiles(new FileFilter() {
                            @Override
                            public boolean accept(File f) {
                                return f.isFile() && f.getName().toLowerCase().endsWith(".xml");
                            }
                        });
                        // listFiles() returns null if sourceDir is not a readable directory
                        xmlFiles = (found != null) ? found : new File[0];
                    }

                    // Check the number of files
                    if ( xmlFiles.length <= FILE_COUNT_THRESHOLD ) {
                        parseXMLFiles(xmlFiles);
                    }
                    else {
                        // Split the array of XML files into two equal parts
                        int center = xmlFiles.length / 2;
                        File[] part1 = splitArray(xmlFiles, 0, center);
                        File[] part2 = splitArray(xmlFiles, center, xmlFiles.length);
                        // Run both halves as subtasks and wait for them to finish
                        invokeAll(new ProcessXMLFiles(sourceDirPath, targetDirPath, part1),
                                  new ProcessXMLFiles(sourceDirPath, targetDirPath, part2));
                    }
                }
                catch ( Exception e ) {
                    e.printStackTrace();
                }
            }

            // Returns a File[] (not an Object[]), so no cast is needed; casting a
            // new Object[n] to File[] would throw a ClassCastException at runtime
            protected File[] splitArray(File[] array, int start, int end) {
                int length = end - start;
                File[] part = new File[length];
                for ( int i = start; i < end; i++ ) {
                    part[i-start] = array[i];
                }
                return part;
            }

            protected void parseXMLFiles(File[] filesToParse) {
                // Parse and copy the given set of XML files
                // ...
            }
        }

        public XMLProcessingForkJoin(String source, String target) {
            // Periodically invoke the following lines of code:
            ProcessXMLFiles process = new ProcessXMLFiles(source, target, null);
            ForkJoinPool pool = new ForkJoinPool();
            pool.invoke(process);
        }

        // Start the XML file parsing process with the Java SE 7 Fork/Join framework
        public static void main(String[] args) {
            if ( args.length < 2 ) {
                System.out.println("args - please specify source and target dirs");
                System.exit(-1);
            }
            String source = args[0];
            String target = args[1];
            XMLProcessingForkJoin forkJoinProcess = new XMLProcessingForkJoin(source, target);
        }
    }

It starts with the main class's constructor, XMLProcessingForkJoin, where a new ProcessXMLFiles object is created and handed off to the Fork/Join framework via a call to ForkJoinPool.invoke(). The framework then calls the object's compute() method. First, if the list of files hasn't been populated yet, the source directory is scanned. Next, if the number of files to process is at or below a threshold (two files in this case), the files are processed and we're done. Otherwise, the array of files is split into two halves, and two new Fork/Join tasks are passed to invokeAll(), which runs them and waits for both to complete. Each of those tasks repeats the same steps, recursively, until all the files are parsed and processed.
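The constructor's comment says the scan should run periodically, but the code invokes it only once. One possible way to schedule it, sketched here under stated assumptions, is to pair a ScheduledExecutorService with a single ForkJoinPool that is created once and reused for every scan. The ScanTask class is a hypothetical placeholder standing in for ProcessXMLFiles, and the run count and period are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicScan {
    // Hypothetical stand-in for ProcessXMLFiles: it only records that a scan ran
    static class ScanTask extends RecursiveAction {
        private final AtomicInteger scans;

        ScanTask(AtomicInteger scans) {
            this.scans = scans;
        }

        @Override
        protected void compute() {
            scans.incrementAndGet();
        }
    }

    // Submits a fresh scan task to one shared ForkJoinPool every periodMillis,
    // stops after 'runs' scans, and returns how many scans actually ran.
    static int runScans(int runs, long periodMillis) throws InterruptedException {
        final ForkJoinPool pool = new ForkJoinPool(); // created once, reused for every scan
        final AtomicInteger scans = new AtomicInteger();
        final CountDownLatch remaining = new CountDownLatch(runs);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        timer.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                if (remaining.getCount() == 0) {
                    return; // enough scans; ignore any late timer ticks
                }
                pool.invoke(new ScanTask(scans)); // blocks until the whole task tree finishes
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        remaining.await();
        timer.shutdown();
        timer.awaitTermination(1, TimeUnit.SECONDS);
        pool.shutdown();
        return scans.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed " + runScans(3, 100) + " scans"); // prints "completed 3 scans"
    }
}
```

scheduleWithFixedDelay() waits the full period after each scan finishes before starting the next one, so a slow scan never overlaps the following one.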

Since the code just parses XML files and produces no result, I chose to extend RecursiveAction in this application. If your processing returns a result that needs to be combined with the results of other Fork/Join subtasks (e.g., sorting, compressing data, tallying numbers, and so on), you can extend RecursiveTask instead. I'll take a closer look at this and other changes to the concurrent classes in Java SE 7 in a future blog post.
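For comparison, a tallying job written with RecursiveTask might look like the hypothetical sketch below, which counts the file names ending in ".xml". The XmlNameCounter class is illustrative only; its threshold of 2 simply mirrors FILE_COUNT_THRESHOLD from the sample. Each subtask returns a partial count, and the parent combines the two results.

```java
import java.util.Locale;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: counts names ending in ".xml", combining partial counts
class XmlNameCounter extends RecursiveTask<Integer> {
    static final int THRESHOLD = 2; // mirrors FILE_COUNT_THRESHOLD in the sample

    private final String[] names;
    private final int start; // inclusive
    private final int end;   // exclusive

    XmlNameCounter(String[] names, int start, int end) {
        this.names = names;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Integer compute() {
        if (end - start <= THRESHOLD) {
            int count = 0;
            for (int i = start; i < end; i++) {
                if (names[i].toLowerCase(Locale.ROOT).endsWith(".xml")) {
                    count++;
                }
            }
            return count;
        }
        int mid = (start + end) >>> 1;
        XmlNameCounter left = new XmlNameCounter(names, start, mid);
        XmlNameCounter right = new XmlNameCounter(names, mid, end);
        left.fork();                      // run the left half asynchronously
        int rightCount = right.compute(); // compute the right half in this thread
        return left.join() + rightCount;  // combine the partial results
    }

    static int count(String[] names) {
        return new ForkJoinPool().invoke(new XmlNameCounter(names, 0, names.length));
    }

    public static void main(String[] args) {
        String[] names = {"a.xml", "b.txt", "c.XML", "d.json", "e.xml"};
        System.out.println(count(names)); // prints 3
    }
}
```

The structure matches the RecursiveAction version; the difference is that compute() returns a value and join() hands each subtask's result back to its parent.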

Happy coding!

-EJB