Abstract

File transport allows files in the local file system to be read and written. A polling transport repeatedly scans a directory, or a set of directories, at a fixed interval. This is usually wasteful of system resources, because the entire set of directories and files is scanned on every cycle even when nothing has changed. As a solution, the NIO file transport acts as a non-polling transport that triggers an event if and only if a file or directory is created or modified within its monitoring scope.

How This Actually Works

Java NIO (New I/O), the java.nio package, has been part of Java since version 1.4 and serves as an alternative to the standard Java I/O and Java Networking APIs. JDK 7 extended it with the sub-package java.nio.file, which contains a file system change notification API called the WatchService API. This API lets you register a directory (or a set of directories) with a watch service. Once started, the watch service captures the creation, modification, and deletion events for files and directories in the registered directories and makes them available through an event queue, similar to the Linux inotify API.

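The following code sample shows how a directory is registered with a watch service.

import java.nio.file.*;
import static java.nio.file.StandardWatchEventKinds.*;
// Watch /tmp for created, deleted, and modified entries.
Path dir = Paths.get("/tmp");
WatchService watcher = FileSystems.getDefault().newWatchService();
WatchKey key = dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);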





In this event-based file transport, we supply the following parameters:

Root path: The file or directory under which the transport watches for files.

Path pattern: A pattern that a file's path must match for the file to be picked up.

Pattern syntax: The syntax in which the path pattern is written (glob or regex).

When the watch service starts, the transport registers the root directory and every sub-directory beneath it, and from then on each registered directory is watched. Whenever a file or directory is created, modified, or deleted, an event is triggered, but only for the event kinds we registered for. If the affected entry is a directory, it is registered with the watch service so that changes inside it are detected as well; if it is a file, it is handed over for processing.
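The recursive registration described above can be sketched as follows. This is a minimal illustration rather than the transport's actual code: the class name RecursiveRegistrar and the method registerTree are hypothetical, and the sketch assumes that walking the tree with Files.walkFileTree and registering each directory individually is acceptable, since a watch key covers only one directory.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

// Hypothetical helper; the names are illustrative and not part of the transport.
final class RecursiveRegistrar {

    // Registers root and every directory below it with the watcher.
    // Returns a map from each watch key to the directory it watches,
    // which is needed later to resolve event contexts to full paths.
    static Map<WatchKey, Path> registerTree(WatchService watcher, Path root) throws IOException {
        Map<WatchKey, Path> keys = new HashMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                WatchKey key = dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
                keys.put(key, dir);
                return FileVisitResult.CONTINUE;
            }
        });
        return keys;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree: root, root/a, root/a/b.
        Path root = Files.createTempDirectory("watch-root");
        Files.createDirectories(root.resolve("a").resolve("b"));
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            Map<WatchKey, Path> keys = registerTree(watcher, root);
            System.out.println("Registered directories: " + keys.values());
        }
    }
}
```

The same registerTree call can be reused when an ENTRY_CREATE event reports a new directory, so that the new sub-tree is watched as well.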

The watch service adds triggered events to a queue, and we process them in batches by calling key.pollEvents(). Each returned event has one of the following four kinds, defined in StandardWatchEventKinds:

ENTRY_CREATE: A directory entry is created.

ENTRY_DELETE: A directory entry is deleted.

ENTRY_MODIFY: A directory entry is modified.

OVERFLOW: Indicates that events might have been lost or discarded.

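The following code sample shows how a batch of events is taken from the queue.

WatchKey key = watcher.take(); // blocks until a key is signalled
for (WatchEvent<?> event : key.pollEvents()) {
    WatchEvent.Kind<?> kind = event.kind();
    // handle the event according to its kind
}
key.reset(); // re-arm the key so that further events are delivered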
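An event carries only the path relative to the watched directory, so it has to be resolved against that directory before the file can be processed. The following is a minimal sketch of that step, assuming the events come from a key registered on a single directory. The class name EventBatch and the method resolveAll are hypothetical, and the demo in main uses a bounded poll instead of take() only so that it terminates.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

// Hypothetical helper; the names are illustrative and not part of the transport.
final class EventBatch {

    // Turns a batch of events into full paths by resolving each event's
    // context (a relative path) against the directory being watched.
    // OVERFLOW events are skipped here because their context may be null.
    static List<Path> resolveAll(Path watchedDir, List<WatchEvent<?>> events) {
        List<Path> paths = new ArrayList<>();
        for (WatchEvent<?> event : events) {
            if (event.kind() == OVERFLOW) {
                continue;
            }
            Path relative = (Path) event.context();
            paths.add(watchedDir.resolve(relative));
        }
        return paths;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("watch-demo");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, ENTRY_CREATE);
            Files.createFile(dir.resolve("sample.txt"));
            // Wait a bounded time so the demo exits even if no event arrives.
            WatchKey key = watcher.poll(10, TimeUnit.SECONDS);
            if (key != null) {
                System.out.println(resolveAll(dir, key.pollEvents()));
                key.reset(); // allow the key to be signalled again
            }
        }
    }
}
```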





The path pattern and pattern syntax are used to filter the files we are interested in. The path pattern can be written in either glob or regex syntax, so a single pattern can cover several directories and several types of files.

If a pattern is configured, then each time we receive an event for a file we check the file's full path (directory plus file name) against the path pattern. For this we use the getPathMatcher method of the java.nio.file.FileSystem class. A custom class (GRPattern.java) holds the pattern details: the pattern syntax and the path pattern.

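public boolean isMatchPattern(GRPattern pattern, Path file) {
    // getPathMatcher expects "syntax:pattern", e.g. "glob:**/*.xml". This assumes
    // that getPatternSyntax() returns the syntax including its trailing colon.
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher(
            pattern.getPatternSyntax() + Paths.get(pattern.getPathPattern()).toString());
    Path name = file.getFileName();
    // A path without a name element (such as a root) never matches.
    return name != null && matcher.matches(file);
}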
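To make the difference between matching a full path and matching only a file name concrete, here is a small sketch built on the same getPathMatcher call. The class name PatternFilter, the method matches, and the sample paths and patterns are all hypothetical; the sketch passes the syntax and pattern as plain strings instead of using GRPattern, whose exact shape is not shown here.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper; the names are illustrative and not part of the transport.
final class PatternFilter {

    // Builds a matcher from "syntax:pattern" (syntax is "glob" or "regex")
    // and tests the given path against it.
    static boolean matches(String syntax, String pattern, Path file) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntax + ":" + pattern);
        return matcher.matches(file);
    }

    public static void main(String[] args) {
        Path file = Paths.get("/data/in/report.csv");

        // "**" crosses directory boundaries, so the full path matches.
        System.out.println(matches("glob", "**/*.csv", file));            // true
        // "*" does not cross directory boundaries, so the full path does not match.
        System.out.println(matches("glob", "*.csv", file));               // false
        // The same pattern matches once only the file name is tested.
        System.out.println(matches("glob", "*.csv", file.getFileName())); // true
        // A regex must match the entire path string.
        System.out.println(matches("regex", ".*report\\.csv", file));     // true
    }
}
```

Because the transport matches against the full path, a glob meant to select files by extension anywhere under the root needs the "**/" prefix.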





OVERFLOW Events

OVERFLOW is a special kind of event. We do not have to register for it in order to receive it. Receiving it means that the event queue has overflowed and there was no room for newly generated events. After the watch key is reset, event delivery starts again, but one or more events may have been lost because of the overflow. The loss therefore has to be handled manually, in one of the following ways:

Rescan by timestamp: Record the system time and the directory when the OVERFLOW event is detected, then scan that directory for files created before that time. This adds little overhead, because the normal procedure has already picked up most of the files created in that directory, so very few are left to process. This approach works only if processed files are removed from the directory; otherwise the scan cannot tell them apart from the files that were missed.

Re-register the directory: Stop watching the directory involved in the overflow, then treat it as a newly created directory and start the procedure from the beginning (for that directory only).

Example of an OVERFLOW Event

Suppose a new directory is registered with the watch service and, at that moment, a huge number of files (say 10,000) are copied into it or created inside it. The watch service detects the files and queues their events for processing. If dequeuing and processing the events is slower than enqueuing them, an OVERFLOW event is raised.
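The first recovery option, rescanning the directory up to the time of the overflow, might look like the sketch below. It makes two assumptions that are not part of the description above: it compares the last-modified time rather than the creation time, because creation time is not reliably available on every file system, and it scans only the top level of the directory. The class name OverflowRescan and the method filesModifiedUpTo are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative and not part of the transport.
final class OverflowRescan {

    // Returns the regular files directly inside dir whose last-modified time
    // is not after the cutoff, i.e. files that may have been missed because
    // their events were dropped before the OVERFLOW was observed.
    static List<Path> filesModifiedUpTo(Path dir, Instant cutoff) throws IOException {
        List<Path> missed = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue; // sub-directories are handled by their own watch keys
                }
                Instant modified = Files.getLastModifiedTime(entry).toInstant();
                if (!modified.isAfter(cutoff)) {
                    missed.add(entry);
                }
            }
        }
        return missed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("overflow-demo");
        Files.createFile(dir.resolve("left-behind.txt"));
        // In the transport, the cutoff would be taken when OVERFLOW is detected.
        Instant overflowSeenAt = Instant.now().plusSeconds(1);
        System.out.println(filesModifiedUpTo(dir, overflowSeenAt));
    }
}
```

Since already processed files are expected to have been removed from the directory, whatever this scan returns can be handed to the normal processing path.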