Summary

The Mozilla Platform keeps improving: native file management for JavaScript is an ongoing effort to provide a high-performance, JavaScript-friendly API for manipulating the file system.

The Mozilla Platform, JavaScript and Files

The Mozilla Platform is the application development framework behind Firefox, Thunderbird, Instantbird, Camino, Songbird and a number of other applications.

While the performance-critical components of the Mozilla Platform are developed in C/C++, an increasing number of components and add-ons are implemented in pure JavaScript. While JavaScript cannot hope to match the speed or robustness of C++ yet (edit: at least not in all respects), the richness and dynamism of the language permit the creation of extremely flexible and developer-friendly APIs, as well as quick prototyping and concise implementation of complex algorithms, without the fear of memory errors and with features such as higher-level programming, asynchronous programming and now clean and efficient multi-threading. Combine this with the impressive speed-ups that JavaScript has experienced in recent years, and it is easy to understand why the language has become a key element in the current effort to make the Mozilla Platform and its add-ons faster and more responsive at all levels.

Many improvements to the JavaScript platform are pushing the boundary of what can be done in JavaScript. Core Modules, strict mode and the let construct are powerful tools that empower developers to produce reusable, clean and safe JavaScript libraries. The Mozilla Platform offers XPConnect and now js-ctypes, two extremely powerful technologies that let privileged JavaScript masquerade as C/C++ and get access to the low-level features of the platform. Other technologies such as Web Workers expose low-level operating system features through fast, JavaScript-friendly APIs (note that the Mozilla Platform has exposed threads and processes to JavaScript at least since 2005; Web Workers are faster, nicer, and play much more nicely with the runtime, in particular with respect to garbage-collection and the memory model).

Today, I would like to introduce one such improvement: native file management for JavaScript, also known as OS.File.

Since JavaScript has become a key component of the Mozilla Platform, the Mozilla Platform needs a great library for manipulating files in JavaScript. While both XPConnect and js-ctypes can be (and have been) used for this purpose, our objective with this library is to go way beyond the file management APIs that have been exposed to JavaScript so far, regardless of the platform, in terms of:

expressiveness;

integration with the JavaScript side of the Mozilla Platform;

operating system-level features;

performance;

extensibility.

This library is a work in progress by the Mozilla Performance Team, and we hope that a fully working prototype will be available by early January. Not everything is implemented yet, and all sorts of adjustments can still be made based on your feedback.

Once we have delivered, it is our hope that you will use this library for your future work on the Mozilla Platform, whether you are extending the Mozilla Platform, developing an add-on or an application, or refactoring some existing feature.

Let me emphasize that this is a Mozilla Platform API (hence the “OS” prefix), not a Web API. Unlike the HTML5 File object, this API gives full access to the system, without any security limitation, and is definitely not meant to be scriptable by web applications under any circumstances.

Manipulating files, the JavaScript way

Reading from a file

Let us start with something simple: reading from a file.

First, open the library:

Components.utils.import("resource://gre/modules/osfile.jsm");

OS.File is a JavaScript module; in other words, it is shared between all users in the same thread. This is particularly important for speed, as it gives us the ability to perform aggressive caching of certain data.

Once you have opened the module, you may read your file:

var fileName = "/home/yoric/hello"; var contents = OS.File.openForReading.using(fileName, function(myFile) { return myFile.readString() });

This extract:

opens file "/home/yoric/hello" for reading;

reads the contents of the file as a string (assuming ASCII encoding);

closes the file;

reports an error if anything wrong has happened either during opening or during reading;

places the result in variable contents.

This short listing already demonstrates a few interesting elements of the API. Firstly, notice the use of function using. This function performs scope-bound resource management to ensure that the file is properly closed once it is no longer needed, even in the presence of errors. It plays roughly the same role as a finally block in Java or a destructor on a C++ auto-pointer. I will return to the topic of resource management later. For the moment, suffice it to say that closing a file through using or method close is optional but recommended, as open files are a limited resource on all operating systems.
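The idea behind using can be sketched in plain JavaScript with a try/finally block. This is only an illustration of the pattern, not the OS.File implementation: withResource and FakeFile are hypothetical names, and the fake file lives purely in memory.

```javascript
// Hypothetical in-memory stand-in for an open file (not the OS.File API).
function FakeFile(contents) {
  this.contents = contents;
  this.isOpen = true;
}
FakeFile.prototype.readString = function () {
  if (!this.isOpen) {
    throw new Error("file is closed");
  }
  return this.contents;
};
FakeFile.prototype.close = function () {
  this.isOpen = false;
};

// Scope-bound resource management: the resource is closed as soon as the
// callback returns or throws, whichever happens first.
function withResource(resource, callback) {
  try {
    return callback(resource);
  } finally {
    resource.close();
  }
}

var sketchFile = new FakeFile("hello");
var sketchContents = withResource(sketchFile, function (f) {
  return f.readString();
});
```

After this runs, sketchContents holds the string and sketchFile is closed; the file would also have been closed if the callback had thrown.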

Had we decided to entrust JavaScript to close the file by itself at some point in the future, we could have simply written:

var fileName = "/home/yoric/hello"; var contents = OS.File.openForReading(fileName).readString();

Secondly, consider OS.File.openForReading. As its name suggests, this function/object serves to open an existing file for reading, and it fails if the file does not exist yet. The API provides such functions for all common scenarios, all of which accept optional flags to customize Unix-style file rights, Windows-style sharing properties and other Unix- or Windows-style attributes. Alternatively, function/object/constructor OS.File is the general manner of controlling all details of file opening.

The extracts above do not demonstrate any feature that could not have been achieved with XPConnect. However, let us briefly compare our extracts with an XPConnect-based implementation along similar lines:

the OS.File implementation consists of 2 to 4 lines, including resource cleanup and error-handling / a comparable XPConnect-based implementation requires about 30 lines;

the OS.File implementation works both in the main thread and in a background thread / a comparable XPConnect-based implementation works only in the main thread;

benchmarks are not available yet, but I have hope that the OS.File implementation should be slightly faster due to a lower overhead and an optimized implementation of readString;

in case of error, the OS.File implementation raises an exception with constructor OS.File.Error / the XPConnect-based implementation raises a generic XPConnect exception;

if the file does not exist, the OS.File implementation raises an error while executing OS.File.openForReading / the XPConnect-based implementation raises an error later in the process;

if executed on the main thread, the OS.File implementation will print a warning.
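A dedicated error constructor lets callers tell file-system failures apart from other exceptions with instanceof. The following is a minimal sketch of that idea, assuming a hypothetical SketchFileError and a hypothetical in-memory table of files; the actual OS.File.Error may carry different fields.

```javascript
// Hypothetical error constructor, standing in for something like OS.File.Error.
function SketchFileError(operation, path) {
  this.name = "SketchFileError";
  this.operation = operation;
  this.path = path;
  this.message = operation + " failed for " + path;
}
SketchFileError.prototype = Object.create(Error.prototype);
SketchFileError.prototype.constructor = SketchFileError;

// Hypothetical in-memory "file system", used only for this illustration.
var sketchFiles = { "/home/yoric/hello": "Hello, world" };

// Fails immediately, at opening time, if the file does not exist.
function sketchOpenForReading(path) {
  if (!Object.prototype.hasOwnProperty.call(sketchFiles, path)) {
    throw new SketchFileError("open", path);
  }
  return {
    readString: function () {
      return sketchFiles[path];
    }
  };
}

var outcome;
try {
  outcome = sketchOpenForReading("/home/yoric/missing").readString();
} catch (ex) {
  if (ex instanceof SketchFileError) {
    outcome = "file error during " + ex.operation;
  } else {
    throw ex; // Not a file-system error: let it propagate.
  }
}
```

Because the error is raised while opening, the failing operation is known at the point of the catch, instead of surfacing later during the first read.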

Note that OS.File manipulates this and closures in the JavaScript fashion, which makes it possible to make our extracts even more concise, as follows:

var fileName = "/home/yoric/hello"; var contents = OS.File.openForReading.using(fileName, function() { return this.readString(); });

or, equivalently,

var fileName = "/home/yoric/hello"; var contents = OS.File.openForReading.using(fileName, OS.File.prototype.readString);
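One way a function such as using could make all of these forms work is to invoke the callback with the file both as this and as first argument. This is a guess at the mechanism, not a description of the actual implementation; BoundFile and applyToFile are hypothetical names, and cleanup is left out to keep the sketch short.

```javascript
// Hypothetical in-memory file; not the actual OS.File implementation.
function BoundFile(contents) {
  this.contents = contents;
}
BoundFile.prototype.readString = function () {
  return this.contents;
};

// Calls the callback with the file as `this` and as first argument, so
// that callers may use whichever style they prefer.
function applyToFile(file, callback) {
  return callback.call(file, file);
}

// Style 1: the file is received as an argument.
var viaArgument = applyToFile(new BoundFile("abc"), function (myFile) {
  return myFile.readString();
});
// Style 2: the file is received as `this`.
var viaThis = applyToFile(new BoundFile("abc"), function () {
  return this.readString();
});
// Style 3: a method of the prototype is passed directly.
var viaMethod = applyToFile(new BoundFile("abc"), BoundFile.prototype.readString);
```

All three variables end up holding the same string, since each callback reads from the same kind of file through a different route.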

Of course, OS.File is not limited to strings. Indeed, to return a typed array, simply replace readString with readBuffer. For better performance, it is also possible to reuse an existing buffer. This is done by replacing readBuffer with readTo.

Also, OS.File is not limited to reading entire files. Indeed, all read/write functions accept an optional argument that may be used to determine a subset of the file that must be read:

var fileName = "/home/yoric/hello"; var contents = OS.File.openForReading.using(fileName, {fileOffset: 10, bytes: 100}, OS.File.prototype.readString);
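The effect of such an options argument can be illustrated on an in-memory buffer. In this sketch, readStringFrom is a hypothetical helper; only the option names fileOffset and bytes are taken from the extract above, and the default values are assumptions.

```javascript
// Hypothetical helper: decodes a range of an in-memory buffer as ASCII.
// `options.fileOffset` defaults to 0 and `options.bytes` to "everything
// after the offset" (both defaults are assumptions of this sketch).
function readStringFrom(buffer, options) {
  options = options || {};
  var start = Math.min(options.fileOffset || 0, buffer.length);
  var end = options.bytes === undefined
    ? buffer.length
    : Math.min(start + options.bytes, buffer.length);
  var result = "";
  for (var i = start; i < end; i++) {
    result += String.fromCharCode(buffer[i]);
  }
  return result;
}

// "Hello, world" as ASCII bytes.
var helloBytes = new Uint8Array(
  [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100]);
var whole = readStringFrom(helloBytes);
var part = readStringFrom(helloBytes, { fileOffset: 7, bytes: 5 });
```

Here whole covers the entire buffer, while part covers only the five bytes starting at offset 7. Requests that run past the end of the buffer are truncated rather than reported as errors, which is a choice of this sketch only.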

Well-known directories

The operations we have demonstrated so far use a hard-coded path, “/home/yoric/hello”. This is not a very good idea, as this path is valid only under Linux, but not under Windows or MacOS. We would therefore much prefer to ask the Mozilla Platform to select the path for us. For this purpose, we may replace the first line with:

var fileName = OS.Path.home.get("hello");

This extract:

uses global object OS.Path (part of library OS.File);

requests the path to the user’s home directory;

requests item "hello" at this path.

The extract demonstrates a few things. Firstly, the use of OS.Path. This object contains paths to well-known directories, and can be extended with new directories. Each path has constructor OS.Path, and supports a method get that serves to enter into files/directories. Secondly, the use of OS.Path as a path for functions of module OS.File: any function of this module accepts an OS.Path in place of a hard-coded directory.

Note that OS.Path objects are purely in-memory constructs. Building an OS.Path does not cause any call to the file system.
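To make the "purely in-memory" point concrete, here is a minimal sketch of a path object whose get method only manipulates a list of components. SketchPath is a hypothetical name, and the Unix-style separator is an assumption of the sketch; the real OS.Path would have to pick the separator of the current platform.

```javascript
// Hypothetical path object: it only stores path components in memory.
function SketchPath(components) {
  this.components = components;
}
// Returns a new path pointing to `name` inside this path.
// No file-system call takes place here.
SketchPath.prototype.get = function (name) {
  return new SketchPath(this.components.concat([name]));
};
// Renders the path with a Unix-style separator (an assumption of this sketch).
SketchPath.prototype.toString = function () {
  return "/" + this.components.join("/");
};

var sketchHome = new SketchPath(["home", "yoric"]);
var sketchHello = sketchHome.get("hello");
```

Since get returns a new object and leaves sketchHome untouched, a well-known directory can be shared and cached safely between callers.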

As previously, something similar is feasible with XPConnect. Comparing with an XPConnect-based implementation, we may notice that:

the OS.File implementation consists of 1 line / a comparable XPConnect-based implementation consists of 1 to 4 lines, depending on the use of additional libraries;

the OS.File implementation works both in the main thread and in a background thread / again, XPConnect works only in the main thread;

benchmarks are not available yet, but I have hope that the OS.File implementation should be slightly faster due to a lower overhead and use of caching.

Behaving nicely

The operations we have demonstrated so far are synchronous. This is probably not problematic for file opening, but reading a large file synchronously from the main thread is a very bad idea, as it will freeze the user interface until completed. It is therefore a good idea either to send the operation to a background thread or to ensure that reading takes place in small chunks.

OS.File supports both scenarios by integrating with (work-in-progress) libraries Promise and Schedule, both of which will be introduced in another post, once their API has stabilized.

The first step to reading asynchronously is to open library Promise. We will take the opportunity to open Schedule as well:

Components.utils.import("resource://gre/modules/promise.jsm"); Components.utils.import("resource://gre/modules/schedule.jsm");

Now that the modules are open, we may use asynchronous reading and asynchronous writing functions:

var promisedContents = OS.File.openForReading(fileName).readString.async();

This operation schedules progressive reading of the file and immediately returns. Note that we do not close the file, as this would stop reading, probably before the operation is complete. The result of the operation, promisedContents, is a Promise, i.e. a variable that will eventually contain a value, and that may be observed or polled, as follows:

promisedContents.onsuccess(function(contents) { console.log("It worked", contents); }); promisedContents.onerror(function(error) { console.log("It failed", error); });
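Since the Promise library is not stable yet, the following sketch only illustrates the general mechanism: reading proceeds one small chunk per task, and the promise is fulfilled once the last chunk has been read. Everything here is hypothetical (SketchPromise, readStringInChunks, and a manual task queue standing in for the event loop), error handling is omitted, and callbacks must be registered before the value arrives.

```javascript
// Hypothetical task queue standing in for the event loop, so that the
// sketch can be stepped through deterministically.
var pendingTasks = [];
function runPendingTasks() {
  while (pendingTasks.length > 0) {
    pendingTasks.shift()();
  }
}

// Minimal promise-like object offering only the onsuccess style used above.
function SketchPromise() {
  this.successCallbacks = [];
}
SketchPromise.prototype.onsuccess = function (callback) {
  this.successCallbacks.push(callback);
};
SketchPromise.prototype.resolve = function (value) {
  this.successCallbacks.forEach(function (callback) {
    callback(value);
  });
};

// Reads `source` a few characters at a time, one chunk per task.
function readStringInChunks(source, chunkSize) {
  var promise = new SketchPromise();
  var result = "";
  var position = 0;
  function step() {
    result += source.slice(position, position + chunkSize);
    position += chunkSize;
    if (position < source.length) {
      pendingTasks.push(step); // More to read: schedule the next chunk.
    } else {
      promise.resolve(result);
    }
  }
  pendingTasks.push(step);
  return promise; // Returned immediately, before any chunk is read.
}

var received = null;
var promised = readStringInChunks("Hello, world", 5);
promised.onsuccess(function (contents) {
  received = contents;
});
var receivedBeforeRunning = received; // Nothing has been read yet.
runPendingTasks();
```

Between two chunks, control returns to the queue, which is where a real event loop would get the chance to process user interface events.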

Similarly, reading from a background thread is a simple operation:

var promisedContents = Schedule.bg(function() { importScripts("resource://gre/modules/osfile.jsm"); var fileName = "/home/yoric/hello"; return OS.File.openForReading.using(fileName, function(myFile) { return myFile.readString(); }); });

The call to Schedule.bg “simply” sends a task to a background thread and ensures that any result, error, etc. is routed back to the promise. The promised value itself is used exactly as in the previous example.

Once again, we may compare to the XPConnect-based implementation:

the OS.File-based implementation of asynchronous reading takes 3 lines including opening, closing, resource management / a general XPConnect-based implementation of asynchronous reading takes about 10-15 lines, although reading from a hard-coded path or a resource inside the Mozilla Platform can be reduced to 5-6 lines;

the OS.File implementation of background reading takes 5 lines / XPConnect does not expose sufficient features to permit background reading, although such features could certainly be implemented in C++ and exposed through XPConnect;

the OS.File-based implementation only works for files / the XPConnect-based implementation works for just about any construction;

benchmarks are not available, but I have hope that the OS.File implementation should be faster than the XPConnect-based implementation due to a less generic implementation and a lower overhead;

the promises used in the OS.File-based implementation encourage writing code in natural order, in which the code that uses a value appears after the code that fetches the value / the XPConnect-based implementation encourages backwards coding, in which the function that uses a value appears before the code that fetches the value (aka “asynchronous spaghetti programming”).

API summary

The API defines the following constructors:

OS.File – all operations upon an open file, including reading, writing, accessing or altering information, flushing, closing the file;

OS.Dir – all operations upon an open directory, including listing its contents, walking through the directory, opening an item of the directory, removing an item of the directory;

OS.Path – all operations on paths which do not involve opening a directory, including concatenation, climbing up and down the tree;

OS.File.Error – all file-system related errors.

and the following global objects:

OS.File – opening a file, with or without auto-cleanup;

OS.Dir – opening a directory;

OS.Path – well-known directories and files.

Speed

Writing fast, cross-platform file manipulation code is a complex task. Indeed, some platforms accelerate opening a file from a directory (e.g. Linux), while other platforms do not have such operations (e.g. MacOS, Windows). Some platforms let applications collect all information regarding a file with a single system call (Unix), while others spread the work through several system calls (Windows). The amount of information that may be obtained about a file without having to perform additional system calls varies from OS to OS, as does the maximal length of a path (e.g. under Windows, the value of MAX_PATH is not the actual limit), etc.

The design of OS.File takes this into account, as well as the experience from the previous generations of file manipulation APIs in the Mozilla Platform (prfile and nsIFile/nsILocalFile), and works hard to minimize the number of system calls required for each operation, and to let experts fine-tune their code for performance. While benchmarking is not available yet, we have good hope that this will make it possible to write IO code that runs much faster, in particular on platforms with slow file systems (e.g. Android).
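As an illustration of what minimizing system calls can look like, the sketch below fetches all the information about a file once and serves later requests from memory. It is not taken from the OS.File source: fakeStat is a hypothetical stand-in for an expensive system call, with a counter that records how often it is invoked, and CachedInfo is a hypothetical name.

```javascript
// Hypothetical stand-in for an expensive system call that returns all the
// information about a file at once; the counter records how often it runs.
var fakeSystemCalls = 0;
function fakeStat(path) {
  fakeSystemCalls++;
  return { size: 12, isDirectory: false };
}

// Fetches the information lazily, once, then serves every later request
// from memory.
function CachedInfo(path) {
  this.path = path;
  this.info = null;
}
CachedInfo.prototype.fetch = function () {
  if (this.info === null) {
    this.info = fakeStat(this.path);
  }
  return this.info;
};
CachedInfo.prototype.size = function () {
  return this.fetch().size;
};
CachedInfo.prototype.isDirectory = function () {
  return this.fetch().isDirectory;
};

var helloInfo = new CachedInfo("/home/yoric/hello");
var helloSize = helloInfo.size();
var helloIsDirectory = helloInfo.isDirectory();
```

Both requests are answered with a single call to fakeStat. The price is that cached information may become stale if the file changes in the meantime, so this kind of trade-off is best left under the control of the caller.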

In addition, although this should have a much smaller impact, OS.File uses the JSAPI as its bridge between C++ and JavaScript. At the time of this writing, the JSAPI is the fastest C++-to-JavaScript bridge on the Mozilla Platform.

Responsiveness

Speed is not sufficient to ensure responsiveness. For this purpose, long-running operations are provided with asynchronous variants that divide the work into smaller chunks to avoid freezing up the thread. The API does not enforce the use of these asynchronous variants, as experience shows that such a drastic choice is sometimes too constraining for progressive refactoring of synchronous code towards better asynchronicity.

Every operation can be backgrounded thanks to the Schedule module. At the time of this writing, it is not possible to send a file from one thread to another, but we have a pretty clear idea of how we can do this, so this should become possible at some point in the future.

What now?

As mentioned, this is a work in progress. I am currently hard at work on building a complete prototype by the end of December, with the hope of landing something soon afterwards. I expect that benchmarking will continue after this stage to fine-tune some low-level choices and improve the API. If you wish to follow progress – or vote for this feature – we have a Bugzilla tracking bug on the topic, and a whole host of subbugs.

Note that this API will not replace nsIFile, although once it has landed, some of our JavaScript code will progressively migrate from nsIFile to OS.File.

If you have any feedback, now is a great time to send it. Would you use this API? Would you need a specific or obscure feature that is currently missing in the Mozilla Platform or that risks being lost?

In future posts, I will introduce further examples and detail some of the choices that we have made to ensure the best possible speed on all platforms.

Stay tuned!