Firefox’s preferences system uses data files to store information about default preferences within Firefox, and user preferences in a user’s profile (such as prefs.js, which records changes to preference values, and user.js, which allows users to override default preference values).

A new parser

These data files use a custom format, and therefore Firefox has a custom parser for them. I recently rewrote the parser. The new parser has the following benefits over the old parser.

It is faster (raw parsing speed is close to 2x that of the old parser).

It is safer (because it’s written in Rust rather than C++).

It is more correct and better tested (the old one got various obscure edge cases wrong).

It is more readable, and easier to modify.

It issues no warnings, only errors.

It is slightly stricter (e.g. it doesn’t allow any malformed input, and it catches integer overflow).

It has error recovery and better error messages (including correct line numbers).

Modifiability

Modifiability was the prime motivation for the change. I wanted to make some adjustments to the preferences file grammar, but this would have been very difficult in the old parser, because it was written in an awkward style.

It was essentially a single loop containing a giant switch statement on a state variable. This switch was executed for every single char in a file. The states held by the state variable had names like PREF_PARSE_QUOTED_STRING, PREF_PARSE_UNTIL_OPEN_PAREN, PREF_PARSE_COMMENT_BLOCK_MAYBE_END. It also had a second state variable, because in some places a single one wasn’t enough; the parser had to return to the previous state after exiting the current state. Furthermore, lexing and parsing were not separate, so code to handle comments and whitespace was spread around in various places.

The new parser is a recursive descent parser — even though the grammar doesn’t actually have any recursion — in which the structure of the code reflects the structure of the grammar. Lexing is distinct from parsing. As a result, the new parser is much easier to read and modify. In particular, after landing it I added error recovery without too much effort; that would have been almost impossible in the old parser.
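To make the contrast concrete, here is a minimal sketch of the recursive descent style with a separate lexer. It is not the actual Firefox code: the names Token, Value, lex, Parser and parse_prefs are hypothetical, and the sketch assumes a cut-down grammar with only string and boolean values, no comments, and no escape sequences.

```rust
// Illustrative sketch; Token, Value, lex and Parser are hypothetical names.
#[derive(Debug, Clone, PartialEq)]
enum Token { Ident(String), Str(String), LParen, RParen, Comma, Semicolon }

#[derive(Debug, PartialEq)]
enum Value { Bool(bool), Str(String) }

// Lexing: characters in, tokens out. Whitespace is dropped here, in one place.
fn lex(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            c if c.is_whitespace() => {}
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            ',' => tokens.push(Token::Comma),
            ';' => tokens.push(Token::Semicolon),
            '"' => {
                // Simplification: no escape sequences inside strings.
                let mut s = String::new();
                loop {
                    match chars.next() {
                        Some('"') => break,
                        Some(ch) => s.push(ch),
                        None => return Err("unterminated string".to_string()),
                    }
                }
                tokens.push(Token::Str(s));
            }
            c if c.is_ascii_alphabetic() || c == '_' => {
                let mut word = c.to_string();
                while let Some(ch) = chars.next_if(|ch| ch.is_ascii_alphanumeric() || *ch == '_') {
                    word.push(ch);
                }
                tokens.push(Token::Ident(word));
            }
            _ => return Err(format!("unexpected character {:?}", c)),
        }
    }
    Ok(tokens)
}

struct Parser { tokens: std::iter::Peekable<std::vec::IntoIter<Token>> }

impl Parser {
    fn expect(&mut self, want: Token) -> Result<(), String> {
        match self.tokens.next() {
            Some(ref tok) if *tok == want => Ok(()),
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    // value: string | "true" | "false"
    fn parse_value(&mut self) -> Result<Value, String> {
        match self.tokens.next() {
            Some(Token::Str(s)) => Ok(Value::Str(s)),
            Some(Token::Ident(id)) if id == "true" || id == "false" => Ok(Value::Bool(id == "true")),
            other => Err(format!("expected a value, found {:?}", other)),
        }
    }

    // pref: ("pref" | "user_pref" | "sticky_pref") "(" string "," value ")" ";"
    fn parse_pref(&mut self) -> Result<(String, Value), String> {
        let kw = self.tokens.next();
        let is_kw = |s: &str| kw == Some(Token::Ident(s.to_string()));
        if !(is_kw("pref") || is_kw("user_pref") || is_kw("sticky_pref")) {
            return Err(format!("expected a pref keyword, found {:?}", kw));
        }
        self.expect(Token::LParen)?;
        let name = match self.tokens.next() {
            Some(Token::Str(s)) => s,
            other => return Err(format!("expected a pref name, found {:?}", other)),
        };
        self.expect(Token::Comma)?;
        let value = self.parse_value()?;
        self.expect(Token::RParen)?;
        self.expect(Token::Semicolon)?;
        Ok((name, value))
    }
}

// prefs: pref*
fn parse_prefs(input: &str) -> Result<Vec<(String, Value)>, String> {
    let mut p = Parser { tokens: lex(input)?.into_iter().peekable() };
    let mut prefs = Vec::new();
    while p.tokens.peek().is_some() {
        prefs.push(p.parse_pref()?);
    }
    Ok(prefs)
}
```

Each grammar rule maps to one function, and the parser only ever sees tokens, so a change to the grammar touches one function rather than a set of interlocking states.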

Note that the idea of error recovery for preferences parsing was first proposed in bug 107264, filed in 2001! After landing it, I tweeted the following.

I fixed an old bug: https://t.co/llDURdHUN8 Imagine going back in time and telling the reporter “this bug will get fixed 16 years from now, and the code will be written in a systems programming language that doesn’t exist yet”. — Nicholas Nethercote (@nnethercote) February 20, 2018

Amazingly enough, the original reporter is on Twitter and responded!

I kept getting emails on this bug over the years — dependencies and stuff — and I’d be like, “this bug is still open?!” Great job, @nnethercote! https://t.co/uVLYK8Tn6U — Kevin Basil Fritts (@kevinbasil) March 1, 2018

Strictness

The new parser is slightly stricter and rejects some malformed input that the old parser accepted.

Junk chars

Disconcertingly, the old parser allowed arbitrary junk between preferences (including at the start and end of the prefs file) so long as that junk didn’t include any of the following chars: ‘/’, ‘#’, ‘u’, ‘s’, ‘p’. This means that lines like these:

!foo@bar&pref("prefname", true);
ticky_pref("prefname", true);  // missing 's' at start
User_pref("prefname", true);   // should be 'u' at start

would all be treated the same as this:

pref("prefname", true);

The new parser disallows such junk because it isn’t necessary and seems like an unintentional botch by the old parser. In practice, this caught a couple of prefs that accidentally had an extra ‘;’ at the end.

SUB char

The old parser allowed the SUB (0x1a) character between tokens and treated it like ‘\n’.

The new parser does not allow this character. SUB was used to indicate end-of-file (not end-of-line) in some old operating systems such as MS-DOS, but this doesn’t seem necessary today.

Invalid escapes

The old parser tolerated (with a warning) invalid escape sequences within string literals, such as “\q” (not a valid escape) and “\x1” and “\u12” (both of which have insufficient hex digits), accepting them literally.

The new parser does not tolerate invalid escape sequences because it doesn’t seem necessary and would complicate things.
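As a rough illustration of the strict behaviour, the sketch below decodes escapes and fails on anything malformed. The function names are hypothetical, and the set of accepted escapes (\", \', \\, \n, \r, \xNN with two hex digits, \uNNNN with four) is an assumption made for the example, not a statement of the exact grammar. It also ignores UTF-16 surrogate pairs.

```rust
// Hypothetical helper: decode the escapes in a string literal's body,
// rejecting anything malformed instead of accepting it literally.
fn decode_escapes(body: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('"') => out.push('"'),
            Some('\'') => out.push('\''),
            Some('\\') => out.push('\\'),
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('x') => out.push(hex_escape(&mut chars, 2)?),
            Some('u') => out.push(hex_escape(&mut chars, 4)?),
            Some(other) => return Err(format!("invalid escape \\{}", other)),
            None => return Err("unterminated escape".to_string()),
        }
    }
    Ok(out)
}

// Read exactly `digits` hex digits and convert them to a char.
fn hex_escape(chars: &mut std::str::Chars<'_>, digits: usize) -> Result<char, String> {
    let mut value: u32 = 0;
    for _ in 0..digits {
        let d = chars
            .next()
            .and_then(|c| c.to_digit(16))
            .ok_or_else(|| "too few hex digits in escape".to_string())?;
        value = value * 16 + d;
    }
    std::char::from_u32(value).ok_or_else(|| "escape is not a valid character".to_string())
}
```

With this approach “\q”, “\x1” and “\u12” all produce an error rather than a best-guess string.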

NUL char

The old parser tolerated the NUL character (0x00) within string literals; this is dangerous because C++ code that manipulates string values with embedded NULs will almost certainly consider those chars as end-of-string markers.

The new parser treats the NUL character as end-of-file, to avoid this danger. (The escape sequences “\x00” and “\u0000” are also disallowed.)

Integer overflow

The old parser allowed integer literals to overflow, silently wrapping them.

The new parser treats integer overflow as a parse error. This seems better, and it caught overflows of several existing prefs.
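The difference between wrapping and failing is easy to show with Rust's checked arithmetic. The sketch below is hypothetical (parse_int is not a name from the Firefox source) and assumes that integer prefs are signed 32-bit values with an optional leading sign.

```rust
// Hypothetical helper: parse an optionally signed decimal literal into an i32,
// failing on overflow instead of silently wrapping.
fn parse_int(text: &str) -> Result<i32, String> {
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text.strip_prefix('+').unwrap_or(text)),
    };
    if digits.is_empty() {
        return Err("expected digits".to_string());
    }
    // Accumulate as a negative number so that i32::MIN is representable.
    let mut value: i32 = 0;
    for c in digits.chars() {
        let d = c.to_digit(10).ok_or_else(|| format!("invalid digit '{}'", c))? as i32;
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_sub(d))
            .ok_or_else(|| "integer literal overflows".to_string())?;
    }
    if negative {
        Ok(value)
    } else {
        value.checked_neg().ok_or_else(|| "integer literal overflows".to_string())
    }
}
```

Here 2147483647 parses successfully, while 2147483648 is reported as an error instead of wrapping around to a negative number.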

Consequences

Error recovery minimizes the risk of data loss caused by the increased strictness: malformed pref lines in prefs.js are removed, but well-formed pref lines after them are preserved.
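One common way to get this behaviour is to resynchronize at the next ‘;’ after an error. The sketch below assumes that strategy and a pre-lexed, heavily simplified token stream; Tok, parse_one and parse_with_recovery are hypothetical names, and values are kept as plain text for brevity.

```rust
// Hypothetical, simplified token type; a real lexer would carry more detail.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String), // keywords, names and values, kept as text for brevity
    Punct(char),  // one of ( ) , ;
}

fn expect_punct(toks: &[Tok], pos: &mut usize, want: char) -> Result<(), String> {
    match toks.get(*pos) {
        Some(Tok::Punct(p)) if *p == want => {
            *pos += 1;
            Ok(())
        }
        other => Err(format!("expected '{}', found {:?}", want, other)),
    }
}

fn expect_word(toks: &[Tok], pos: &mut usize) -> Result<String, String> {
    match toks.get(*pos) {
        Some(Tok::Word(w)) => {
            *pos += 1;
            Ok(w.clone())
        }
        other => Err(format!("expected a word, found {:?}", other)),
    }
}

// Parse one `pref ( name , value ) ;` statement starting at *pos.
fn parse_one(toks: &[Tok], pos: &mut usize) -> Result<(String, String), String> {
    let kw = expect_word(toks, pos)?;
    if kw != "pref" && kw != "user_pref" {
        return Err(format!("unknown keyword {}", kw));
    }
    expect_punct(toks, pos, '(')?;
    let name = expect_word(toks, pos)?;
    expect_punct(toks, pos, ',')?;
    let value = expect_word(toks, pos)?;
    expect_punct(toks, pos, ')')?;
    expect_punct(toks, pos, ';')?;
    Ok((name, value))
}

// Parse every statement, collecting errors instead of stopping at the first.
fn parse_with_recovery(toks: &[Tok]) -> (Vec<(String, String)>, Vec<String>) {
    let (mut prefs, mut errors) = (Vec::new(), Vec::new());
    let mut pos = 0;
    while pos < toks.len() {
        match parse_one(toks, &mut pos) {
            Ok(pref) => prefs.push(pref),
            Err(e) => {
                errors.push(e);
                // Recovery: discard tokens up to and including the next ';'.
                while pos < toks.len() && toks[pos] != Tok::Punct(';') {
                    pos += 1;
                }
                pos += 1;
            }
        }
    }
    (prefs, errors)
}
```

A malformed statement costs only itself: the error is recorded, the rest of that statement is skipped, and parsing resumes with the next one.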

Nonetheless, please keep an eye out for any other problems that might arise from this change.

Attributes

I mentioned before that I wanted to make some adjustments to the preferences file grammar. Specifically, I changed the grammar used by default preference files (but not user preference files) to support annotating each preference with one or more boolean attributes. The attributes supported so far are ‘sticky’ and ‘locked’. For example:

pref("sticky.pref", true, sticky);
pref("locked.pref", 123, locked);
pref("sticky-and-locked-pref", "blah", sticky, locked);

Note that the addition of the ‘locked’ attribute fixed a 10-year-old bug.
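As a sketch of how such attributes might be represented once parsed, the following turns the trailing attribute names into a pair of booleans. Attrs and parse_attrs are hypothetical names, and the choice to accept a repeated attribute silently is an assumption of the example.

```rust
// Hypothetical representation of the attributes on a default pref.
#[derive(Debug, Default, PartialEq)]
struct Attrs {
    sticky: bool,
    locked: bool,
}

// Turn the attribute names that follow the value, e.g. ["sticky", "locked"],
// into flags. Unknown names are rejected rather than ignored.
fn parse_attrs(names: &[&str]) -> Result<Attrs, String> {
    let mut attrs = Attrs::default();
    for name in names {
        match *name {
            "sticky" => attrs.sticky = true,
            "locked" => attrs.locked = true,
            other => return Err(format!("unknown attribute '{}'", other)),
        }
    }
    Ok(attrs)
}
```

A pref with no trailing attributes simply gets both flags set to false.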

When will this ship?

All of these changes are on track to ship in Firefox 60, which is due for release on May 9th.