The Firefox 3.5 fiasco
Thursday, July 9, 2009

(updated: replaced 'trashing' with 'thrashing' as indeed, I meant 'disk thrashing').

As a Firefox user, I was delighted when Mozilla released Firefox v3.5. It was advertised as a new milestone in browsing, with more standards supported and new engines for JavaScript and web content rendering, so the intarweb would appear faster to me than ever before. As a person who has a bit of a blind spot for marketing and everything related, I was a tad skeptical, but I thought "What the heck!".

So I went ahead and downloaded the installer on release day and after fighting with the usual plug-in upgrade mess, I was able to run the browser for the first time and lo and behold, the web felt like I was back in 1994, when no-one but the Real Geeks had web sites and everything was lightning fast. Life was good.

The next day, with a fresh cup of coffee in my hand, I started my beloved Firefox 3.5 browser on my freshly booted system. I was expecting to see the browser dialog within seconds to re-experience the web at light speed, but nothing happened. Well, something did happen: my PC's hard disk was as busy as if I was running three virus-scan sessions at the same time. After 35 seconds or so, it finally managed to find all the bits and pieces it apparently needed and showed me the familiar face of the Firefox browser dialog and I was on my way to the outside world!

Suddenly, a small, screechy voice in the back of my head tried to make a point. You know the voice: the one which sometimes cries through a developer's head when s/he writes a piece of code which isn't in the format the voice owner likes, at which point it desperately tries to convince you to do something else instead by giving unwanted advice like "Wouldn't it be better if... " and other lovely comments no-one wants to hear. That voice made a remark in its usual dull way about those 35-something seconds before the browser really started. As on similar occasions, I didn't pay much attention to it. Every Firefox instance I started after that was lightning fast and showed up in 2 seconds or even faster, so it must have been something else which had caused the delay at startup. I know, it didn't sound very convincing the first time either.

That afternoon, I had no browser window open and started a new one. Again I was rewarded with a long pause, disk crunching and a blank screen, until 30-35 seconds had passed and Firefox 3.5 was awake and ready for duty. "Hmmm..." I thought. Voicy in the back of my head was awake again too, with random babbling about "Told you so", "I'm not gonna repeat myself" and similar wisdom, the usual. Could it be that Windows or some service caused all this disk thrashing and the delay? More and more I got the feeling Voicy was right (I hate that feeling) and there was something fishy about all this.

I didn't want to wait 35 seconds every time I started a browser, so I wondered what to do. Then I realized I was an end-user of this application, this browser. And what do end-users do? That's right, they go to the support offering of the vendor. Mozilla has a nice forum system so I searched it a bit to see if fellow Firefox users had similar delays. Well... you could say... yes indeed. And not only delays of 30-40 seconds but some had to wait minutes or even worse: after starting Firefox, it went into a coma and never truly woke up. Mozilla also found out that more and more people had the same problem and added a sticky thread to their forum. You can reach it here.

That forum thread revealed what the true cause was of this disk thrashing and delay at startup. I have to warn you though. If you're a developer, your software engineering fire will die a little when you read the true cause and from then on you will have to fight off thoughts of giving up development altogether and applying for a job in marketing or HR. So what was it, what's the cause of this slowness? It's NSS. What? Network Security Services, the library Firefox uses for its cryptography. It turns out that NSS needs to do all kinds of encryption and other security related tasks (which seems kind of logical), and for that it needs random numbers. Sounds reasonable, right? Well, it kind of does.

True random numbers are hard to produce, because in a computer system nothing is really random: everything is the result of some action, which was itself the result of some other action, and so on. The clever boys and girls of the NSS team had to crack this problem: how to get 'true' random numbers which are as random as possible? Instead of using the randomization functionality of the underlying operating system (which has this feature built-in, as every TCP stack for example needs it), they did what Mozilla in general always does: they re-invented the wheel. Nothing against re-inventing stuff, don't get me wrong, not all wheels are equal, and you can never have enough good, re-invented, shiny wheels. Though, the downside of re-inventing wheels is that along the way you can't make mistakes: the new wheel has to be better than the previously invented wheels. No-one wants to use your square new wheel for example.
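To show what I mean by 'using the randomization functionality of the underlying operating system', here is a minimal sketch in Python. It is not what NSS does or should do internally, just an illustration of how little work it takes to ask the OS for seed material. The function name `os_seed` is my own invention; `os.urandom` and the `secrets` module are part of the Python standard library.

```python
import os
import secrets


def os_seed(num_bytes: int = 32) -> bytes:
    """Return num_bytes of seed material from the operating system's
    cryptographic random source. No files are opened or scanned here."""
    return os.urandom(num_bytes)


# Hypothetical usage: grab a seed and a random token without touching
# any temp folder or browser cache.
seed = os_seed()
token = secrets.token_hex(16)  # 16 random bytes rendered as hex text
```

The call returns immediately and its cost doesn't depend on how much junk happens to be lying around on the hard disk.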

To solve the problem of the randomization, the NSS team came up with something clever, something so great that no-one else had ever thought of it before: they decided to read the files in all possible temp folders on disk with multiple threads so these files can be used as seeds for the randomization. Brilliant. Temp folders! Why hasn't anyone else thought of using a disk-based resource for random number generation! I mean, these folders change every couple of milliseconds, have immediate access, no latency to read their contents and are never filled to the brim with useless cruft!

That is, if you're on the NSS team. In the outside world, things are a tad different. You see, Firefox v3.5 reads the Internet Explorer Cache and the central Windows temp folder in your user profile, through its NSS subsystem. Not only is it, in my humble opinion, not done to read another application's caches or temp folders, it's also amazingly ignorant towards the real bottlenecks of our modern computers: hard-drives. If you're using a virus-scanner which is set to paranoia mode, this whole temp folder traversal by NSS will be even slower because every file accessed will be scanned by the virus scanner. Over and over and over again. And what happens if the user doesn't do anything else but browse with Firefox, so these temp folders will not change (or are empty)? Isn't using file reading the worst way to obtain a seed for randomization?
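To make the two objections above concrete, here is a rough sketch of the general idea of seeding from a folder, again in Python. This is emphatically not NSS's actual code (which I haven't reproduced here, and which may well mix in other sources too); the names `seed_from_folder` and `timed_seed` are made up for this example. It hashes the name and contents of every readable file under a folder, so the work grows with the amount of cruft in that folder, and an unchanged folder produces exactly the same bytes every time.

```python
import hashlib
import os
import time


def seed_from_folder(folder: str) -> bytes:
    """Hash the name and contents of every readable file under folder.
    Illustration only: the cost is proportional to the data on disk."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(folder):
        dirs.sort()  # fixed traversal order so two runs are comparable
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as handle:
                    digest.update(name.encode("utf-8", "replace"))
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                continue  # locked or vanished files are simply skipped
    return digest.digest()


def timed_seed(folder: str):
    """Return (seed, elapsed_seconds) for one full pass over folder."""
    start = time.perf_counter()
    seed = seed_from_folder(folder)
    return seed, time.perf_counter() - start
```

Calling `timed_seed` on your own temp folder or browser cache twice in a row, without doing anything in between, should give you the same seed both times, after waiting for the disk both times. Every file opened along the way is also a file your virus scanner gets to inspect.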

I used Sysinternals' Filemon tool to check which folders and files Firefox was reading, and along the way I also saw that it reads all fonts up front. All of them. That too seems rather odd for a browser that claims to be the fastest browser. How many fonts do you need on a random (pun intended) webpage? Besides the default ones and a few common ones? 2, 3? Would it hurt anyone if these were read 'on the fly'? Not compared to the delay in startup time for a browser dialog when you have many fonts installed.
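Reading fonts 'on the fly' is nothing exotic. A minimal sketch of the idea, assuming a hypothetical `load_font` function that does the expensive disk read for one font family (the class `LazyFontCache` is likewise made up, and has nothing to do with how Firefox actually manages fonts):

```python
from typing import Callable, Dict


class LazyFontCache:
    """Load a font the first time a page asks for it, then keep it."""

    def __init__(self, load_font: Callable[[str], bytes]) -> None:
        self._load_font = load_font  # hypothetical expensive disk read
        self._loaded: Dict[str, bytes] = {}

    def get(self, family: str) -> bytes:
        # Only touch the disk for fonts that are actually requested.
        if family not in self._loaded:
            self._loaded[family] = self._load_font(family)
        return self._loaded[family]

    def loaded_count(self) -> int:
        return len(self._loaded)
```

With something like this, a page that uses three fonts costs three reads, no matter how many hundreds of fonts are installed on the machine.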

NSS is open source, but it's not something you can fix yourself, unless you compile the browser as well. The problem is that NSS is a security component and therefore needs to be signed by Mozilla to be used in Firefox. This means that recompiling the NSS DLLs won't work: Firefox won't accept them (which is logical, it's a vital part of the security system, heck it is the security system!). Though, why should I even bother? It's 2009, for crying out loud. After 15 years of web-browser development, the human race should have produced a web-browser by now which is worth using, without silly startup delays which last minutes or even longer. After all, in this case, I'm an end-user.

Mozilla says on their forum that 'a' developer is working on a fix and they 'hope' that this developer is able to fix it. That doesn't sound very promising to me. This is a top priority issue, Mozilla, unless you want droves of people to drop your browser for the competition. There's already a fix available: Firefox v3.0 didn't do this disk thrashing and is able to communicate securely over the internet, at least that's what you always told your users. In other words: the NSS version in Firefox 3.0 was capable of creating random numbers and doing encryption without the necessity of reading a competing browser's disk cache or the OS' temp folders. In case you wonder, Mozilla, no, I'm not going to advise friends and family members to use Firefox 3.5 anymore till this is fixed. Not that you or your droves of developers will lose any sleep over that, at least I hope not, but I'm pretty sure more people will do the same as me: move away from Firefox or revert to an older version, and hold off on recommending Firefox 3.5 to friends and family.

I'll revert to Firefox 3.0 till this is fixed, or move to another browser (although I find Chrome a bit too much Google in one package). If you're planning to upgrade to Firefox 3.5, be aware of the issue I described above and do realize that it's not something you can learn to live with, as the delay will occur randomly (pun intended) during the day: sometimes starting a browser is fast, yet an hour later it can again take 30-40 seconds or longer.

Yes Voicy, I'll listen to you more. At least more often.