How it works now (briefly), and the key to the optimization

Let’s say I have a project which depends on lodash …

When I do an npm install, npm parses my package.json file and tries to install every dependency, by name.

That means that, for each dependency, it makes an HTTP request like http://registry.npmjs.com/lodash:
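As a rough sketch of that step (the names and shapes here are illustrative, not npm's actual internals), each dependency name maps straight to a registry URL:

```javascript
// Sketch: one registry URL (and thus one HTTP request) per dependency.
// The package.json content below is a made-up example.
const pkg = {
  dependencies: {
    lodash: "^4.17.0",
    express: "^4.16.0",
  },
};

const REGISTRY = "http://registry.npmjs.com";

const urls = Object.keys(pkg.dependencies).map(
  (name) => `${REGISTRY}/${encodeURIComponent(name)}`
);

console.log(urls);
// One request per dependency name, e.g. http://registry.npmjs.com/lodash
```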

Two things are important here:

- The hostname we made the request to is registry.npmjs.com; so yes, it's a registry of data, served like any other HTTP endpoint.
- The Server header says “CouchDB/1.5.0”; so yes, we're making an HTTP query (cached by Fastly) to npm's database. Basically, registry.npmjs.com is a CouchDB database.

As you can see, the body of the response is a JSON document containing ALL of the package's information (name, description, versions, repository URL, authors, etc.); npm then checks whether the package is installed, or whether it needs to be updated according to the declared dependencies.
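A minimal sketch of that check, using a tiny, hypothetical excerpt of a metadata document (the real one is far larger, and npm's real logic resolves full semver ranges rather than just comparing against “latest”):

```javascript
// Hypothetical, trimmed-down registry metadata document.
const metadata = {
  name: "lodash",
  "dist-tags": { latest: "4.17.21" },
  versions: { "4.17.20": {}, "4.17.21": {} },
};

// Naive sketch of the "is it outdated?" check: just compare the
// installed version to the "latest" dist-tag.
function needsUpdate(installedVersion, doc) {
  return installedVersion !== doc["dist-tags"].latest;
}

console.log(needsUpdate("4.17.20", metadata)); // true
console.log(needsUpdate("4.17.21", metadata)); // false
```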

For lodash, that document is (at the time of writing this post) 109kB; and if we run an npm outdated on express, that's a total of 27 HTTP requests and 700kB of data… Damn, we just wanted to check our package versions!

Well, let's be honest, this is NOT what happens every time npm checks a package's version; to get these numbers, I cheated a bit (sorry). npm keeps a cache of the responses (in ~/.npm/registry.npmjs.org/&lt;package&gt;) containing the full JSON document plus a property named “_etag”, to honor the If-None-Match / 304 Not Modified part of the HTTP protocol. So yes, it's already optimized in that respect, but we still have to query their server for each package, and parse a big, full JSON document every time.
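That cache dance can be sketched like this (the header semantics are the real HTTP ones; the server-side function and cache entry are mock illustrations, not npm's code):

```javascript
// Sketch of the If-None-Match / 304 Not Modified flow npm's cache
// relies on. The cache entry mirrors what npm stores: the full JSON
// document plus its "_etag".
const cached = {
  _etag: '"abc123"',
  name: "lodash",
  versions: {}, // the full (big) document would live here
};

// Mock of the server side: the client sends its cached etag as the
// If-None-Match header; the server compares it to the current one.
function conditionalGet(serverEtag, cacheEntry) {
  if (cacheEntry && cacheEntry._etag === serverEtag) {
    // Unchanged: 304, no body, nothing to re-download or re-parse.
    return { status: 304, body: null };
  }
  // Changed (or nothing cached): full 200 with the whole document.
  return { status: 200, body: { _etag: serverEtag } };
}

console.log(conditionalGet('"abc123"', cached).status); // 304
console.log(conditionalGet('"def456"', cached).status); // 200
```

Note that even on a 304, npm still had to make one round-trip per package, which is exactly the cost the rest of this post is about.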