Thoughts on NPM, CPAN, and modularisation

"I do Perl". I grew up with Perl (or, rather, I used it during most of my professional life, I actually grew up with machine language entered via a number pad), and one of its (many) mantras is that Perl's strength is that there is a module for everything on CPAN. While the current perl 5 maintainers clearly undervalue CPAN in favour of their experiments, today I will not write about them (but I had to mention it).

Today, I want to write down some thoughts about other repositories similar to CPAN, specifically the one that caters for node.js, with the npm package manager.

For a long time CPAN was the biggest repository of its kind (in fact, the only one), and many people still believe something like that, but in reality, CPAN is merely an also-ran with regard to the number of modules, having been overtaken by Java, Go, node.js, .NET, PHP, Python, Ruby, or really just about every other popular language.

The number of modules is a nice metric, but it is not everything. We all know (ahem) that code quality on CPAN is very uneven. A more modern problem is that CPAN is aging - it has a module for everything that mattered 10 years ago, but not a module for everything modern. Still, without doubt, CPAN is extremely useful.

A year or so ago, I felt slightly bothered by modulecounts.com - "wow!", I thought, "I had no idea, so what do these repositories look like?", and had a look at rubygems and npm. This gave me the general impression that, while there are a lot more modules, they are much smaller, less versatile, and the organisation leaves a lot to be desired, even compared to CPAN.

The latter might have been a very subjective bias towards Perl, and of course looking at module listings for a while isn't very reliable, so while I was content at having satisfied my curiosity and getting a rough impression, I left it at that.

BTW, betraying my Perl heritage, I will call everything a module that acts like a CPAN distribution (which are not even modules), so when you read "module", you might want to replace it with "package" or whatever term is correct.

The NPM case

Recently, there was some issue with the NPM repository, in that a module that merely exported a function to left-pad strings was apparently depublished (well, removed), causing a lot of other modules to fail since they had a direct or indirect dependency.

I only learned about that by semi-randomly stumbling over this article, "NPM & left-pad: Have We Forgotten How To Program?", in which the author complains about having modules for such trivial functionality as testing whether something is an array, with a single line of code:

return toString.call(arr) == '[object Array]';

... padded with a lot of extra files and some boilerplate code. Or a four-line is-positive-integer module that until recently had three dependencies on other modules. And issues such as not being the same as is-positive-zero.

The author also mentioned that a blank new node.js project now amounts to 28000 installed files, which is indeed a staggering number (and not efficient at all, as small files are not usually stored efficiently, and cannot usually be loaded efficiently, either).

To make it short, the author argues that real programmers shouldn't have dependencies for such trivial snippets.

Not unexpectedly, the top comment linked to an opposing opinion piece, which claims that triviality is irrelevant, because modularisation is good: fixes will quickly become available to all users, and it frees your mind of trivialities, enabling you to concentrate on higher level aspects of your code (something missing in some modern education schemes).

I think both positions are wrong because they are extremes - good programmers (as opposed to real ones :) choose trade-offs, and which trade-off is best depends on too many factors and might even change over time.

Chalk it up to an oversight

Shockingly, this article isn't really about that. The reason I even bothered to write about this mess is that, following links from the latter author's opinion piece, I immediately spotted bugs in the mentioned modules, which illustrate my point quite nicely.

First, the author sindresorhus mentions Chalk, "Terminal string styling done right", which can be used to colourise strings you send to terminals. Since I am interested in terminals in general, I of course went there to see how this functionality is packaged and presented to programs, to maybe learn something.

What instantly caught my eye while quickly scanning the documentation was an unfixable design bug, in that Chalk recommends this:

console.log(chalk.green('Hello %s'), name);

Now, nothing keeps terminals, or e.g. terminfo descriptions, from using % in command sequences, and Chalk just blindly interpolates these into your string. Do you see the problem? Right, console.log can be misled into interpreting these as commands to interpolate, and this just smells like a security bug in the making.

This problem might be unfixable - the documentation of the highly popular Chalk module is surprisingly bad: You don't learn what its methods return or how they are to be used, except that console.log is not the only way to use it.

Which means escaping % just for console.log's sake breaks other uses. Upgrading the module will not fix this bug.
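The console.log hazard described above can be sketched without Chalk itself. This is only an illustration under assumptions: green is a hypothetical stand-in for a styling function (it just wraps text in an ANSI colour sequence), label is made-up input, and util.format is used because, as far as I know, console.log formats its arguments the same way.

```javascript
const util = require('util');

// Hypothetical stand-in for a styling function: wraps text in ANSI green.
function green(text) {
  return '\u001b[32m' + text + '\u001b[39m';
}

// Made-up input that happens to contain a format directive.
const label = '100%s sure';

// Hazard: the styled string doubles as the format string, so the %s
// inside the label swallows the argument meant for the trailing %s.
const risky = util.format(green(label + ' %s'), 'Alice');

// More defensive: a fixed format string, with the styled text passed as data.
// Arguments are not scanned for format directives.
const safe = util.format('%s %s', green(label), 'Alice');

console.log(JSON.stringify(risky));
console.log(JSON.stringify(safe));
```

The second form only helps callers who remember to use it, which is rather the point: the module cannot enforce it from the inside.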

My home is my pwent

Another example sindresorhus mentioned that immediately piqued my interest was his user-home module, which ostensibly finds the, well, the user's home directory.

Except it doesn't, as its implementation makes clear:

var home = env.HOME;
var user = env.LOGNAME || env.USER || env.LNAME || env.USERNAME;

if (process.platform === 'win32') {
    return env.USERPROFILE || env.HOMEDRIVE + env.HOMEPATH || home || null;
}

if (process.platform === 'darwin') {
    return home || (user ? '/Users/' + user : null);
}

if (process.platform === 'linux') {
    return home || (user ? (process.getuid() === 0 ? '/root' : '/home/' + user) : null);
}

This is, in fact, the old implementation (the module nowadays falls back on hopefully better functionality provided by node.js, although I would not hold my breath there).

There is so much wrong with the above code snippet, that I would be tempted to say that it is firmly in the not even wrong category: First, trusting environment variables might or might not be the right thing to do, depending on many circumstances, often security-related. The fact that the module documentation doesn't mention this in any way makes using this module a hazard, the module itself a hack.

Worse, the user home directory isn't hardcoded to /home/name on GNU/Linux, nor to /Users/name on OS X. There are perfectly fine functions in the standard C library that give you a user's home directory (getpwuid and friends), and these do not only work on "darwin" and "linux", but on all POSIX systems. Without doubt such functionality also exists on Windows.

This would avoid programs mysteriously breaking just because a university student runs them (at a university where home directories are networked and have interesting paths) or in the context of some system daemons or similar applications, which often have their home directories somewhere in /var/lib or elsewhere.
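As a rough sketch of the password-database approach: newer node.js versions have os.userInfo(), which, as far as I know, asks the operating system for the current user's entry rather than consulting environment variables first (on POSIX systems that is typically the passwd data that getpwuid reads). The helper name homeFromUserDatabase is hypothetical, and this is not what user-home does.

```javascript
const os = require('os');

// Hypothetical helper: ask the operating system's user database for the
// home directory instead of guessing a path from environment variables.
function homeFromUserDatabase() {
  try {
    const info = os.userInfo();
    return info.homedir || null;
  } catch (err) {
    // os.userInfo() throws when the current user has no usable entry.
    return null;
  }
}

console.log(homeFromUserDatabase());
```

Whether to prefer this over $HOME is exactly the kind of trade-off that depends on the circumstances; the sketch only shows that the non-guessing option exists.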

The good news is that some of these problems could be fixed by a fixed upload. Or maybe not, because different return values might again break programs.

What about my own modules?

Now, if people were to analyze my own Perl modules and their dependencies, they would find that I do not like "external" dependencies, that is, dependencies on modules I didn't write. And while this was not a conscious decision in most cases, the guiding principles were to not have trivial dependencies (e.g. File::Slurp) and the maxim that if you want something done, do it yourself.

The latter comes from experience with modules such as user-home - well intentioned, but too broken to even touch with a ten-foot pole.

And clearly, this has in some cases led to suboptimal implementations, because sometimes even I can't be bothered to get something right in every circumstance.

You are the victim of an experiment

If you wonder what the moral of this story is, there probably isn't. The above is really an extended thought sequence that I would normally discuss with one of my friends, or discuss in passing in IRC. Making it into a blog article is really an experiment with this blog (which was not intended for such experiments, but why would I care about what I said or thought last year).

My goal here was to entertain you - hopefully, you could gain something from this discussion, even if only some entertainment.

Also, I am capable of learning, so if you liked this article and want to see more, feel free to tell me. If you didn't like this article but want to see more of the old stuff, feel free to tell me. If you don't like my blog either way, piss off, but do it elsewhere please.

(Ha! Tricked you again! The moral is that CPAN is still cooler than those postmodernist would-be imitators with their new and untested baby modules and that the keep-it-shared argument is flawed).

Updates

When I first saw the left-pad implementation, I thought, wow, this looks inefficient. But since I know that (to me) efficient JavaScript is sometimes very surprising, I didn't pursue it. Apparently, left-pad is really inefficient, though - the more straightforward pad-left (sic!) module is about 10 times faster (and also has a dependency). The pad-left documentation also helpfully lists a bunch of other modules that do something similar.
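For a sense of scale, here is a minimal sketch of what a straightforward left-padding function might look like. It is not the code of either module: leftPad and its parameter names are hypothetical, it assumes String.prototype.repeat is available, and I make no claims about how its speed compares.

```javascript
// Hypothetical helper, not taken from left-pad or pad-left.
function leftPad(value, width, fill) {
  const text = String(value);
  const padChar = fill === undefined ? ' ' : String(fill);
  const missing = width - text.length;

  // Nothing to do if the text is already wide enough or the fill is empty.
  if (missing <= 0 || padChar.length === 0) {
    return text;
  }

  // Build the padding in one call, then trim it in case the fill
  // string is longer than one character.
  const padding = padChar
    .repeat(Math.ceil(missing / padChar.length))
    .slice(0, missing);
  return padding + text;
}

console.log(leftPad(7, 3, '0'));
```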

All this kind of weakens the argument for modularity being great - sure, left-pad could be improved, but unfortunately it isn't. Also, "if you want it done...".

But all that aside, I didn't originally care about why the left-pad module was unpublished. Turns out the reasons are way more interesting than anything else in this discussion - the author of that module was threatened by lawyers and when he refused to back down, the lawyers threatened the npm maintainer, who then removed one of his modules. The author of left-pad then proceeded to liberate all of his modules.

This required strength and backbone - I can only congratulate the left-pad author on his decision, knowing not many people would take appropriate action.