Why Node?

Node has been around now for about ten years, and is core to modern front-end development. Node’s package manager (NPM) has more than 350,000 libraries available, making it twice as big as the next largest package repository.

NPM helps programmers share their Node libraries, simplifies dependency management, and lets developers compile and bundle front-end assets independently of their back-end stack. Beyond these reasons, using Node gave us two further advantages:

1. According to StackOverflow’s 2017 developer survey, 72% of developers use JavaScript, and at least half of them use Node. More developers are choosing to specialize in Node than in other technologies. If Adyen isn’t using Node, we are alienating a huge pool of developers and missing out on great hires. We need to embrace Node to keep up with our rapid growth.

2. Our current front-end goal is to separate the front-end from the back-end. Decoupling avoids rendering a page with every request, which reduces server load. It also lets us develop interfaces and services in parallel, which enables quick prototyping of new features.

Node is critical for our move to a modern, mainstream front-end framework like VueJS.

The problem: NPM is insecure

However, for all its advantages, few developer tools attract as much controversy as Node. Developers debate, and even fume at each other over, the benefits and drawbacks of this powerful technology. What is it about Node that creates such vitriol, and why does it seem that we can’t discuss it without sparking a flame war?

Complaints about Node vary from criticism of how it is written, such as error handling, to the way code is executed, such as CPU inefficiency. While these topics are debatable, most developers will agree that the Node Package Manager (NPM) has high potential to be a security risk.

Ryan Dahl, the creator of Node, has regrets:

“Unfortunately in Node we just bound to everything, and there’s zero security. You run a node program and you have access to all kinds of system calls.”

On a minor level, there is no enforced uniformity in package.json. There are placeholders for a license, a link to the source code, and so on, but the lack of enforcement makes it very difficult to automate validation. Another issue is that the source code is not visible to the user from the NPM website. Again, there is a placeholder in package.json to link to the source code, but that code does not have to match the code hosted by NPM.
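To make the validation problem concrete, here is a minimal sketch of a check for the two placeholders mentioned above. The `license` and `repository` field names come from the package.json format; the `missingMetadata` helper and the `Manifest` type are hypothetical names chosen for illustration, not part of NPM or of any tool described here. Note that even a passing check only proves the fields are filled in, not that the linked source matches the published code.

```typescript
// Hypothetical helper: reports which optional package.json fields are absent.
interface Manifest {
  name?: string;
  license?: unknown;
  repository?: unknown;
}

function missingMetadata(manifest: Manifest): string[] {
  const missing: string[] = [];

  const license = manifest.license;
  if (typeof license !== "string" || license.trim() === "") {
    missing.push("license");
  }

  // `repository` may be a shorthand string or an object with a `url` property.
  const repo = manifest.repository;
  let url: unknown = undefined;
  if (typeof repo === "string") {
    url = repo;
  } else if (typeof repo === "object" && repo !== null) {
    url = (repo as { url?: unknown }).url;
  }
  if (typeof url !== "string" || url.trim() === "") {
    missing.push("repository");
  }

  return missing;
}

// A manifest that declares nothing beyond its name fails both checks.
console.log(missingMetadata({ name: "bare-package" })); // ["license", "repository"]
```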

NPM also supports “lifecycle scripts” such as preinstall and postinstall. These scripts run before and after package installation. They handle installation of prerequisites and clean up any mess left behind. These scripts have the power to invoke the shell, and could potentially install anything.
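As a rough illustration, a scanner could flag any package whose manifest declares install-time scripts. The sketch below assumes the manifest has already been parsed from package.json; `findInstallHooks` and `PackageManifest` are hypothetical names, and the list of hooks includes `install`, which npm also runs during installation.

```typescript
// Hypothetical helper: lists the scripts npm would run automatically on install.
interface PackageManifest {
  name: string;
  version: string;
  scripts?: Record<string, string>;
}

// Script names that npm runs automatically when a package is installed.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"] as const;

function findInstallHooks(manifest: PackageManifest): Record<string, string> {
  const found: Record<string, string> = {};
  for (const hook of INSTALL_HOOKS) {
    const command = manifest.scripts?.[hook];
    if (typeof command === "string" && command.trim() !== "") {
      found[hook] = command; // keep the command so a reviewer can inspect it
    }
  }
  return found;
}

const exampleManifest: PackageManifest = {
  name: "some-package",
  version: "1.0.0",
  scripts: { test: "mocha", postinstall: "node ./setup.js" },
};

console.log(findInstallHooks(exampleManifest)); // { postinstall: "node ./setup.js" }
```

A non-empty result does not mean a package is malicious, only that it runs code at install time and deserves a closer look. Separately, `npm install --ignore-scripts` skips these scripts altogether, at the cost of breaking packages that genuinely need them.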

To complicate things further, these concerns are multiplied by the number of the package’s dependencies. The top-level package might be fine, but what about the dependencies of that package or the dependencies of those dependencies? Nested dependencies containing nested dependencies are like sinister Matryoshka dolls. Any of these might contain a nasty surprise.

NPM: Nested dependencies like sinister Matryoshka dolls 👻

The original left-pad package is a great example of a package that seems safe, but could be malicious. Left-pad is a utility package that pads out the left-hand side of strings; an unremarkable dependency used by thousands of projects. It was unpublished by its author and, as a result, broke thousands of projects relying on it. As a fix, NPM updated their policy on unpublishing packages, and took measures to prevent squatting on package names. Still, this edge case highlights the problems with NPM. No one really knows what happens deep in the dependency tree.

Finding a solution

Our security team approves every piece of software before allowing its use. When any of the approved software updates, they must approve it again. Vetting every dependency in the tree is impossible, but we had to take security measures to protect Adyen. So we looked for a tool that could report on a library’s security weaknesses.

Automated dependency verification became a prerequisite for using Node at Adyen. At first, research led to some other tools available on the market which made bold claims about being able to “run all code safely.” Their formulas were not very clear, and they were unknown entities with smaller repositories. Most importantly, they were not open-source solutions, and we wanted a tool where we have control over the code.

It became clear that securing Node at Adyen would be a DIY project. We assembled a small team and began building Skantek, a private Node package and supporting infrastructure.

How Skantek works

Skantek is a Node program to scan for suspect packages, using metrics such as:

Presence, location and length of the readme file.

Presence of a link to source code on GitHub.

Appearance in Snyk, or other JavaScript vulnerability databases.

Presence of a license.

Based on these factors, Skantek determines the risk level of a package and whether a manual review is necessary. If a package is approved, Skantek adds it to a private NPM registry.
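One way to picture how the metrics listed above could combine into a decision is a simple additive score. The sketch below is an assumption-laden illustration: the signal names mirror the list, but the weights, the 200-character readme cut-off, and the review threshold are invented for the example and are not Skantek's actual values.

```typescript
// Illustrative only: weights and threshold are assumptions, not Skantek's values.
interface PackageSignals {
  readmeLength: number;         // characters in the readme, 0 if it is missing
  hasSourceLink: boolean;       // link to source code on GitHub
  knownVulnerabilities: number; // entries found in a vulnerability database
  hasLicense: boolean;
}

type Verdict = "approve" | "manual-review";

const REVIEW_THRESHOLD = 3; // hypothetical cut-off

function riskScore(signals: PackageSignals): number {
  let score = 0;
  if (signals.readmeLength === 0) score += 2;        // no readme at all
  else if (signals.readmeLength < 200) score += 1;   // suspiciously short readme
  if (!signals.hasSourceLink) score += 2;
  if (!signals.hasLicense) score += 1;
  score += 3 * signals.knownVulnerabilities;         // any known issue forces a review
  return score;
}

function verdict(signals: PackageSignals): Verdict {
  return riskScore(signals) >= REVIEW_THRESHOLD ? "manual-review" : "approve";
}

const wellDocumented: PackageSignals = {
  readmeLength: 4000,
  hasSourceLink: true,
  knownVulnerabilities: 0,
  hasLicense: true,
};

console.log(verdict(wellDocumented)); // "approve"
console.log(verdict({ ...wellDocumented, readmeLength: 0, hasSourceLink: false })); // "manual-review"
```

Keeping the score and the threshold separate makes it possible to tighten or relax the policy without touching the individual checks.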

How it determines risk

Skantek fetches the package metadata from NPM and uses a library to resolve all dependencies in the tree. It traverses the tree and scans each package, assigning a risk score. If all dependencies pass, it scans the parent package. It then retrieves all associated packages and publishes them to the private registry.
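The traversal described here amounts to a post-order walk: dependencies are scanned before the package that needs them. The following is a minimal sketch under stated assumptions: an in-memory `Map` stands in for metadata fetched from NPM, the `scan` callback stands in for the risk check, and all names (`scanTree`, `PackageNode`, `Registry`) are hypothetical rather than Skantek's own.

```typescript
// Sketch with assumed names; a Map stands in for metadata fetched from NPM.
interface PackageNode {
  name: string;
  dependencies: string[];
}

type Registry = Map<string, PackageNode>;
type Scanner = (pkg: PackageNode) => boolean; // true when the package passes

// Scans dependencies before their parent. Returns the packages in the order
// they passed, or null as soon as any package fails or cannot be found.
function scanTree(root: string, registry: Registry, scan: Scanner): string[] | null {
  const approved: string[] = [];
  const seen = new Set<string>();

  const visit = (name: string): boolean => {
    if (seen.has(name)) return true; // shared dependency or cycle: do not rescan
    seen.add(name);
    const pkg = registry.get(name);
    if (!pkg) return false; // unknown package: treat as a failure
    for (const dep of pkg.dependencies) {
      if (!visit(dep)) return false;
    }
    if (!scan(pkg)) return false;
    approved.push(name);
    return true;
  };

  return visit(root) ? approved : null;
}

const demoRegistry: Registry = new Map<string, PackageNode>([
  ["app", { name: "app", dependencies: ["a", "b"] }],
  ["a", { name: "a", dependencies: ["c"] }],
  ["b", { name: "b", dependencies: ["c"] }],
  ["c", { name: "c", dependencies: [] }],
]);

console.log(scanTree("app", demoRegistry, () => true)); // ["c", "a", "b", "app"]
console.log(scanTree("app", demoRegistry, (pkg) => pkg.name !== "c")); // null
```

The returned order is also a safe publishing order for a private registry, since every package appears after the dependencies it relies on.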