Intro

This post explains why npm, Yarn and pnpm were created and the other major problems they’ve solved over time.

If necessary, you should read my glossary for this post. It defines the terms package, module, library, dependency, sub-dependency, bare specifier, dependency graph, package registry and package manager.

The basic reason package managers are useful is that they automate a lot of work, from installing sub-dependencies to configuring your dependency tree. If you need to more thoroughly understand these benefits, I’d recommend reading about my attempt to start a project without a package manager on the back-end. I also wrote an explanation of why npm, Yarn or pnpm are commonly used on the front-end.

Before diving into the package managers, it’s also worth understanding how node is able to find a package you’re trying to import. To do so, I’d recommend reading my explanation of node’s module resolution system.

Npm

Npm, released on January 12, 2010, was the first package registry and package manager for node.

On May 1, 2011, npm version 1 was released, which made local package installation work consistently. This change was a huge step forward because it was a pain to get things to work with global packages. If you install a package globally, you’ll notice you can’t just import it by using a bare specifier.

You could configure npm to install global packages to a different folder where node’s module resolution system would find your packages, but then your app would have access to all your packages and you could accidentally use a global package that you forgot to put in your package.json. That would cause your app to break when someone else tried to use your app. I could envision myself making this mistake if I had to use npm in 2010 because global installations don’t automatically add the package you install to your package.json.

Additionally, by default npm only stores one copy of a global package. For example, if you ran npm install express and then ran npm install express@4.16.4, version 4.16.4 of express would overwrite the latest version of express. This would make it difficult to use different versions of global packages in different projects.

In version 1, npm implemented a nested dependency structure. This means that you’d find the packages you locally installed in your root node_modules folder and all of your sub-dependencies would be stored in the node_modules folder of your dependencies.

The two versions of B are installed in different directories avoiding dependency hell. https://npm.github.io/how-npm-works-docs/npm2/how-npm2-works.html

For example, imagine your app has package A and package C in your package.json, package A has the dependency package B version 1.0 and package C has the dependency package B version 2.0. Package A and C would be in your root node_modules. Package B version 1.0 would be within the node_modules folder of package A and package B version 2.0 would be in the node_modules folder of package C.
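To make the nested structure concrete, here’s a minimal sketch that computes where each package from the example above would end up. The Pkg type, the nestedLayout function and the version numbers for packages A and C are my own assumptions for illustration; this is not npm’s actual installer code.

```typescript
// Hypothetical shape for this sketch; not npm's internal data structure.
type Pkg = { name: string; version: string; dependencies: Pkg[] };

// Place every package inside the node_modules folder of the package
// that depends on it, recursing into sub-dependencies.
function nestedLayout(deps: Pkg[], parentDir: string = ""): string[] {
  const paths: string[] = [];
  for (const dep of deps) {
    const dir = parentDir + "node_modules/" + dep.name;
    paths.push(dir + " (" + dep.version + ")");
    const nested = nestedLayout(dep.dependencies, dir + "/");
    for (const p of nested) paths.push(p);
  }
  return paths;
}

const packageB1: Pkg = { name: "B", version: "1.0", dependencies: [] };
const packageB2: Pkg = { name: "B", version: "2.0", dependencies: [] };

// Versions of A and C are made up; only B's versions come from the example.
const directDependencies: Pkg[] = [
  { name: "A", version: "1.0", dependencies: [packageB1] },
  { name: "C", version: "1.0", dependencies: [packageB2] },
];

console.log(nestedLayout(directDependencies).join("\n"));
// node_modules/A (1.0)
// node_modules/A/node_modules/B (1.0)
// node_modules/C (1.0)
// node_modules/C/node_modules/B (2.0)
```

Each extra level of sub-dependencies adds another node_modules segment to the path, which is worth keeping in mind for the file path problem described later.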

This approach solved the problem of “dependency hell.” Dependency hell occurs if you try to install two versions of a package within the same folder, which would break your app. Since npm stored each dependency within the node_modules folder of the package that required it, dependency hell would never occur when you installed a package.

However, this nested dependency structure led to long file paths since one dependency could have a sub-dependency, which had its own dependencies etc. This caused apps to break when using Windows. Windows defaults to a 260 character limit on the size of file paths and this limit couldn’t be changed before Windows 10.

Npm version 3 “flattened” the dependency tree to fix this problem. This meant that all dependencies and sub-dependencies would be placed in the root node_modules folder by default. If a different version of the package was already in the root node_modules folder, the new version would instead be placed in the node_modules folder of the dependency that used it, like in npm v2, to avoid dependency hell. Therefore, the dependency tree wasn’t completely flat, but it was flat enough that the file path problem for Windows users greatly decreased.

Once again imagine your app has package A and package C in your package.json. Package A has the dependency package B version 1.0 and package C has the dependency package B version 2.0. When package A is installed, package B version 1.0 is added to your root node_modules folder. When package C is installed, package B version 2.0 can’t be added to the root node_modules folder since package B version 1.0 is already there, so it’s placed in the node_modules folder of package C instead.

Flattening the dependency tree also helped save disk space and speed up installation times. If you had the app above and tried to install package D version 1.0, which has package B version 1.0 as a sub-dependency, you wouldn’t need to install package B version 1.0 again like you would in npm version 2.
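The flattening rule can be sketched in a few lines. This is a simplified model under my own assumptions (the Pkg and Layout types and the flatLayout function are hypothetical, and it ignores dependency cycles and npm’s real placement logic), but it reproduces the A, C and D example from the paragraphs above.

```typescript
// Hypothetical shapes for this sketch; not npm's internal data structures.
type Pkg = { name: string; version: string; dependencies: Pkg[] };
type Layout = Record<string, string>; // install path -> installed version

function flatLayout(directDeps: Pkg[]): Layout {
  const layout: Layout = {};

  // Direct dependencies always go in the root node_modules folder.
  for (const pkg of directDeps) {
    layout["node_modules/" + pkg.name] = pkg.version;
  }

  function placeSubDeps(parent: Pkg, parentPath: string): void {
    for (const dep of parent.dependencies) {
      const rootPath = "node_modules/" + dep.name;
      let installPath = rootPath;
      if (layout[rootPath] === undefined) {
        // Nothing with this name at the root yet: hoist it there.
        layout[rootPath] = dep.version;
      } else if (layout[rootPath] !== dep.version) {
        // A different version owns the root slot: nest it like npm v2 did.
        installPath = parentPath + "/node_modules/" + dep.name;
        layout[installPath] = dep.version;
      }
      // Same version already at the root: reuse it, install nothing.
      placeSubDeps(dep, installPath);
    }
  }

  for (const pkg of directDeps) {
    placeSubDeps(pkg, "node_modules/" + pkg.name);
  }
  return layout;
}

const b1: Pkg = { name: "B", version: "1.0", dependencies: [] };
const b2: Pkg = { name: "B", version: "2.0", dependencies: [] };
const flattened = flatLayout([
  { name: "A", version: "1.0", dependencies: [b1] },
  { name: "C", version: "1.0", dependencies: [b2] },
  { name: "D", version: "1.0", dependencies: [b1] },
]);

console.log(Object.keys(flattened).map((p) => p + " -> " + flattened[p]).join("\n"));
// node_modules/A -> 1.0
// node_modules/C -> 1.0
// node_modules/D -> 1.0
// node_modules/B -> 1.0
// node_modules/C/node_modules/B -> 2.0
```

Package D reuses the copy of package B version 1.0 that was hoisted for package A, so only five packages are written to disk instead of six.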

With the emergence of module bundlers such as browserify and webpack, it became easier to use npm on the front-end too. At the time developers could’ve already used the Bower package manager on the client side. However, Bower forces users to only install one version of a package per app, which means developers must manually solve dependency hell.

Additionally, at the time browsers didn’t support ES modules, aka JavaScript modules, and you still can’t use bare specifiers in the browser. This means Bower users must type out package paths and worry about global scope pollution. I provide an example of this problem in this repo.

A minority of users preferred Bower to npm because they preferred to manually solve dependency hell to have a completely flat node_modules structure. But, even Bower’s maintainers recommended using Yarn once it was released on October 11, 2016. They seem to have preferred yarn to npm because it provides an option to completely flatten your node_modules.

Yarn

Yarn quickly gained traction because it was backed by Facebook and Google and installing packages was much faster with Yarn than npm. One reason Yarn’s installs were faster than npm’s was that it used a faster algorithm to get data from its cache.

Benchmarks comparing npm and yarn shortly after yarn was released https://www.berriart.com/blog/2016/10/npm-yarn-benchmark/

Yarn also initially provided a few other benefits over npm.

Yarn used its cache to let you install any previously downloaded package offline. This feature was important to Facebook and presumably other corporations. That’s because Facebook wanted to cut their continuous integration environments off from the internet so they’re much harder to hack.

Using offline copies of packages also speeds up continuous integration build times and regular package installation since you don’t need to make a network request to get your packages. And if you’re not making network requests, you don’t need to worry about a network request for a package failing, which makes builds more reliable. For example, storing packages offline would’ve allowed companies to avoid problems when the popular npm package left-pad was removed from the npm registry.

Yarn also improved app security by generating a checksum (aka hash) from the contents of each package. In short, this is done with a hashing function, which is a pure function: every time you input the same data, you receive the same output. So if even one character in your package has changed, the checksum no longer matches and Yarn recognizes that something went wrong. This protects you against a hacker changing the content of the package on the npm registry or modifying the package while you’re installing it.
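Here’s a small sketch of the checksum idea. To keep it self-contained I’m using a toy djb2-style hash, which is my own stand-in and is not cryptographically secure; real package managers rely on cryptographic hashes such as SHA-512. The function names and the package contents are made up for illustration.

```typescript
// Toy djb2-style hash: deterministic, but NOT cryptographically secure.
// It only stands in for the real hash a package manager would use.
function toyChecksum(contents: string): string {
  let hash = 5381;
  for (let i = 0; i < contents.length; i++) {
    // ">>> 0" keeps the running value inside an unsigned 32-bit range.
    hash = (hash * 33 + contents.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// Compare the checksum of what was downloaded with the recorded one.
function verifyPackage(contents: string, expectedChecksum: string): boolean {
  return toyChecksum(contents) === expectedChecksum;
}

// Hypothetical package contents and the checksum recorded at first install.
const publishedContents = "module.exports = function leftPad() { /* ... */ };";
const recordedChecksum = toyChecksum(publishedContents);

// Same input, same output: the untouched package passes.
console.log(verifyPackage(publishedContents, recordedChecksum)); // true

// Changing a single character produces a different checksum.
const tamperedContents = publishedContents.replace("leftPad", "leftPbd");
console.log(verifyPackage(tamperedContents, recordedChecksum)); // false
```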

Yarn also received a lot of attention for its “lockfile”. Upon the first installation of a package, a file titled yarn.lock is created, which lists the exact versions of each package that’s installed. This file is updated by Yarn each time packages are installed and updated.

A yarn.lock file

When installing packages with npm, packages in the package.json are prefixed with a caret by default, which tells npm to install the latest minor or patch release within the same major version. This means that developer A could clone an app that uses React at 1pm and everything works, but then React could be updated at 1:30pm and the update could accidentally contain a bug. Therefore, when developer B installs the same app at 2pm the new version of React is installed and the app doesn’t work.

By default, Yarn installs the package versions listed in the yarn.lock file rather than installing the latest version that satisfies the semver range in the package.json. This prevents developers from spending time debugging why an app works on one computer, but is broken on another computer.
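The difference between resolving from a caret range and resolving from a lockfile can be sketched like this. It’s a simplified model under my own assumptions: the function names are hypothetical, the caret rule only covers versions at or above 1.0.0 and ignores pre-release tags, and the list of published versions is just illustrative.

```typescript
type Version = [number, number, number];

function parseVersion(version: string): Version {
  const parts = version.split(".").map(Number);
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, positive if a > b, zero if equal.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Simplified caret rule: same major version, and not older than the base.
function satisfiesCaret(range: string, candidate: string): boolean {
  const base = parseVersion(range.slice(1)); // drop the leading "^"
  const version = parseVersion(candidate);
  return version[0] === base[0] && compareVersions(version, base) >= 0;
}

// Prefer the locked version if there is one; otherwise take the newest match.
function resolveVersion(range: string, published: string[], locked?: string): string | undefined {
  if (locked !== undefined && satisfiesCaret(range, locked)) return locked;
  const matches = published.filter((v) => satisfiesCaret(range, v));
  matches.sort((a, b) => compareVersions(parseVersion(a), parseVersion(b)));
  return matches.length > 0 ? matches[matches.length - 1] : undefined;
}

// Illustrative list of versions available on the registry at 2pm.
const publishedVersions = ["16.8.0", "16.12.0", "16.13.0", "17.0.0"];

// Developer B without a lockfile gets the release published at 1:30pm.
console.log(resolveVersion("^16.12.0", publishedVersions)); // "16.13.0"

// With a lockfile, developer B gets the same version developer A tested.
console.log(resolveVersion("^16.12.0", publishedVersions, "16.12.0")); // "16.12.0"
```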

Npm actually already had its own version of a lockfile, which it named the shrinkwrap file, that would’ve fixed that issue. But many developers weren’t aware of the shrinkwrap file because it wasn’t enabled by default.

Npm largely caught up with Yarn when npm version 5 was released on May 25, 2017. This release optimized npm’s cache to improve install speeds to be approximately as fast as Yarn. Additionally, npm enabled its version of a lockfile by default, which it renamed to package-lock.json. You can read about the slight differences between yarn.lock and package-lock.json here. Npm also started using checksums and letting users install packages offline from their local cache too.

Pnpm

Pnpm entered the picture in June 2017 when it released version 1 and its creator Zoltan Kochan blogged about it. In his post, “Why should we use pnpm,” Kochan explained how pnpm allows users to save disk space and avoid the problems of npm and Yarn’s node_modules structure.

At the time, when npm and Yarn installed packages they would install at least one copy of each package per project, even if the package was already in its cache. Pnpm only installs a package once to its cache, which it refers to as its store. If you install pnpm, you can find the store in the .pnpm-store directory within your home folder.

.pnpm-store is at the bottom of this picture

Instead of installing a copy of a package each time you use it in a project, pnpm creates hard links to all the files in the package from your pnpm store. A hard link is essentially a copy, except that if you modify the original file or the hard-linked file you’ll change both files at once. While a traditional copy takes up as much space as the original file, a hard link only adds a small amount of space to account for naming the hard-linked file.

This means that you will use almost zero additional hard drive space if you use a version of a package more than once. For example, even if you use react in 2 or more pnpm projects, react is only stored by pnpm once.
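If hard links are new to you, this toy in-memory model may help. It’s only a sketch of the behavior described above, not real file system code: the ToyDisk class and all the paths are made up, and it counts characters rather than real bytes on disk.

```typescript
// A toy in-memory disk: paths are names that point at numbered blocks of
// content, which is roughly how hard links behave on a real file system.
class ToyDisk {
  private blocks: Record<number, string> = {};
  private entries: Record<string, number> = {};
  private nextId = 1;

  writeFile(path: string, contents: string): void {
    const existing = this.entries[path];
    if (existing !== undefined) {
      // Overwrite in place: every name pointing at this block sees it.
      this.blocks[existing] = contents;
      return;
    }
    const id = this.nextId++;
    this.blocks[id] = contents;
    this.entries[path] = id;
  }

  readFile(path: string): string {
    return this.blocks[this.entries[path]];
  }

  // A traditional copy duplicates the content into a new block.
  copy(from: string, to: string): void {
    this.writeFile(to, this.readFile(from));
  }

  // A hard link only adds a second name for the same block.
  link(from: string, to: string): void {
    this.entries[to] = this.entries[from];
  }

  // Total characters stored, counting each block once.
  bytesUsed(): number {
    let total = 0;
    for (const id of Object.keys(this.blocks)) {
      total += this.blocks[Number(id)].length;
    }
    return total;
  }
}

const disk = new ToyDisk();
disk.writeFile("store/react/index.js", "v16.12.0");
disk.link("store/react/index.js", "project-a/node_modules/react/index.js");
disk.link("store/react/index.js", "project-b/node_modules/react/index.js");
console.log(disk.bytesUsed()); // 8: two extra names, no extra content

disk.copy("store/react/index.js", "project-c/node_modules/react/index.js");
console.log(disk.bytesUsed()); // 16: the copy stores the content again

// Editing through one hard link changes every other hard link too.
disk.writeFile("project-a/node_modules/react/index.js", "patched!");
console.log(disk.readFile("project-b/node_modules/react/index.js")); // "patched!"
console.log(disk.readFile("project-c/node_modules/react/index.js")); // "v16.12.0"
```

On a real machine, Node exposes hard links through fs.linkSync, which takes the existing path and the new path.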

These savings compound. The popular starter kit create-react-app, which uses Yarn by default, comes with a node_modules folder of 212 megabytes. So if I had 10 more create-react-app projects on my computer I’d save 2.12 gigabytes plus more space for any additional libraries used more than once.

In addition to saving disk space, you’ll save a lot of time you would’ve spent installing those extra copies of packages.

Because files are hardlinked you need to be careful about modifying the contents of your node_modules. If you change any of their content you’ll also temporarily change that package on all the other pnpm projects on your computer.

To mitigate that issue, pnpm keeps a checksum of the original package, which allows it to recognize when you’ve changed its contents. It will automatically revert your changes to a package the next time you try to install it.

Kochan also disliked how the flat node_modules structure, used by Yarn and by npm since version 3, allowed users to require sub-dependencies from the root node_modules folder in their application code. In his blog post, “pnpm’s strictness helps to avoid silly bugs,” Kochan explains this could cause problems if a user forgot to install a package, but it was already installed as a sub-dependency. This means that your project could work for now, but it would break if that sub-dependency was removed as a dependency of your dependency. Your project could also break if your dependency began to use a new version of that sub-dependency.
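The bug Kochan describes can be modeled in a few lines. This is a simplified sketch under my own assumptions: the package names A and B, the Manifest and Registry types and both functions are hypothetical, and the flat model ignores version conflicts.

```typescript
type Manifest = { dependencies: string[] };
type Registry = Record<string, Manifest>;

// Flat layout: every dependency and sub-dependency ends up at the root,
// so application code can import all of them.
function flatRoot(app: Manifest, registry: Registry): string[] {
  const seen: string[] = [];
  const queue = app.dependencies.slice();
  while (queue.length > 0) {
    const pkgName = queue.shift() as string;
    if (seen.indexOf(pkgName) !== -1) continue;
    seen.push(pkgName);
    for (const dep of registry[pkgName].dependencies) queue.push(dep);
  }
  return seen;
}

// Strict layout: only what the app declared is importable from the root.
function strictRoot(app: Manifest): string[] {
  return app.dependencies.slice();
}

// The app only declares A, but its code also imports B by accident.
const myApp: Manifest = { dependencies: ["A"] };
const today: Registry = { A: { dependencies: ["B"] }, B: { dependencies: [] } };

console.log(flatRoot(myApp, today)); // [ 'A', 'B' ]: importing B works by luck
console.log(strictRoot(myApp)); // [ 'A' ]: importing B fails right away

// Later, a new release of A stops depending on B.
const afterUpdate: Registry = { A: { dependencies: [] }, B: { dependencies: [] } };
console.log(flatRoot(myApp, afterUpdate)); // [ 'A' ]: the app now breaks
```

With the strict layout, the missing entry in package.json is caught on day one rather than after an unrelated update.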

Kochan’s logic is correct, but npm and Yarn are so widely used that many developers have already created projects that import packages that aren’t in their package.json. For example, this was a problem with create-react-native-app. That meant you could type something like npm install create-react-native-app, or use Yarn, and it would work, but it didn’t work with pnpm.

In the case of create-react-native-app, it used the dependency metro-bundler, which forgot to add a number of dependencies to its package.json. Pnpm users would have to find the dependencies that the library developer forgot to add to their package.json and use pnpm hooks to install those dependencies. To fix the problem for others in the future, they could submit a pull request to fix the third party library.

The pnpm hook used so create-react-native-app could work with pnpm
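I can’t reproduce the exact hook from the picture here, so this is only a sketch of what such a hook looks like. As far as I understand pnpm’s hook API, a readPackage function receives each package’s manifest before installation and returns it, possibly modified. The dependency name and version range below are placeholders I made up, not the packages metro-bundler was actually missing.

```typescript
// Minimal manifest shape for this sketch.
type PackageManifest = {
  name: string;
  dependencies?: Record<string, string>;
};

// Called once per package before it is installed. Adding an entry here
// makes pnpm install it as if the library author had declared it.
function readPackage(pkg: PackageManifest): PackageManifest {
  if (pkg.name === "metro-bundler") {
    const deps = pkg.dependencies || {};
    // Placeholder name and range: replace with the real missing dependency.
    deps["hypothetical-missing-dep"] = "^1.0.0";
    pkg.dependencies = deps;
  }
  return pkg;
}

const patched = readPackage({ name: "metro-bundler", dependencies: {} });
console.log(patched.dependencies);
// { 'hypothetical-missing-dep': '^1.0.0' }
```

In a real project this function would live in the pnpmfile.js at the root of your project and be exported as module.exports = { hooks: { readPackage } }.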

Kochan recognized that npm and Yarn were much more popular than pnpm and that most users wanted their packages to work as quickly as possible. Kochan made a compromise in pnpm version 4, which hoists your sub-dependencies into a node_modules folder where your dependencies can access them.

This means that you shouldn’t have any issues when installing other packages that work with npm and Yarn. In case you do, pnpm does provide an option, shamefully-hoist, to use a flat node_modules structure like npm and Yarn.

Kochan remained strict by not allowing your app to access its own sub-dependencies. This means that no new pnpm project will have the issues that metro-bundler had, but you don’t need to fix existing packages to get pnpm to work consistently.

You can also use npm to install packages with the --global-style flag to create a node_modules structure that only lets you import your direct dependencies.

An example pnpm project. React is the only package I installed

As you can see in the picture above, pnpm’s node_modules structure is complex, but it works. Pnpm uses symlinks (or junctions on Windows) to build this structure, which node follows to find the location of packages. The node_modules folder within the .pnpm folder is the location where the hard linked files of your sub-dependencies are stored.

React, the only direct dependency installed in my example, is listed directly within the node_modules folder. That folder is symlinked to the folder node_modules/.pnpm/registry.npmjs.org/react/16.12.0/node_modules/react where the actual package is contained. Node follows the symlink and at 70 characters, the file path is short enough that pnpm easily avoids hitting Windows’s 260 character limit.

One remaining drawback of pnpm is that because it uses symlinks it doesn’t work with some file watching tools such as Watchman. That’s one reason Yarn abandoned an original plan to use symlinks. It also won’t work with FAT32 file systems, which don’t support hard links or symlinks. That should only be a problem if you want to store your project on a flash drive or SD card.

If you’re interested in learning more about hard links, symlinks and how pnpm uses them I’d check out the following resources.

Workspaces

In August 2017, the Yarn team released Yarn workspaces, a feature that makes it easier to bootstrap and manage monorepos.

A monorepo is a repository that contains multiple projects. Monorepos are often used both to store the private code of large companies such as Google and Facebook and for open-source projects such as React and Babel. Many developers prefer the monorepo structure rather than creating a separate repository for each project. That’s because they feel monorepos make it easier to update versions and refactor code without creating a new repository.

Yarn helps developers manage monorepos by letting them install packages concurrently across all the projects in their monorepo. The yarn workspace commands also make it easier for developers to manage individual projects from the root directory.

An example directory structure and root package.json of an app using yarn workspaces.
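For readers without the picture, here’s a rough sketch of the idea. The rootPackageJson object mirrors the two fields a root package.json needs for Yarn workspaces (private and workspaces). The directory names and the matchesPattern helper are my own simplification and only handle patterns that end in /*; Yarn’s real glob matching is more capable.

```typescript
// Mirrors the fields of a root package.json that enables workspaces.
const rootPackageJson = {
  private: true,
  workspaces: ["packages/*"],
};

// Simplified matcher: supports exact paths and a single trailing "/*".
function matchesPattern(dir: string, pattern: string): boolean {
  if (pattern.slice(-2) !== "/*") return dir === pattern;
  const prefix = pattern.slice(0, -1); // "packages/*" -> "packages/"
  if (dir.slice(0, prefix.length) !== prefix) return false;
  const rest = dir.slice(prefix.length);
  return rest.length > 0 && rest.indexOf("/") === -1;
}

function isWorkspace(dir: string): boolean {
  return rootPackageJson.workspaces.some((pattern) => matchesPattern(dir, pattern));
}

// Hypothetical directories in the monorepo.
console.log(isWorkspace("packages/web")); // true
console.log(isWorkspace("packages/api")); // true
console.log(isWorkspace("packages/web/src")); // false: nested too deep
console.log(isWorkspace("scripts")); // false: not listed in workspaces
```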

The tool Lerna also has all the features of Yarn workspaces and more. The Yarn team recommends using workspaces because it installs packages faster and is more stable. They added that Lerna can’t do this as well because it’s a wrapper around a package manager and not a package manager itself. Lerna can be used with Yarn workspaces and is still useful if you want to use its features to make it easier to publish packages to the npm registry.

Pnpm has also implemented its own version of workspaces and npm will also add a workspaces feature in its next major release. Npm also seems to have plans to add Lerna’s features to publish packages in a future release.

The next major version of Yarn will allow nested workspaces. As the name implies, this means you’ll be able to create a workspace within a workspace.

PnP

In September 2018, the Yarn team introduced Yarn Plug’n’Play (Yarn PnP). Yarn PnP solves the same problems that pnpm fixed, namely the flat node_modules structure and duplicate package installations, but in a different way. They chose to use a different approach to make package installations faster and to eventually eliminate the need for installations on continuous integration builds.

The Yarn PnP documentation states that hard links eliminate package duplication, but using them requires Node to take the time to make a bunch of calls to your operating system. They go into further detail on the flaws of using symlinks and hard links on page 2 of the Yarn PnP white paper.

With Yarn PnP, when your packages are installed yarn creates a file named .pnp.js instead of a node_modules folder. The .pnp.js file lists your packages, their dependencies and a relative path to their location on your hard drive.

Information about react in my .pnp.js file

Yarn PnP modifies Node’s default module resolution to essentially say: instead of looking for node_modules directories in a file’s parent folders, look at the .pnp.js file. If it finds that version of the package there, it’ll use the packageLocation field to take you to the appropriate location on your disk.
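Here’s a simplified sketch of that lookup. The real .pnp.js file is more involved, so treat the PnpMap structure, the resolveWithPnp function, the my-app package and the cache paths as assumptions of mine. Only the idea of a static table with a packageLocation per package version comes from the description above.

```typescript
type PackageInfo = {
  packageLocation: string;
  packageDependencies: Record<string, string>; // dependency name -> version
};

// package name -> version -> info, loosely modeled on the .pnp.js table.
type PnpMap = Record<string, Record<string, PackageInfo>>;

function resolveWithPnp(
  map: PnpMap,
  issuer: { name: string; version: string },
  request: string
): string {
  const issuerInfo = map[issuer.name] && map[issuer.name][issuer.version];
  if (!issuerInfo) {
    throw new Error("Unknown package " + issuer.name + "@" + issuer.version);
  }
  // Only dependencies the issuer declared can be resolved.
  const version = issuerInfo.packageDependencies[request];
  if (version === undefined) {
    throw new Error(issuer.name + " does not list " + request + " as a dependency");
  }
  const target = map[request] && map[request][version];
  if (!target) {
    throw new Error(request + "@" + version + " is missing from the table");
  }
  return target.packageLocation;
}

// Hypothetical table; the locations are made-up cache paths.
const pnpMap: PnpMap = {
  "my-app": {
    "1.0.0": { packageLocation: "./", packageDependencies: { react: "16.12.0" } },
  },
  react: {
    "16.12.0": {
      packageLocation: "./.cache/react-16.12.0/",
      packageDependencies: { "object-assign": "4.1.1" },
    },
  },
  "object-assign": {
    "4.1.1": { packageLocation: "./.cache/object-assign-4.1.1/", packageDependencies: {} },
  },
};

const appIssuer = { name: "my-app", version: "1.0.0" };
console.log(resolveWithPnp(pnpMap, appIssuer, "react")); // "./.cache/react-16.12.0/"

// my-app never declared object-assign, so this lookup throws instead of
// silently finding react's sub-dependency.
try {
  resolveWithPnp(pnpMap, appIssuer, "object-assign");
} catch (error) {
  console.log((error as Error).message);
}
```

Because the lookup is a plain table read, no directories need to be searched, and an undeclared import fails immediately.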

When using Yarn PnP, you may also notice a folder titled .pnp, which will be renamed to .yarn when Yarn version 2 is released. This stores packages with postinstall scripts, scripts that run right after the package is installed. That’s because the contents of a package with a postinstall script, like node-sass, could vary depending on your node version. This means that package has to be installed separately for each project because you could be using different node versions for each project.

Since Yarn is put in charge of resolving dependencies rather than node in Yarn PnP, Yarn can recognize whether you’re importing a dependency that is listed as a devDependency rather than a dependency in your package.json.

This solves one more problem of the flat node_modules structure that pnpm couldn’t directly solve because it relied on node’s default module resolution system. Without using Yarn PnP, you would have had to solve this problem through an external package, such as package-preview by Zoltan Kochan, to catch when you’re importing devDependencies while running your tests.

While Yarn doesn’t enable PnP by default, you can set it up by adding a few lines of code. But, Yarn PnP currently does not make the change pnpm version 4 did to allow dependencies to access any sub-dependency. This means that if you’re using a package that forgot to add a dependency your app will break when using Yarn PnP, while it works with other package managers for now.

Like many new technologies, it’ll also take time before Yarn PnP becomes stable. Facebook recently disabled it because it wasn’t compatible with enough of their packages.

Yarn PnP is a specification rather than a feature exclusive to Yarn so it could be adopted by other package managers in the future.

The Future

The Yarn maintainers have released their plans for version 2. These include optimizing Yarn so you’ll no longer need to run yarn install after you clone a repository and making it easier for Yarn to be used with other languages besides JavaScript. Yarn PnP will also be enabled by default.

Npm has announced plans to make major changes in version 8, which they’ve named tink. Tink would work by overwriting the file system (fs) node core module, to load your packages at runtime into a cache that’s shared across all of your projects.

This leads to a few benefits for users. You won’t have to ever type npm install again, even the first time you use a package in a new project. Also, in some cases npm will be able to more intelligently only download the parts of a package that you need. Npm’s file system module replacement will also be able to recognize TypeScript and JSX files without any additional configuration.

While packages are installed at runtime, you’ll be able to run a command, npm prepare, so they’re all loaded before your app is in production.

Since npm hasn’t released version 7 yet, I’d expect it to be a while before tink is officially released. If you can’t wait, you can try out the in development version of tink, but npm warns not to use it in production.

Yarn PnP and npm tink come with different tradeoffs. Yarn maintainer Maël Nison finds overwriting node core modules to be risky because it’s a big enough change that there will probably be bugs that could make your app less secure and reliable. Npm has acknowledged this risk, but points out that Electron has already used this approach successfully.

Npm also prefers their approach because Yarn PnP requires additional configuration to work with certain tools such as webpack, Jest and TypeScript. Those tools have worked to incorporate Yarn PnP (jest, TypeScript) and webpack 5 will support Yarn PnP by default so this presumably won’t be a long-term problem.

I hope you enjoyed this post! Feel free to let me know what you think in the comments.