Shrink node_modules with refining

Alternative universe title: Oops, I shrunk node_modules!

It's easy to end up with a bloated node_modules directory when developing a Node.js application. You can try to trim your dependencies or be clever about them with caching or symlinks, but that will only get you so far. What if you could shrink the modules themselves?

In this article I introduce refining, a technique to shrink node_modules by deleting unnecessary files within the module itself. It's an experimental technique. You still have to download your full node_modules but you'll be able to shrink it before packaging it into an artifact or Docker image.

I also present two proof-of-concept experiments (with Express and React) using refining, as well as a discussion on limitations and applicability to a production build pipeline.

Why should I care about node_modules weight?

The first victim of a bloated node_modules is your developer feedback loop: unless your build system can cache dependencies, the loop just gets longer and longer. Developers should not have to wait more than 10 minutes for build feedback, and longer waits are terrible for your DevOps journey.

Your entire pipeline also suffers: you get big Docker images, overloaded image registry servers, and slow cold-starts in a container cluster.

This meme is instantly relatable to anyone that tries to maintain a healthy delivery pipeline:

What is refining?

You don't need every file in node_modules! It's filled with fluff that shouldn't have been published, such as documentation, build files, and test code. It may even have unused feature code: for example, you might only use part of a modularized package. If you can detect exactly which files your application needs then the rest can be deleted safely.

Refining is a method to separate a useful substance from impurities. In this case the useful substance is a set of required node_modules files and the impurities are the remaining files which can be safely deleted.

Refining is a black-box approach: instead of analyzing the code (like webpack) it uses the operating system to list all the files your application opens. Every node_modules file that is not on that list (the complement of the set of opened files) can then be deleted to refine your node_modules.

This technique requires running the application first without any modifications to collect data on its behaviour. Consider the V8 JavaScript engine as an analogy: V8 first runs code in its bytecode interpreter (Ignition) while it collects data on code behaviour and hotspots. After analysis V8 hands the hot code to its optimizing machine-code compiler (TurboFan). For safety V8 can also fall back from TurboFan to Ignition if needed. A full application of refining might work in a similar way.

How to refine node_modules

I followed these steps to refine applications:

1. Start file access tracing
2. Run application and tests
3. Stop file access tracing
4. Delete unused node_modules files
5. Rerun application and tests

File access tracing

I used opensnoop on macOS to trace file access by all node processes. On Linux you can use strace instead. I extracted only filepaths within node_modules (to exclude system files that Node.js reads on startup), and I printed them relative to my base directory for easy diffing later:

    sudo opensnoop -n node \
      | grep --color=never "$(pwd)/*node_modules" \
      | awk '{print $5}' \
      | sed -e "s:^$(pwd)::" \
      | xargs -L 1 -I %e echo ".%e" \
      | tee nm-trace.log

opensnoop stays running until stopped. My test script killed opensnoop after running my application and its tests.
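If you would rather post-process the trace in Node.js than with grep, awk, and sed, the filtering step could look roughly like the sketch below. It assumes that the file path is the fifth whitespace-separated column of each trace line (which is what the awk step above relies on) and that paths contain no spaces. The function name normalizeTrace and the sample lines are hypothetical.

```javascript
// Sketch: turn raw trace lines into sorted, unique, relative node_modules paths.
// Assumption: the file path is the 5th whitespace-separated column of each line.
function normalizeTrace(lines, baseDir) {
  const prefix = baseDir.endsWith('/') ? baseDir : baseDir + '/';
  const seen = new Set();
  for (const line of lines) {
    const columns = line.trim().split(/\s+/);
    const filePath = columns[4];
    // Skip headers, short lines, and files outside the project directory.
    if (!filePath || !filePath.startsWith(prefix)) continue;
    // Keep only paths inside a node_modules directory.
    if (!filePath.includes('/node_modules/')) continue;
    seen.add('./' + filePath.slice(prefix.length));
  }
  return [...seen].sort();
}

// Hypothetical sample input, shaped like "UID PID COMM FD PATH".
const sample = [
  '  UID    PID COMM          FD PATH',
  '  501  4242 node          20 /app/node_modules/express/index.js',
  '  501  4242 node          21 /app/node_modules/express/index.js',
  '  501  4242 node          22 /usr/lib/libSystem.B.dylib',
  '  501  4242 node          23 /app/node_modules/qs/lib/index.js',
];
console.log(normalizeTrace(sample, '/app'));
```

Duplicates and system files are dropped, so the result has the same shape as nm-trace.log after sort --unique.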

Deleting unused files

I deleted the unused files with:

    diff \
      --new-line-format="" \
      --unchanged-line-format="" \
      <(find . -path "*/node_modules/*" -type f | sort) \
      <(sort --unique nm-trace.log) \
      | xargs -L 1 rm

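I sorted both diff inputs because diff compares them line by line and only gives a meaningful result when both lists are in the same order. With the new-line and unchanged-line formats set to empty strings, diff prints only the files that exist on disk but never appeared in the trace.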
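The same deletion list can be computed as a plain set difference. The following is a minimal Node.js sketch of that idea, not the script I used: the function names listNodeModulesFiles and unusedFiles and the sample paths are hypothetical, and it only prints candidates instead of deleting anything.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// List regular files inside any node_modules directory below baseDir,
// as "./relative/path" strings (similar to the find command above).
function listNodeModulesFiles(baseDir, dir = baseDir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listNodeModulesFiles(baseDir, fullPath));
    } else if (entry.isFile()) {
      const relative = './' + path.relative(baseDir, fullPath).split(path.sep).join('/');
      if (relative.includes('/node_modules/')) files.push(relative);
    }
  }
  return files;
}

// Everything on disk that never showed up in the trace is a deletion candidate.
function unusedFiles(allFiles, tracedFiles) {
  const traced = new Set(tracedFiles);
  return allFiles.filter((file) => !traced.has(file)).sort();
}

// Dry run with hypothetical paths: print candidates instead of deleting them.
const onDisk = [
  './node_modules/express/History.md',
  './node_modules/express/index.js',
];
const tracedSample = ['./node_modules/express/index.js'];
console.log(unusedFiles(onDisk, tracedSample));
```

Printing the candidates first makes it easy to review the list before passing it to something destructive such as rm.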

Disk space measurement

I used du on macOS to measure disk space. All sizes reported below are disk usage (the space allocated on disk), not apparent size (the number of bytes in each file). Disk space is allocated in whole blocks, which is why all measurements are multiples of 4 kb.
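To make the distinction between apparent size and disk usage concrete, here is a small Node.js sketch. It assumes that stats.blocks counts 512-byte units (as it does on macOS and Linux) and that the file system allocates space in 4096-byte blocks. Both are assumptions about the environment, and the helper names are hypothetical.

```javascript
const fs = require('node:fs');

// Apparent size: the number of bytes in the file.
// Disk usage: the space the file system allocated for it.
// Assumption: stats.blocks counts 512-byte units, as on macOS and Linux.
function sizes(filePath) {
  const stats = fs.statSync(filePath);
  return { apparentBytes: stats.size, diskBytes: stats.blocks * 512 };
}

// Estimate disk usage by rounding up to a whole number of blocks.
// Assumption: a 4096-byte allocation block, matching the 4 kb multiples above.
function roundUpToBlock(bytes, blockSize = 4096) {
  return Math.ceil(bytes / blockSize) * blockSize;
}

console.log(roundUpToBlock(1));    // 4096
console.log(roundUpToBlock(5000)); // 8192
```

This is why even a tiny package.json shows up as 4 kb in the tables: it occupies a full block regardless of how few bytes it contains.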

Experiment 1: a simple Express server

Refining a simple Express server reduced disk usage by 59% (from 2.4 mb to 984 kb) while keeping the server working:

| | Raw | Refined | Reduction |
|---|---|---|---|
| Disk usage | 2.4 mb | 984 kb | 59% |
| Number of files | 314 | 123 | 61% |

Here were the top 3 packages in terms of size reduction:

| Package | Raw size (kb) | Refined size (kb) | Reduction (kb) |
|---|---|---|---|
| iconv-lite | 400 | 32 | 368 |
| qs | 164 | 36 | 128 |
| express | 240 | 120 | 120 |
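The percentages quoted for each package below follow directly from these sizes. As a quick check, this small sketch recomputes them. The helper name reductionPercent is hypothetical and the sizes are copied from the table above.

```javascript
// Percentage reduction from a raw size to a refined size, rounded to a whole percent.
function reductionPercent(rawKb, refinedKb) {
  return Math.round((1 - refinedKb / rawKb) * 100);
}

// Sizes in kb, taken from the table above.
const packages = [
  { name: 'iconv-lite', raw: 400, refined: 32 },
  { name: 'qs', raw: 164, refined: 36 },
  { name: 'express', raw: 240, refined: 120 },
];

for (const { name, raw, refined } of packages) {
  console.log(`${name}: ${raw - refined} kb saved (${reductionPercent(raw, refined)}%)`);
}
// iconv-lite: 368 kb saved (92%)
// qs: 128 kb saved (78%)
// express: 120 kb saved (50%)
```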

What got deleted from these modules:

iconv-lite converts character encodings and is used here when parsing HTTP bodies. The bulk of the 92% size reduction was due to encoding files that are lazy loaded only when needed. The remaining reduction came from deleting non-code files and a typings file.

qs is used for parsing URL query parameters. The bulk of the 78% size reduction was due to a browser bundle in dist/ and test code. The remaining reduction came from deleting non-code files.

express was the sole direct dependency in this experiment and provides a web server framework. The majority of the 50% size reduction came from deleting a 108 kb History.md document.

This experiment shows that refining is not just about deleting Markdown documents inside a module directory. In addition to deleting documentation, refining safely deleted JavaScript files that weren't needed for the application: a qs browser bundle, qs test code, and iconv-lite optional encodings.

See Appendix 1 for setup details and a list of files that were kept or deleted by refining.

Experiment 2: React with server-side rendering

Refining a simple React application reduced disk usage by 94% (from 4.9 mb to 284 kb) while keeping the application working:

| | Raw | Refined | Reduction |
|---|---|---|---|
| Disk usage | 4.9 mb | 284 kb | 94% |
| Number of files | 103 | 10 | 90% |

Here were the top 3 packages in terms of size reduction:

| Package | Raw size (kb) | Refined size (kb) | Reduction (kb) |
|---|---|---|---|
| react-dom | 4500 | 132 | 4368 |
| react | 220 | 72 | 148 |
| scheduler | 104 | 0 | 104 |

What got deleted from these modules:

react-dom is used to render React applications onto an HTML document object model (DOM) and is a direct dependency. The 97% size reduction came mostly from deleting all browser (UMD) bundles, as well as the CommonJS code in cjs/ that is only needed for client-side rendering. Any code needed for server-side rendering was kept (see Appendix 2 for the full list).

react is the main direct dependency and is used for creating the view layer of an application. The bulk of the 67% size reduction came from deleting all browser (UMD) bundles. The CommonJS production code was also deleted, as this experiment was run in development mode. All essential code was kept (see Appendix 2 for the full list).

scheduler is used by React internally for supporting browser environments. The package was deleted entirely as the application only uses server-side rendering.

This experiment has a much larger reduction than the Express experiment, likely because server-side rendering only uses a fraction of the React code. Only 10 dependency files were actually needed to run this application.

This experiment shows the potential of refining to trim packages based on the features used. Since the experiment focused on server-side rendering, refining deleted all browser bundles and even the CommonJS code that supports client-side rendering.

npm packages meant for both browsers and Node.js contain bundles that target both environments. However, the application artifact only needs the Node.js-related code. Refining helped delete the irrelevant code safely.

See Appendix 2 for setup details and a list of files that were kept or deleted by refining.

Limitations

Refining needs a complete list of all node_modules files your application needs. The application start and testing steps should ensure that Node.js loads all the node_modules files it needs in production.

Of course this would also catch test framework code. To avoid this you would have to pull your test framework out of your node_modules and supply it from another location (maybe a higher-level folder).

Ideally an application would require all packages up front. However, some applications lazy load modules, and the trace will miss any file that is not loaded while tracing is active. Your dependencies might also lazy load modules (though I would personally consider that bad behaviour from a library package).
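To see why lazy loading matters, consider this self-contained Node.js sketch. It builds a throwaway module whose entrypoint only requires a second file when a rarely used function is called, and uses require.cache as a stand-in for file access tracing. That is a simplification I am assuming for illustration: require.cache only knows about modules loaded through require, not every file the process opens. All file and function names here are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Build a throwaway module whose entrypoint lazy loads a second file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lazy-demo-'));
const lazyFile = path.join(dir, 'lazy.js');
fs.writeFileSync(lazyFile, "module.exports = 'loaded on demand';\n");
fs.writeFileSync(
  path.join(dir, 'index.js'),
  "exports.rare = () => require('./lazy.js');\n"
);

// require.cache is keyed by resolved filenames, so resolve the path first.
const lazyKey = require.resolve(lazyFile);

const pkg = require(path.join(dir, 'index.js'));
const loadedAtStartup = lazyKey in require.cache;

pkg.rare(); // Only now does Node.js load lazy.js.
const loadedAfterCall = lazyKey in require.cache;

console.log({ loadedAtStartup, loadedAfterCall });
// { loadedAtStartup: false, loadedAfterCall: true }
```

If the tests never exercise the rare code path, lazy.js would be missing from the trace, refining would delete it, and the application would fail the first time that path runs in production.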

We need a generalized way to quickly determine whether an application has finished accessing files in node_modules. One option is to monitor file access in production on your canary for a while. That could give you a reliable profile, which you then use to refine the build before releasing it to broader traffic.
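One possible heuristic, sketched below, is to treat the profile as settled once no previously unseen file has been opened for some quiet period. This is only an illustration of the idea and cannot prove that the list is complete. The event shape, the function names, and the quiet period are all assumptions made for this example.

```javascript
// Returns the timestamp (ms) at which a previously unseen file was last opened,
// or null if there are no events. Events are assumed to be in chronological order.
function lastNewFileTime(events) {
  const seen = new Set();
  let last = null;
  for (const { time, file } of events) {
    if (!seen.has(file)) {
      seen.add(file);
      last = time;
    }
  }
  return last;
}

// Heuristic: the profile has settled if no new file appeared for quietMs.
function hasSettled(events, now, quietMs) {
  const last = lastNewFileTime(events);
  return last !== null && now - last >= quietMs;
}

// Hypothetical events: the last new file shows up at t = 5000 ms.
const events = [
  { time: 1000, file: './node_modules/express/index.js' },
  { time: 5000, file: './node_modules/qs/lib/index.js' },
  { time: 9000, file: './node_modules/express/index.js' }, // seen before
];
console.log(hasSettled(events, 10000, 60000)); // false: only 5 s of quiet
console.log(hasSettled(events, 70000, 60000)); // true: 65 s without new files
```

A longer quiet period lowers the risk of missing a lazy loaded file but delays the point at which refining can run.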

Once you have a complete list you could check it into your codebase as a snapshot, similar to React snapshot testing. Then you would update it periodically as your dependency usage changes.
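A snapshot check could be as simple as comparing the checked-in list with a fresh trace and failing the build when the application opens a file that the snapshot would have deleted. The sketch below shows that comparison. The function name compareToSnapshot and the sample paths are hypothetical, and how the lists are stored and loaded is left open.

```javascript
// Compare a fresh trace against the checked-in snapshot of required files.
function compareToSnapshot(snapshotFiles, tracedFiles) {
  const snapshot = new Set(snapshotFiles);
  const traced = new Set(tracedFiles);
  return {
    // Files the application now opens that the snapshot would have deleted.
    missingFromSnapshot: [...traced].filter((f) => !snapshot.has(f)).sort(),
    // Files kept by the snapshot that this run never opened.
    noLongerUsed: [...snapshot].filter((f) => !traced.has(f)).sort(),
  };
}

// Hypothetical example: a dependency upgrade starts loading a new file.
const snapshot = [
  './node_modules/express/index.js',
  './node_modules/qs/lib/index.js',
];
const freshTrace = [
  './node_modules/express/index.js',
  './node_modules/express/lib/view.js',
];
console.log(compareToSnapshot(snapshot, freshTrace));
```

Entries in missingFromSnapshot are the dangerous ones, because refining with the old snapshot would break the application. Entries in noLongerUsed only mean the snapshot could be trimmed further.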

Conclusion

Refining has potential and works well for toy applications. The next step is to try it on larger and more complex applications.

Refining solves a problem that ideally shouldn't exist but unfortunately does. Even battle-tested and stable packages such as express have unnecessary files. Package maintainers could reduce size by explicitly listing the files to publish through the files field in package.json. This would prevent files such as .travis.yml from being published. However, it wouldn't be feasible to coordinate an effort to trim even the most popular npm packages. I also can't see package maintainers going out of their way to trim their modules, because they don't pay the cost themselves.

Refining isn't code-intelligent. It just keeps track of which files are accessed. It can't eliminate dead code within a file or perform tree-shaking. As long as a node_modules file is loaded, directly or indirectly, from your application entrypoint, refining will keep it. This works well for modules like react-dom, which has multiple entrypoints depending on the usage mode (e.g. react-dom/server). If the usage mode is selected through a parameter instead, then likely all files would be retained, even the useless ones.

Packages already exist to delete node_modules files based on pattern matching (such as deleting all Markdown documents), for example Modclean. Refining goes further and deletes actual code files. It is agnostic to the type of a file. It only cares whether it is required or not.

This article gave some ideas about how to apply refining to a build pipeline (by storing snapshots) but doesn't have a full answer yet.

Appendix 1: detailed results from Experiment 1 (simple Express server)

Setup

I created a simple Express server application. The only direct dependency was express. This was the entire application:

    const express = require('express');
    const app = express();
    app.use(express.json());
    app.get('/', (req, res) => res.send('Hello world!'));
    app.listen(8080, () => console.log('Listening on port 8080.'));

I tested the application with an HTTP request:

    curl http://localhost:8080/

Files deleted or kept

Here are the files that were either kept or deleted by refining from the top 3 packages in terms of size reduction:

iconv-lite

| File | Size (kb) | Status |
|---|---|---|
| lib/bom-handling.js | 4 | Keep |
| lib/extend-node.js | 12 | Keep |
| lib/index.js | 8 | Keep |
| lib/streams.js | 4 | Keep |
| package.json | 4 | Keep |
| .travis.yml | 4 | Delete |
| Changelog.md | 8 | Delete |
| LICENSE | 4 | Delete |
| README.md | 8 | Delete |
| encodings/dbcs-codec.js | 24 | Delete |
| encodings/dbcs-data.js | 12 | Delete |
| encodings/index.js | 4 | Delete |
| encodings/internal.js | 8 | Delete |
| encodings/sbcs-codec.js | 4 | Delete |
| encodings/sbcs-data-generated.js | 32 | Delete |
| encodings/sbcs-data.js | 8 | Delete |
| encodings/tables/big5-added.json | 20 | Delete |
| encodings/tables/cp936.json | 48 | Delete |
| encodings/tables/cp949.json | 40 | Delete |
| encodings/tables/cp950.json | 44 | Delete |
| encodings/tables/eucjp.json | 44 | Delete |
| encodings/tables/gb18030-ranges.json | 4 | Delete |
| encodings/tables/gbk-added.json | 4 | Delete |
| encodings/tables/shiftjis.json | 24 | Delete |
| encodings/utf16.js | 8 | Delete |
| encodings/utf7.js | 12 | Delete |
| lib/index.d.ts | 4 | Delete |

qs

| File | Size (kb) | Status |
|---|---|---|
| lib/formats.js | 4 | Keep |
| lib/index.js | 4 | Keep |
| lib/parse.js | 8 | Keep |
| lib/stringify.js | 8 | Keep |
| lib/utils.js | 8 | Keep |
| package.json | 4 | Keep |
| .editorconfig | 4 | Delete |
| .eslintignore | 4 | Delete |
| .eslintrc | 4 | Delete |
| CHANGELOG.md | 16 | Delete |
| LICENSE | 4 | Delete |
| README.md | 16 | Delete |
| dist/qs.js | 20 | Delete |
| test/.eslintrc | 4 | Delete |
| test/index.js | 4 | Delete |
| test/parse.js | 24 | Delete |
| test/stringify.js | 24 | Delete |
| test/utils.js | 4 | Delete |

express

| File | Size (kb) | Status |
|---|---|---|
| index.js | 4 | Keep |
| lib/application.js | 16 | Keep |
| lib/express.js | 4 | Keep |
| lib/middleware/init.js | 4 | Keep |
| lib/middleware/query.js | 4 | Keep |
| lib/request.js | 16 | Keep |
| lib/response.js | 28 | Keep |
| lib/router/index.js | 16 | Keep |
| lib/router/layer.js | 4 | Keep |
| lib/router/route.js | 8 | Keep |
| lib/utils.js | 8 | Keep |
| lib/view.js | 4 | Keep |
| package.json | 4 | Keep |
| History.md | 108 | Delete |
| LICENSE | 4 | Delete |
| Readme.md | 8 | Delete |

Appendix 2: detailed results from Experiment 2 (React with server-side rendering)

Setup

I created a simple React application with server-side rendering. The only direct dependencies were react and react-dom. This was the entire application:

    const React = require('react');
    const ReactDOMServer = require('react-dom/server');

    class Hello extends React.Component {
      render() {
        return React.createElement('div', null, `Hello ${this.props.toWhat}`);
      }
    }

    const html = ReactDOMServer.renderToString(
      React.createElement(Hello, { toWhat: 'World' }, null)
    );
    console.log(html);

Files deleted or kept

Here are the files that were either kept or deleted by refining from the top 3 packages in terms of size reduction:

react-dom

| File | Size (kb) | Status |
|---|---|---|
| cjs/react-dom-server.node.development.js | 124 | Keep |
| server.js | 4 | Keep |
| server.node.js | 4 | Keep |
| LICENSE | 4 | Delete |
| README.md | 4 | Delete |
| build-info.json | 4 | Delete |
| cjs/react-dom-server.browser.development.js | 120 | Delete |
| cjs/react-dom-server.browser.production.min.js | 20 | Delete |
| cjs/react-dom-server.node.production.min.js | 20 | Delete |
| cjs/react-dom-test-utils.development.js | 48 | Delete |
| cjs/react-dom-test-utils.production.min.js | 12 | Delete |
| cjs/react-dom-unstable-fire.development.js | 724 | Delete |
| cjs/react-dom-unstable-fire.production.min.js | 100 | Delete |
| cjs/react-dom-unstable-fire.profiling.min.js | 104 | Delete |
| cjs/react-dom-unstable-fizz.browser.development.js | 4 | Delete |
| cjs/react-dom-unstable-fizz.browser.production.min.js | 4 | Delete |
| cjs/react-dom-unstable-fizz.node.development.js | 4 | Delete |
| cjs/react-dom-unstable-fizz.node.production.min.js | 4 | Delete |
| cjs/react-dom-unstable-native-dependencies.development.js | 64 | Delete |
| cjs/react-dom-unstable-native-dependencies.production.min.js | 12 | Delete |
| cjs/react-dom.development.js | 724 | Delete |
| cjs/react-dom.production.min.js | 100 | Delete |
| cjs/react-dom.profiling.min.js | 104 | Delete |
| index.js | 4 | Delete |
| package.json | 4 | Delete |
| profiling.js | 4 | Delete |
| server.browser.js | 4 | Delete |
| test-utils.js | 4 | Delete |
| umd/react-dom-server.browser.development.js | 124 | Delete |
| umd/react-dom-server.browser.production.min.js | 20 | Delete |
| umd/react-dom-test-utils.development.js | 48 | Delete |
| umd/react-dom-test-utils.production.min.js | 12 | Delete |
| umd/react-dom-unstable-fire.development.js | 728 | Delete |
| umd/react-dom-unstable-fire.production.min.js | 100 | Delete |
| umd/react-dom-unstable-fire.profiling.min.js | 104 | Delete |
| umd/react-dom-unstable-fizz.browser.development.js | 4 | Delete |
| umd/react-dom-unstable-fizz.browser.production.min.js | 4 | Delete |
| umd/react-dom-unstable-native-dependencies.development.js | 64 | Delete |
| umd/react-dom-unstable-native-dependencies.production.min.js | 12 | Delete |
| umd/react-dom.development.js | 728 | Delete |
| umd/react-dom.production.min.js | 100 | Delete |
| umd/react-dom.profiling.min.js | 104 | Delete |
| unstable-fizz.browser.js | 4 | Delete |
| unstable-fizz.js | 4 | Delete |
| unstable-fizz.node.js | 4 | Delete |
| unstable-native-dependencies.js | 4 | Delete |

react

| File | Size (kb) | Status |
|---|---|---|
| cjs/react.development.js | 64 | Keep |
| index.js | 4 | Keep |
| package.json | 4 | Keep |
| LICENSE | 4 | Delete |
| README.md | 4 | Delete |
| build-info.json | 4 | Delete |
| cjs/react.production.min.js | 8 | Delete |
| umd/react.development.js | 100 | Delete |
| umd/react.production.min.js | 12 | Delete |
| umd/react.profiling.min.js | 16 | Delete |

scheduler

Refining deleted this package entirely.
