Node.js research

Introduction

Hi, I'm Ryan Wilcox. I've been programming for about 15 years on various things, and have been around the block a few times. I've done classic Mac OS applications, cross-platform applications in C++, Python web apps of all sorts (including some Twisted), declarative programming, and I've spent the last 3 years doing Ruby and Ruby on Rails (including some EventMachine).

I know the best practices of these frameworks, the pitfalls, and the "why" of those best practices.

So, when evaluating node.js for use in a potential project, I asked around for the node.js best practices. I didn't get as much discussion as I was hoping for.

I decided to dig in and look at node.js through the eyes of someone who's Been There, Done That.

I am a node.js outsider. This potential project will be my first node.js project of any size. I don't speak from direct experience with node.js, but from my research and my own knowledge dealing with other event systems. I'd love to know if I'm wrong on a topic.

Audience

This document assumes you have tried out node.js a little. You installed it on your machine and ran the Hello World HTTP server. You saw the "don't block, use callbacks" philosophy of node, and thought, "OK, I can do this."

Then maybe you put node.js down because you didn't need an asynchronous Hello World server in your mostly Ruby (or Python, Clojure, or Bog knows what else) shop. This is exactly what I did. Until the other day.

This document assumes you've been around the block a few times, and are looking at node.js with an evaluating eye. "Can I use this for a new potential project, and what are the best practices in the community?"

Yes, the Node.js Modules page (https://github.com/joyent/node/wiki/modules) is there. It's also 35 pages long - a great show of what node.js can do… you see all the practices, but not which ones are the best.

"But seriously, I just need to write some code, not check out 100+ node.js projects that may or may not still work or be any good. And really, that 'node.js is cancer' rant was a big deal a while back, WTF's up with that? And then there were those non-blocking Fibonacci servers…."

Node.js is cancer: http://teddziuba.com/2011/10/node-js-is-cancer.html

Node.js has jumped the shark: http://www.unlimitednovelty.com/2011/10/nodejs-has-jumped-shark.html

node.js non-blocking Fibonacci code: https://github.com/glenjamin/node-fib/blob/master/app.js

Still here? Good - keep reading, because you're my audience.

Or did that kind of go over your head? Did you say "Wait, huh, what?!" This article has a fair bit of reference material, so take a look at it and come back. This research was pretty frustrating to gather, even for me.

The node.js Event Model

Cooperative multi-threading: process.nextTick() lets you defer work until the next time the event loop is idle, so you can let other things have a chunk of time if you're in the middle of a long, blocking operation.

http://nodejs.org/docs/v0.3.1/api/process.html#process.nextTick

http://en.wikipedia.org/wiki/Thread_(computer_science)#Multithreading
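To make the deferral concrete, here is a minimal sketch of process.nextTick() putting a callback off until the currently running code has finished. The demoNextTick helper is a made-up name for illustration. One caveat, assuming a current Node release rather than the v0.3 docs linked above: the nextTick queue is drained before pending I/O is handled, so nextTick defers work but does not by itself give I/O a turn.

```javascript
// Hypothetical helper (not part of node): records the order in which things run.
function demoNextTick(done) {
  var order = [];

  order.push('before nextTick');
  process.nextTick(function () {
    // Runs only after the synchronous code below has finished.
    order.push('nextTick callback');
  });
  order.push('after nextTick');

  // setImmediate fires after the nextTick queue has drained,
  // so by then all three entries are in the list.
  setImmediate(function () {
    done(order);
  });
}

demoNextTick(function (order) {
  console.log(order.join(' -> '));
});
```

The deferred callback is always the last of the three entries, even though it was registered in the middle.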

Node.js (highly) encourages a non-blocking style of programming. Thus all the callbacks: "do this at some time after this other thing happens".

Wait, what does "non-blocking" mean?

Yielding to process.nextTick in lengthy operations

Using callbacks when performing low-level operations

Using small blocking operations to build up larger sequences of events that happen asynchronously

For example:

for (var i = 0; i < records.length; i++) {
  // Kick off the update and move on; the callback fires later.
  records[i].updateTimeRemainingAsynchronously(function() {});
}

The for loop in this case is blocking, but it starts N record.updateTimeRemainingAsynchronously calls that will complete sometime in the future. (One just has to hope, or know, that record.updateTimeRemainingAsynchronously() is actually non-blocking.)
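The loop above can be fleshed out into something runnable. This is only a sketch: the Record type and the updateAll helper are invented for illustration, and setImmediate stands in for whatever real non-blocking I/O the update would do. It shows that the loop itself finishes before any of the callbacks run.

```javascript
// Hypothetical record type; setImmediate stands in for real non-blocking I/O.
function Record(name) {
  this.name = name;
  this.timeRemaining = null;
}

Record.prototype.updateTimeRemainingAsynchronously = function (callback) {
  var self = this;
  setImmediate(function () {
    self.timeRemaining = 42; // placeholder value, not a real calculation
    callback(null, self);
  });
};

// Starts one update per record and calls done(log) once all have completed.
function updateAll(records, done) {
  var log = [];
  var pending = records.length;

  function startUpdate(record) {
    record.updateTimeRemainingAsynchronously(function () {
      log.push('updated ' + record.name);
      pending -= 1;
      if (pending === 0) {
        done(log);
      }
    });
  }

  for (var i = 0; i < records.length; i++) {
    startUpdate(records[i]); // returns immediately; the work happens later
  }
  log.push('loop finished');

  if (records.length === 0) {
    setImmediate(function () { done(log); });
  }
}

updateAll([new Record('a'), new Record('b')], function (log) {
  console.log(log.join(', '));
});
```

The 'loop finished' entry lands in the log first, which is the whole point: the blocking part is short, and the real work is spread across later turns of the event loop.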

Using the event loop to split things up, or shoving it to workers

So, blocking is bad, right?

Right

"in node, everything runs in parallel except your code"

Technically the common quote is wrong, and could be refined: "Everything runs, one thing at a time, in the event loop, where everyone tries to be polite and give others time to do their thing. Your code should also be polite and give others time to do their thing." It's more accurate, and more unwieldy. Maybe "We are nice because node is nice" is a better (if obfuscated) quote.

But why? You have one event loop per node.js process. If your code hogs the processor (event loop) for 5 seconds (by sleeping, or doing a large calculation, or computing Fibonacci numbers), node.js will not respond to anything else for those 5 seconds. "What other things?", you ask. Things like responding to other HTTP requests. The longer your code blocks, the more you are Denial Of Service Attacking yourself.
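A small experiment makes the cost visible. In this sketch (measureTimerDelay is a made-up helper, and the busy loop is a stand-in for any CPU-heavy work), a timer asked to fire after 0 ms cannot fire until the blocking code lets go of the event loop.

```javascript
// Hypothetical helper: schedules a 0 ms timer, then hogs the event loop
// for blockMs milliseconds, and reports how late the timer actually fired.
function measureTimerDelay(blockMs, callback) {
  var scheduledAt = Date.now();

  setTimeout(function () {
    callback(Date.now() - scheduledAt);
  }, 0);

  // Busy-wait: stands in for a large calculation. Nothing else can run
  // in this process until the loop exits.
  var blockStart = Date.now();
  while (Date.now() - blockStart < blockMs) {
    // spin
  }
}

measureTimerDelay(200, function (delayMs) {
  console.log('0 ms timer fired after ' + delayMs + ' ms');
});
```

Swap the timer for an incoming HTTP request and the delay becomes the extra latency every other client sees.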

http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/

http://www.slideshare.net/shivercube/functional-nodejs

So how do you make your code not blocking where you can?

Send really long stuff to the background: put it in a queue and process it in the background. This is Rails 301 stuff.

Use Events to split work into pieces that can be listened to. (Note that emitting an event calls its listeners synchronously, so the event loop only gets a turn if each piece of work is itself deferred.)

use the profiler (?)

Be polite and use process.nextTick in places, but don't overdo it. (The more time you give away to Other Things, the more wall-clock time the request will take.)
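Two of the ideas above (splitting work into events, and being polite between pieces) can be combined in one sketch. The sumInChunks function and its 'progress' and 'done' event names are invented for this example. It also assumes a current Node release, where setImmediate is the call that lets pending I/O run between chunks; process.nextTick callbacks run before I/O there, so this sketch uses setImmediate instead.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: sums the integers 1..n in chunks of chunkSize,
// yielding to the event loop between chunks. Emits 'progress' after each
// chunk and 'done' with the total at the end.
function sumInChunks(n, chunkSize) {
  var job = new EventEmitter();
  var total = 0;
  var next = 1;

  function step() {
    var end = Math.min(next + chunkSize - 1, n);
    for (; next <= end; next++) {
      total += next; // the small blocking part
    }
    job.emit('progress', next - 1);

    if (next <= n) {
      setImmediate(step); // let other callbacks and I/O have a turn
    } else {
      job.emit('done', total);
    }
  }

  // Defer the first chunk so the caller has time to attach listeners.
  setImmediate(step);
  return job;
}

var job = sumInChunks(1000, 100);
job.on('done', function (total) {
  console.log('total: ' + total);
});
```

The chunk size is the politeness dial: smaller chunks give other work more frequent turns, at the cost of more wall-clock time for this job.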

Possible Frameworks to use:

Application Frameworks

http://expressjs.com/ <- it's Sinatra-like

Personally, I like Sinatra when I *know* I'm going into a very small project that will not grow a ton of features (thus resulting in a "big ball of mud" app). Elsewise, I like frameworks that assume a big structure the project can grow into.

http://geddyjs.org/ <-- Rails like infrastructure,

and offers geddy.util.async.execNonBlocking function to perform things in a non-blocking manner

geddy.util.async.AsyncChain is a chain of functors that will be executed asynchronously

ORM is DataMapper like

supports generation, but not coffeescript generation

http://railwayjs.com/ <-- another Rails like infrastructure framework

Rails like

Its generators can output coffeescript (pass --coffee to the rw commands)

ORMs: mysql, mongoid, redis

can also use Sequelize (a Datamapper based ORM) <-- BUT mysql only

It is relatively easy to write Adaptors for other things (postgres clients, for example) for the ORM that Railway.js uses

Other ORMs

http://persistencejs.org/ <-- Datamapper based, mysql + sqlite adaptors

User authentication frameworks

https://github.com/ciaranj/connect-auth <-- node.js middleware version of Ruby's Warden framework

https://github.com/bnoguchi/everyauth <-- node.js middleware version of Ruby's OmniAuth framework

Testing

http://vowsjs.org/ <-- Vows

https://github.com/caolan/nodeunit <-- nodeunit

Standard Library Stuff

https://github.com/caolan/async <-- Async Iteration tools for node.js

http://howtonode.org/do-it-fast <-- avoid event loop hell

https://github.com/wdavidw/node-each <-- async each loops

https://github.com/substack/node-seq <-- chainable async Iteration etc

Something to keep the nested callbacks at bay. For example, an Observer pattern. Or Fibers/Promises. (Or Coffeescript...)
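As a sketch of the Promises option, here is the same two-step lookup written with nested callbacks and then as a flat promise chain. The findUser and findPosts functions are stand-ins invented for this example (setImmediate plays the part of real I/O), and util.promisify assumes a Node version that ships it (8 or later).

```javascript
var util = require('util');

// Hypothetical callback-style functions standing in for real I/O.
function findUser(id, callback) {
  setImmediate(function () {
    callback(null, { id: id, name: 'user' + id });
  });
}

function findPosts(user, callback) {
  setImmediate(function () {
    callback(null, [user.name + ' post 1', user.name + ' post 2']);
  });
}

// Nested style: each extra step adds a level of indentation.
function postsForUserNested(id, callback) {
  findUser(id, function (err, user) {
    if (err) { return callback(err); }
    findPosts(user, function (err, posts) {
      if (err) { return callback(err); }
      callback(null, posts);
    });
  });
}

// Promise style: the same steps read top to bottom, and one catch()
// at the end of the chain handles errors from any step.
var findUserP = util.promisify(findUser);
var findPostsP = util.promisify(findPosts);

function postsForUser(id) {
  return findUserP(id).then(findPostsP);
}

postsForUser(1)
  .then(function (posts) { console.log(posts); })
  .catch(function (err) { console.error(err); });
```

Both versions do the same non-blocking work; the difference is only in how the control flow reads once there are more than two or three steps.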

Useful Javascript stuff I can NOT use (because it implicitly blocks):

Underscore.js for Iterations: each implicitly blocks

source (the underscore.js source): http://documentcloud.github.com/underscore/docs/underscore.html
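For comparison, a non-blocking each can be sketched in a few lines. The eachAsync name and signature are made up here and are not Underscore's API; the sketch assumes a Node version with setImmediate. It visits one item per turn of the event loop instead of running the whole iteration in one go.

```javascript
// Hypothetical helper: calls iterator(item, index) for each item,
// yielding to the event loop between items, then calls done().
function eachAsync(items, iterator, done) {
  var index = 0;

  function next() {
    if (index >= items.length) {
      return done();
    }
    iterator(items[index], index);
    index += 1;
    setImmediate(next); // give other callbacks a turn before the next item
  }

  setImmediate(next);
}

eachAsync(['a', 'b', 'c'], function (item, index) {
  console.log(index + ': ' + item);
}, function () {
  console.log('all items visited');
});
```

For a short in-memory array a plain synchronous each is probably fine; this kind of helper only earns its keep when the list is long or the per-item work is heavy.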

What I would love, but can't find

MochiKit.Base's partial/bind, extend, repr, and Adaptor functionality without the (blocking?) functional tools

But, if a library makes a blocking call available to me, I would rather not mistakenly reach for it when I want a non-blocking tool.

Development Tools

Reloading code when files change (for example, during development)

https://github.com/isaacs/node-supervisor <-- watches a directory structure and reloads the Javascript files when changes are made. (node.js does not do this by default, and neither does express apparently)

cluster (mentioned elsewhere in this document) can also do this.

http://learnboost.github.com/cluster/docs/reload.html

http://github.com/mde/jake <-- Make/Rake like tool

Deployment Tools

http://railsbros.de/2011/02/18/deploying_a_node_js_server_with_capistrano_and_cluster.html <-- Deploying node.js with Capistrano and Cluster

Production Tools

http://learnboost.github.com/cluster/ <-- create a cluster of node.js servers. Thus your node.js app is load-balanced on the one machine, and you are running N event loops on the same machine. (Need multiple machines running cluster? Stick a load-balancer in front of your load-balanced machines!) Plugin community.

https://github.com/pgte/fugue <-- billed as "Unicorn for node", but Fugue's own author says that you should probably use Cluster

Q: "How do I get a local copy of all the libraries I use, like `bundle package` in Bundler?"

A: npm will by default install packages into a local space (the node_modules folder of your project)

Good node.js reads

http://stella.laurenzo.org/2011/03/bulletproof-node-js-coding/

Node.js & Coffeescript

http://zappajs.org/ <-- A Coffeescript Sinatra-like built on top of (node.js) Express

http://ariejan.net/2011/06/10/vows-and-coffeescript <-- Vows + Coffeescript

"CS automatically integrates with require.extensions so if your scripts have a "coffee" extension they will run as coffeescript.."

References / Presentations / Slideshows to watch

http://www.slideshare.net/fleegix/mde-txjs-2011fullstackfallacies <-- no recorded audio, :(

http://blip.tv/jsconf/jsconf2011-tom-hughes-croucher-5478056 <-- Tom Hughes-Croucher's node.js talk at JSConf 2011. I picked up a lot of information from this talk!

