Running Node.js apps in production
Frederic Hemberger

@fhemberger

Who has written a Node.js app so far? (Web app, API, etc.)

Who runs this app in production?

Topics I'll talk about today:
- Deployment
- Run Node.js (and keep it running)
- Metrics

This talk is only meant to give a brief overview.

All tools and resources mentioned are linked at the end of the presentation.

Deployment

Deployment
Different popular deployment techniques:
- Git Hooks
- GitHub Webhooks
- Capistrano, Fabric, deploy.sh, et al.

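Git Hooks
Pushing to a Git remote on your server:

    #!/bin/sh
    # .git/hooks/post-receive
    unset GIT_DIR   # hooks run with GIT_DIR set, which would break "git pull" in another directory
    cd /var/www/myapp.com || exit 1
    git pull
    npm install --production
    service myapp restart

... Done.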

Git Hooks
Pro:
- Easy for the developer: just push to production (aka "fire and forget")
- Hosting platforms like Heroku use this method as well

Con:
- But what happens on the server?
- Deployment knowledge is stored separately from the code
- When deploying to multiple servers, the post-receive hooks must be kept in sync

Solution: Add the deploy script to your repository and symlink it as the post-receive hook.

GitHub Webhooks


GitHub Webhooks
Pro:
- When the rest of your development work already revolves around GitHub, it integrates nicely into the workflow

Con:
- All hooks run independently and in parallel: e.g. if the CI hook fails, the deployment webhook still gets triggered. Some CI services like Travis CI offer their own hooks to trigger a deployment afterwards.
- GitHub becomes a critical dependency for your deployment: remember, even GitHub is down or gets DDoS'ed from time to time.
- Requires a server component that runs the update script.
- This component must be secured so it does not accept fake payloads or mess up the deployment.
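To reject fake payloads, GitHub can sign each webhook delivery: when a secret is configured for the hook, it sends an HMAC-SHA256 of the raw request body in the `X-Hub-Signature-256` header (older setups use SHA-1 in `X-Hub-Signature`). A minimal sketch of the server component, assuming Node.js 6.6 or later for `crypto.timingSafeEqual`; `verifyGitHubSignature`, `handleWebhook` and `runDeploy` are hypothetical names, not part of any GitHub library.

```javascript
const crypto = require('crypto');

// Returns true only if signatureHeader ("sha256=<hex>") matches the HMAC-SHA256
// of the raw body, computed with the shared webhook secret.
function verifyGitHubSignature(secret, rawBody, signatureHeader) {
  if (typeof signatureHeader !== 'string') return false;
  const expected = 'sha256=' +
    crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on different lengths, so check the length first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Request handler sketch: buffers the raw body, rejects unsigned or forged
// payloads and only then calls runDeploy (a hypothetical function that
// starts the update script).
function handleWebhook(secret, runDeploy) {
  return function (req, res) {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks);
      if (!verifyGitHubSignature(secret, body, req.headers['x-hub-signature-256'])) {
        res.writeHead(401);
        return res.end('invalid signature\n');
      }
      res.writeHead(202);
      res.end('deployment queued\n');
      runDeploy(body); // verified raw payload
    });
  };
}
```

Usage could look like `http.createServer(handleWebhook(process.env.WEBHOOK_SECRET, startUpdateScript)).listen(9000)`, where the environment variable, port and `startUpdateScript` are placeholders. The handler answers before starting the deployment, so a slow update script does not hold the webhook request open.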

Capistrano, Fabric, deploy.sh, et al.
- Remotely checks out your code from a repository
- Names the directory after the current date and/or revision
- Symlinks it to current

    deploy_directory
    ├─┬ releases
    │ ├── 20140319001122
    │ └── ...
    ├─┬ shared
    │ ├── log
    │ ├── pids
    │ └── system
    └── current ⇨ releases/20140319001122
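The release layout above can be illustrated in a few lines of Node.js. This is a sketch under assumptions, not how Capistrano or Fabric implement it: release names are UTC timestamps, the `current` link is relative, the filesystem supports POSIX symlinks, and `releaseName`, `activateRelease` and `previousRelease` are hypothetical helpers.

```javascript
const fs = require('fs');
const path = require('path');

// Release directory name from a timestamp (UTC), e.g. 20140319001122.
function releaseName(date = new Date()) {
  return date.toISOString().replace(/[-:T]/g, '').slice(0, 14);
}

// Point <deployDir>/current at releases/<name>. The new link is created under a
// temporary name and renamed over "current"; rename() replaces the old link in
// one step, so "current" never disappears during the switch.
function activateRelease(deployDir, name) {
  const tmpLink = path.join(deployDir, 'current.tmp');
  try {
    fs.unlinkSync(tmpLink); // leftover from an interrupted run
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
  fs.symlinkSync(path.join('releases', name), tmpLink); // relative link target
  fs.renameSync(tmpLink, path.join(deployDir, 'current'));
}

// Name of the release before the active one (for rolling back), or null.
function previousRelease(deployDir) {
  // Timestamp names sort lexicographically in chronological order.
  const releases = fs.readdirSync(path.join(deployDir, 'releases')).sort();
  const active = path.basename(fs.readlinkSync(path.join(deployDir, 'current')));
  const index = releases.indexOf(active);
  return index > 0 ? releases[index - 1] : null;
}
```

Rewinding to the previous deployment is then a matter of calling `activateRelease(dir, previousRelease(dir))` (after checking for `null`) and restarting the app.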

Capistrano, Fabric, deploy.sh, et al.
Additionally triggers scripts that can:
- restart the web server
- create a database and its schema
- install/update your app's dependencies

Capistrano, Fabric, deploy.sh, et al.
Pro:
- Clean server-side application structure (including logs, shared files, etc.)
- Trigger arbitrary scripts before/after the deployment
- Quickly rewind to the previous deployment on error

Con:
- Introduces another language as an additional dependency (Capistrano: Ruby; Fabric: Python)

Run Node.js (and keep it running)

Run Node.js (and keep it running)
Start the script as a daemon:
- Nodemon/node-forever (written in Node.js)
- supervise (UNIX daemontools)
- Upstart (Ubuntu)

Example Upstart script

    # /etc/init/myapp.conf
    start on runlevel [2345]
    stop on runlevel [06]

    respawn
    respawn limit 5 60

    env NODE_SCRIPT=/var/www/myapp/server.js
    env LOGFILE=/var/log/myapp.log

    exec start-stop-daemon --start --chuid node \
      --exec /usr/local/bin/node -- \
      $NODE_SCRIPT >> $LOGFILE 2>&1

More elaborate: PM2
Process manager with a built-in load balancer

PM2
Monitor processes

Question: Who should be responsible for process management (creation, restarting, monitoring, clustering)? The OS? The startup script? The application itself?
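To make the "application itself" option concrete: a Node.js parent process can fork the app and restart it when it exits, roughly what Upstart's `respawn limit 5 60` does at the OS level. A minimal sketch, assuming a sliding 60-second window (Upstart's exact counting may differ); `createRespawnPolicy` and `supervise` are hypothetical names, not the API of PM2 or forever.

```javascript
const { fork } = require('child_process');

// Allows at most `limit` restarts within `intervalMs` (sliding window);
// returns false once that budget is exhausted.
function createRespawnPolicy(limit = 5, intervalMs = 60 * 1000) {
  const restarts = [];
  return function shouldRespawn(now = Date.now()) {
    // forget restarts that fell out of the window
    while (restarts.length && now - restarts[0] > intervalMs) restarts.shift();
    if (restarts.length >= limit) return false;
    restarts.push(now);
    return true;
  };
}

// Runs `script` as a child process and restarts it on exit until the
// respawn policy gives up.
function supervise(script, policy = createRespawnPolicy()) {
  const child = fork(script);
  child.on('exit', (code, signal) => {
    const reason = code !== null ? code : signal;
    if (policy()) {
      console.error(`${script} exited (${reason}), restarting`);
      supervise(script, policy);
    } else {
      console.error(`${script} is respawning too fast, giving up`);
    }
  });
  return child;
}
```

Calling `supervise('/var/www/myapp/server.js')` would keep the app running; the limit prevents an app that crashes on startup from being restarted in a tight loop.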

Whatever method you use to run your applications, startup scripts should …
- … be as general as possible (only path, environment, main JS file)
- … not contain configuration settings for your application
- … be included alongside your deployment (symlink if necessary)
- … be kept under version control as well

Starting an app is like starting a car: The starter (keys, remote, button) doesn't need to know anything about the car. It only connects the wires that start the car.

However, the car's control electronics must know the car's systems (engine type and performance, ABS, ESP) to act accordingly (maximum speed, braking effect, handling).
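Following the car analogy, the startup script only passes the environment (e.g. `NODE_ENV=production node server.js`), while the app itself maps that environment to its settings. A minimal sketch; the file name `config.js`, the setting names and their values are hypothetical.

```javascript
// config.js (hypothetical): the application, not the startup script, knows its settings.
const defaults = { port: 3000, logLevel: 'info' };

// Per-environment overrides, kept under version control with the app.
const perEnvironment = {
  development: { logLevel: 'debug' },
  production: { port: 8080 },
};

// Only NODE_ENV comes from outside; everything else is derived here.
function loadConfig(env = process.env) {
  const name = env.NODE_ENV || 'development';
  const config = Object.assign({}, defaults, perEnvironment[name] || {});
  config.env = name;
  return config;
}
```

This keeps the startup script identical across environments: switching from staging to production changes one variable, not the script.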

There are at least two occasions when your app will not be available:
- While deploying a new version
- On application errors/exceptions

Deployment
Downtime during deployment should be kept to a minimum:
- Only deploy tested code to production
- Automate the entire deployment process
- Use a cluster to reload workers, e.g. between requests or at the end of a user session (a complete app restart is only needed if the master changes)

recluster
Wrapper around Node.js's own cluster module:

    // cluster.js
    var recluster = require('recluster'),
        path = require('path'),
        cluster = recluster(path.join(__dirname, 'server.js'));

    // Reload all workers when the master receives SIGUSR2
    process.on('SIGUSR2', function() {
      console.log('Got SIGUSR2, reloading cluster ...');
      cluster.reload();
    });

    cluster.run();

Reload cluster workers:

    kill -s SIGUSR2 <cluster_pid>

recluster

    // server.js
    server.on('close', function() {
      // cleanup
    });
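What goes into the cleanup handler depends on the app. When a cluster worker is disconnected, Node's cluster module closes that worker's servers and waits for their `'close'` events; this sketch assumes recluster retires old workers that way. It shows a worker's `server.js` built on a plain `http` server; the SIGTERM handler and port 8000 are extra assumptions, not something recluster requires.

```javascript
// server.js (worker)
const http = require('http');

const server = http.createServer(function (req, res) {
  res.end('Hello from worker ' + process.pid + '\n');
});

server.on('close', function () {
  // Emitted once the server has stopped accepting new connections and all
  // existing connections have ended: release resources here (database pools,
  // open file handles, timers, ...) so the worker can exit cleanly.
  console.log('Worker ' + process.pid + ': server closed, cleaning up');
});

// Outside of a cluster reload: stop accepting connections, let in-flight
// requests finish, then exit.
process.on('SIGTERM', function () {
  server.close(function () {
    process.exit(0);
  });
});

server.listen(process.env.PORT || 8000);
```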

Errors/Exceptions
Different categories of errors:

Hardware/network errors:
- You're screwed, can't do much about it.

Component errors:
- Database not responding, files missing, wrong access privileges
- Throw an exception, exit the application (check your restart script!)

Programming errors:
- Testing your code is great, but some bugs will eventually slip through.
- The impact is hard to assess, so try to fail gracefully

Usage errors:
- Validate inputs, inform the user and offer guidance

Ideally, a simple error (request timeout, invalid/missing input) should never bring down the entire application.
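For usage errors, validation can return messages for the user instead of throwing, so bad input ends the request, not the process. A minimal sketch; the sign-up form, its fields and the rules (a simplistic email check, 8-character passwords) are hypothetical.

```javascript
// Validates a hypothetical sign-up payload and returns a list of user-facing
// messages; an empty list means the input is fine.
function validateSignup(input) {
  if (!input || typeof input !== 'object') {
    return ['The request body must be a JSON object.'];
  }
  const errors = [];
  // Deliberately loose check: something@something, no whitespace
  if (typeof input.email !== 'string' || !/^[^@\s]+@[^@\s]+$/.test(input.email)) {
    errors.push('Please enter a valid email address, e.g. jane@example.com.');
  }
  if (typeof input.password !== 'string' || input.password.length < 8) {
    errors.push('Your password needs at least 8 characters.');
  }
  return errors;
}
```

A request handler would answer with status 400 and these messages when the list is not empty, and continue processing otherwise.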

Errors/Exceptions
- Bind error handling to individual parts of your application. Those parts may differ in error handling: e.g. request errors, input parsing, external APIs/services
- Try to resolve errors with minimum impact on the overall application:
  - Unable to connect? => Notify the user, log the error, try again
  - Invalid input? => Notify the user, stop processing
- Try to get focused stack traces: they make debugging easier
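Dedicated error classes per part of the application make it possible to react differently to each kind of error. A minimal sketch; `ConnectionError`, `InputError` and `reactTo` are hypothetical. `Error.captureStackTrace` is a V8 API: passing the constructor as the second argument omits the constructor's own frame, so the trace starts where the error was raised.

```javascript
class ConnectionError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ConnectionError';
    Error.captureStackTrace(this, ConnectionError); // drop the constructor frame
  }
}

class InputError extends Error {
  constructor(message) {
    super(message);
    this.name = 'InputError';
    Error.captureStackTrace(this, InputError);
  }
}

// Maps an error to the reactions listed above; anything unknown is treated
// as a programming error: log it and fail gracefully, without retrying.
function reactTo(err) {
  if (err instanceof ConnectionError) return { notifyUser: true, log: true, retry: true };
  if (err instanceof InputError) return { notifyUser: true, log: false, retry: false };
  return { notifyUser: true, log: true, retry: false };
}
```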

Metrics

Metrics help you see:
- What are people really doing?
- How do they use the application?
- What errors occur?
- Where are the bottlenecks?
- Is someone messing with your app?

Metrics: Monitoring
What is going on?
- CPU load, memory usage, Node.js heap size
- HTTP requests, response times
- Database monitoring, CPU/memory profiling, alerts
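Some of these numbers are available from Node.js itself, without a monitoring product: `process.memoryUsage()` for RSS and heap, `os.loadavg()` for system load, and `process.hrtime.bigint()` (Node.js 10.7 or later) for timing requests. A minimal sketch; `sampleProcessMetrics` and `timed` are hypothetical helpers, and where the numbers are sent (StatsD, a log file, a monitoring service) is left open.

```javascript
const os = require('os');

// Snapshot of process-level metrics.
function sampleProcessMetrics() {
  const mem = process.memoryUsage();
  return {
    loadAvg1m: os.loadavg()[0], // system load, always 0 on Windows
    rssBytes: mem.rss,
    heapUsedBytes: mem.heapUsed,
    heapTotalBytes: mem.heapTotal,
  };
}

// Wraps a request handler and reports each request's response time in ms
// to `record` once the response has been sent.
function timed(handler, record) {
  return (req, res) => {
    const start = process.hrtime.bigint();
    res.on('finish', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      record({ method: req.method, url: req.url, status: res.statusCode, ms });
    });
    handler(req, res);
  };
}
```

For example, `http.createServer(timed(app, console.log))` logs one timing record per request, and `setInterval(() => console.log(sampleProcessMetrics()), 10000)` samples memory and load every ten seconds (`app` is a placeholder for your request handler).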

Monitoring: look
Pro: Open Source
Con: Older fork of Nodetime (two years old)

Monitoring: Nodetime, New Relic, etc. (commercial products)
Pro:
- Many different metrics
- Free tier

Con:
- Free tiers are very limited: Nodetime: only one process(!), New Relic: only 24h data retention
- May not be suitable for smaller or low-traffic projects. Smallest plans: Nodetime: $99/month, New Relic: $149/month per host

Metrics: Logging
- Keep your logs in one place, either at application level or in /var/log.
- Use log levels: separate debug information from warnings and errors
- Use a coherent log format (timestamp, level, message, payload)
- Separate your access logs (e.g. in Express) from your application logs
- Track your deployments with your analytics tools:
  - Not everyone combs through log files all the time to find changes
  - It reveals different kinds of metrics, e.g. "After our last deployment, mobile conversion rate increased by N%"
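Log levels and a coherent format fit in a few lines: one JSON object per line with timestamp, level, message and payload, and a threshold that drops debug output in production. A minimal sketch, not a replacement for a logging library; `createLogger`, the level numbers and the example revision are hypothetical.

```javascript
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// Writes one JSON object per line; messages below `minLevel` are dropped.
function createLogger(minLevel = 'info', write = (line) => process.stdout.write(line + '\n')) {
  const log = (level, message, payload = {}) => {
    if (LEVELS[level] < LEVELS[minLevel]) return;
    write(JSON.stringify({ timestamp: new Date().toISOString(), level, message, payload }));
  };
  return {
    debug: (message, payload) => log('debug', message, payload),
    info: (message, payload) => log('info', message, payload),
    warn: (message, payload) => log('warn', message, payload),
    error: (message, payload) => log('error', message, payload),
  };
}
```

A deployment marker then becomes an ordinary log record, e.g. `createLogger().info('deployment finished', { revision: 'abc123' })`, which log analysis tools can correlate with other metrics.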

Metrics: Logging
One possible solution: Bunyan
- All logs are stored in JSON format (timestamp, app, message, payload)
- Uses streams and offers different targets: file and rotating file out of the box, other targets (e.g. databases) via custom streams

Metrics: Logging
But …
- Uncaught exceptions are still logged to stderr
- Other components may still use console.log statements

    node app.js >> /var/log/myapp.log 2>&1

Again, multiple logs in different formats. I still haven't found a 100% satisfying solution myself.
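One partial remedy for the stderr problem is to write uncaught exceptions in the same JSON format as the rest of the logs before the process exits. A minimal sketch; `makeUncaughtHandler` is hypothetical, its `write` and `exit` parameters exist only to make it testable, and the `fatal` level is an assumption.

```javascript
// Formats an uncaught exception as one JSON line, then exits.
function makeUncaughtHandler(
  write = (line) => process.stderr.write(line + '\n'),
  exit = (code) => process.exit(code)
) {
  return function onUncaught(err) {
    write(JSON.stringify({
      timestamp: new Date().toISOString(),
      level: 'fatal',
      message: err && err.message,
      payload: { stack: err && err.stack },
    }));
    // The app's state is unknown after an uncaught exception:
    // exit and let the process manager restart it.
    exit(1);
  };
}

process.on('uncaughtException', makeUncaughtHandler());
```

Output from third-party `console.log` calls is still unstructured, so this narrows the problem rather than solving it.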

Analysis of gathered metrics
Different log formats and destinations make data analysis difficult:

    # Apache access log
    10.0.1.22 - - [15/Oct/2010:11:46:46 -0700] "GET /favicon.ico HTTP/1.1" 404 209
    fe80::6233:4bff:fe29:3173 - - [15/Oct/2010:11:46:58 -0700] "GET / HTTP/1.1" 200 44

    # Apache error log
    [Fri Oct 15 11:46:46 2010] [error] [client 10.0.1.22] File does not exist: /Library/WebServer/Documents/favicon.ico
    [Fri Oct 15 11:46:58 2010] [error] [client fe80::6233:4bff:fe29:3173] File does not exist: /Library/WebServer/Documents/favicon.ico

    # typical Express.js log output
    [Mon, 21 Nov 2011 20:52:11 GMT] 200 GET /foo (1ms)
    Blah, some other unstructured output from a console.log call.

»ELK« stack
- Elasticsearch (storage/search)
- Logstash (logfile processor)
- Kibana (logfile viewer)

»ELK« stack
Pro:
- Very powerful and extensible log analysis
- Parses logs for Squid, Apache, Nginx, Syslog, MySQL, …
- Feeds logs directly to statsd/Graphite
- Easy querying and visualization
- Realtime search
- Open Source

Con:
- Slightly more complex setup (Java, JRuby, etc.)
- Thus it might not fit smaller projects/hosting solutions

Logstash
Turns messy data in different log formats (such as the Apache and Express.js samples above) …

Logstash
… into structured output

    {
        "message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800…
        "@timestamp" => "2013-12-11T08:01:45.000Z",
        "@version" => "1",
        "host" => "cadenza",
        "clientip" => "127.0.0.1",
        "timestamp" => "11/Dec/2013:00:01:45 -0800",
        "verb" => "GET",
        "request" => "/xampp/status.php",
        "httpversion" => "1.1",
        "response" => "200",
        "bytes" => "3891",
        "referrer" => "\"http://cadenza/xampp/navi.php\"",
        "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X…
    }
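Logstash does this with grok patterns such as `COMMONAPACHELOG` and `COMBINEDAPACHELOG`. To show what that step amounts to, here is a hand-rolled JavaScript approximation for Apache access-log lines in common log format, using field names from the structured output above. It is an illustration, not Logstash's implementation; `parseAccessLogLine` is hypothetical, and only the `_grokparsefailure` tag mimics Logstash's behaviour on lines that don't match.

```javascript
// Common log format: host ident auth [timestamp] "verb request HTTP/x.y" status bytes
const COMMON_LOG = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) HTTP\/([\d.]+)" (\d{3}) (\d+|-)$/;

function parseAccessLogLine(line) {
  const m = COMMON_LOG.exec(line);
  if (!m) return { message: line, tags: ['_grokparsefailure'] };
  return {
    message: line,
    clientip: m[1],
    ident: m[2],
    auth: m[3],
    timestamp: m[4],
    verb: m[5],
    request: m[6],
    httpversion: m[7],
    response: m[8],
    bytes: m[9],
  };
}
```

Applied to the first Apache access-log sample, this yields `clientip: "10.0.1.22"`, `request: "/favicon.ico"`, `response: "404"` and `bytes: "209"`; the error-log lines fall through to the failure tag.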

Logstash
- Easily extensible to custom log formats
- Reads log information from file, Heroku, Redis, RabbitMQ, stdin, syslog, TCP, UDP, XMPP, ZeroMQ, …
- Outputs to file, Ganglia, Graphite, IRC, Loggly, MongoDB, Nagios, RabbitMQ, Redis, Riak, S3, Statsd, Syslog, TCP, UDP, Websocket, XMPP, ZeroMQ, …

Kibana