I have a server/client application running that opens listeners using the core net and tls modules. Each client connection to the server results in the server forking a new worker to interact with the client. This can lead to multiple workers listening on the same port. Typically this is not a problem, as connections are distributed among them and they all perform the same task internally. If worker 1 has created a listener on port 15000, and worker 2 is then created and starts its own listener on port 15000, there aren't any issues.

However, if worker 1 has created a listener on port 15000, closed it, and then opened it again (a fairly typical use case), and worker 2 is then created and tries to create a listener on port 15000 as well, worker 2 gets an EADDRINUSE error.

It seems that the cluster as a whole loses track of the fact that this server handle should be shared across all workers, so the newly created listener is owned only by the worker process that created it and the port isn't available to the other workers. (Maybe this isn't how port sharing actually works across cluster workers, and this is just a misunderstanding on my part.)

I've written this small test that exhibits the issue. It forks a new worker every 10 seconds; each worker creates a listener on port 16000, tears it down 15 seconds later, and then, after 15 more seconds, creates another listener on port 16000:

```javascript
var cluster = require('cluster');
var net = require('net');

if (cluster.isMaster) {
  setInterval(function() {
    cluster.fork();
  }, 10000);
} else {
  var workerID = cluster.worker.id;
  var server;

  var setup = function() {
    console.log('Worker ' + workerID + ' setting up listener');
    server = net.createServer(function(stream) {});
    server.on('error', function(err) {
      console.log('Error on worker ' + workerID, err);
      console.log('Worker ' + workerID + ' exiting');
      process.exit();
    });
    server.listen(16000);
    setTimeout(teardown, 15000);
  };

  var teardown = function() {
    console.log('Worker ' + workerID + ' closing listener');
    server.close();
    setTimeout(setup, 15000);
  };

  setup();
}
```

The failure appears when worker 1 attempts to recreate its listener (and then in subsequent workers as the loop continues):

```
Worker 1 setting up listener
Worker 2 setting up listener
Worker 1 closing listener
Worker 3 setting up listener
Worker 2 closing listener
Worker 1 setting up listener
Error on worker 1 { [Error: bind EADDRINUSE null:16000]
  code: 'EADDRINUSE',
  errno: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 16000 }
Worker 1 exiting
Worker 4 setting up listener
Worker 3 closing listener
Worker 2 setting up listener
Error on worker 2 { [Error: bind EADDRINUSE null:16000]
  code: 'EADDRINUSE',
  errno: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 16000 }
Worker 2 exiting
...
```

Any idea why this is happening, or whether there's a workaround for it?

Edit to add an additional test case

Some feedback mentioned that the initial test case could pollute the event loop with stacked timeouts. I've updated the test to the following, which removes most of the timeouts and leaves only those necessary to demonstrate the issue at hand.

This new test case spawns a new worker every 100 ms. Each worker establishes a listener on port 16000. Worker 10, upon successfully establishing its listener, sets a timeout to tear it down 1 s later. By the time worker 10 calls the teardown function, there should be 18 other workers listening on port 16000. Worker 10 will never succeed in setting up its listener again.

In this particular case, without a delay between a successful close and the next listen attempt, this worker causes the master to stop being able to fork new workers. Even with a 500 ms timeout added before worker 10's retry (which does allow the master to continue forking workers), worker 10 still never succeeds in setting up a listener. With the timeout in place, the newly forked workers are all able to establish listeners successfully.

```javascript
var cluster = require('cluster');
var net = require('net');

if (cluster.isMaster) {
  cluster.fork();
  setInterval(function() {
    cluster.fork();
  }, 100);
} else {
  var workerID = cluster.worker.id;
  var server;

  var setup = function() {
    console.log('Worker ' + workerID + ' setting up listener');
    server = net.createServer(function(stream) {});
    server.on('error', function(err) {
      console.log('Error on worker ' + workerID, err);
      teardown();
    });
    if (workerID == 10) {
      server.listen(16000, function() {
        setTimeout(teardown, 1000);
      });
    } else {
      server.listen(16000);
    }
  };

  var teardown = function() {
    console.log('Worker ' + workerID + ' closing listener');
    server.close(setup);
  };

  setup();
}
```

This test produces the following output: