It felt like someone had knocked the wind out of me. Was this real? This had to be fake, I thought to myself. Just two games and the server is having a hard time processing them.

I started thinking about where I went wrong in my code. I thought collision detection, for sure, had to be the bottleneck. But I was already using quadtrees to help narrow down the number of collision detection passes.

I had to do some dirty work, so I spun up a new Digital Ocean server to use as my development server. I then temporarily disabled collision detection completely and saw the problem was still there.

OK — if collision detection was not the problem, then what else could it be?

I thought about how much information I was sending from the server to each client every second. I had this broadcast function that sent out the state of the game every 22 milliseconds to each client. In this function, I was unnecessarily filtering out the given client’s local player in an allPlayers property, just to put the local player in its own property. So, not only was I putting a for loop (the filtering) in another for loop (the broadcast for each client), I was also customizing the data to be sent out by this broadcast function for each client.

This customization was not necessary. I should just be able to send out the state of the game to everyone with no customization. Everyone should get the same data (and the data should not be tailored to a specific client). This had to be where the CPU is being eaten up. So I optimized this function, pushed it up to the dev server, and checked the CPU graph. No fix.

With my ignorance, I began to convince myself that ~10–20 players per 1 core server was good. Now, how do I come to such a conclusion? Well, my extreme confidence in my technical abilities was clearly blinding me from reality. I stumbled onto a post where the creator of agar.io said his 1 core server can handle about 190 players. I quickly snapped out of it.

The next culprit I had lined up was: socket.io. I was using socket.io to manage the real-time communication between the client and the server. I had heard before that socket.io was not as lightweight as other alternatives.

Back in the day, if you wanted to send a message asynchronously, you had to implement some sort of hack: long polling or flash sockets. This was because not all web browsers supported websockets. But most browsers now offer native support. But in order for socket.io to establish a connection, it first does so by using one of the available hacks mentioned, and then upgrades the connection if the client supports a better way. Even though websockets are already widely supported. This approach comes at the expense of CPU and memory. But not as much as I had thought…

I hopped online and naively typed “socket io cpu problem” into Google. The first couple results were entitled “Node.js — How to debug Node + Socket.io CPU Issues — Server Fault” and “Node.js — Socket.io node server using high cpu — Stack Overflow.” My eyes lit up. I was reassured that this was the culprit to my problem. But I clicked on the first article and the author mentioned he was dealing with ~1,500 concurrent socket connections. I’m no math major, but 20 players is significantly less than 1,500 players.

Just for the heck of it, I switched my server-side Node app to use tiny websockets, then switched the client-side Node app to use native websocket support, right inside the browser. I pushed the changes up to the dev server, and checked the CPU graph. No fix.

My morale was at an all time low. I began to cringe every time I had to check the darn CPU graph. I thought I was never going to get that blue line to stop running away from me. This was the only time I ever felt completely incapable of handling some technical task. But then it happened…

I was sitting in front of the CPU graph wallowing in my misery when I noticed something. It didn’t matter how many full games were running or how close together they were all started. The CPU was increasing steadily at a constant rate. I had never stayed around long enough to observe this. Memory leak!

I scanned my code, line by line, looking for the bug (which I should have done at the very beginning). There it was.

In my game, an event is an object that captures info about things like player deaths, boosts, and collisions. So an event is created every time one of those things happens.

I have this loop that goes through each event and updates it. It’s called every 16 ms. After an event fulfills its duty, it’s supposed to be deleted. Keywords: “supposed to.”

Bingo. I had memory piling up as well as an increasing amount of unnecessary for-loop passes. I inserted a line of code and voila!