This last week has been very exhausting. It was the week of the release of Academia’s Early Access. I thought we’d have a smooth release this time because we already had a good candidate build by Tuesday, September 5, and we were releasing on Friday, September 8. Things were pretty chill. We distributed some keys to YouTubers so they could start recording their games. Then on Wednesday night, one of the YouTubers reported this:

Every single agent in the game acted like a zombie with nothing to chase. They don’t do anything else. They’re stuck in this trance state for eternity. Worst of all, I couldn’t replicate it on my machine, while our artist, Ryan Sumo, could replicate it on his end consistently. This is an unacceptable bug because it stops the flow of the game. The bugs that are hardest to fix are those you can’t replicate, and we only found out about this ugly one two days before release.

The whole of Thursday was spent blindly trying to fix this bug. Anything I tried was just a “guess”. I’d build the game and upload it to our Steam testing branch, then Ryan would download it and… “I’m sorry, your fix didn’t work”. My emotions went from hopeful to heartbroken, then back to hopeful when I thought of a good guess, and back to sorrow again when the guess failed. It was very tiring.

My best guess revolved around our A* processing thread. We have a separate thread that owns a queue of A* requests. The agents can enqueue an A* request whenever they need one and wait for the result. The thread checks the queue for requests and executes the A* search for each one. This thread runs indefinitely. The agents in the gif are doing their default behavior while waiting for the A* result that they requested: they roam around in their current tile. My guess was that the thread had stopped working, so the agents never got their results and kept waiting in vain.

This was the unfixed code (not the actual code, but a shortened version):

```csharp
// Runs the thread that constantly checks the queue
class AStarThreadQueue {
    private ActionThread thread;
    private Queue<AStarResolution> queue = new Queue<AStarResolution>();

    public void RunThread() {
        this.thread = UnityThreadHelper.CreateThread((Action)Process);
    }

    public void Enqueue(AStarResolution resolution) {
        Assertion.AssertNotNull(resolution);
        this.queue.Enqueue(resolution);
    }

    private static readonly object SYNC_LOCK = new object();

    // This is the method that the thread executes indefinitely
    private void Process() {
        while(true) {
            if (this.queue.Count > 0) {
                AStarResolution resolution = null;
                lock (SYNC_LOCK) {
                    resolution = this.queue.Dequeue();
                }

                resolution.Execute();
            }
        }
    }
}
```

This code works without problems in the Unity editor, even with hundreds of agents. But when built to an exe and tested on other computers, the thread breaks. If you know multithreading, you can probably see the problem from a mile away. First, I didn’t have exception handling inside that while loop. If an exception occurs inside, the loop breaks and the thread ends. Second, the queue is not locked in Enqueue(). Based on the logs, the code

```csharp
lock (SYNC_LOCK) {
    resolution = this.queue.Dequeue();
}
```

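returns null even though the queue count has been checked and the items in it are guaranteed to be non-null. This is probably due to conflicting access to the queue: Queue<T> is not thread-safe, so an unlocked Enqueue() on one thread can race with the Dequeue() on another.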

This is what the fixed code looks like:

```csharp
// Runs the thread that constantly checks the queue
class AStarThreadQueue {
    private ActionThread thread;
    private Queue<AStarResolution> queue = new Queue<AStarResolution>();

    public void RunThread() {
        this.thread = UnityThreadHelper.CreateThread((Action)Process);
    }

    public void Enqueue(AStarResolution resolution) {
        Assertion.AssertNotNull(resolution);
        lock (SYNC_LOCK) {
            this.queue.Enqueue(resolution);
        }
    }

    private static readonly object SYNC_LOCK = new object();

    // This is the method that the thread executes indefinitely
    private void Process() {
        while(true) {
            try {
                if (this.queue.Count > 0) {
                    AStarResolution resolution = null;
                    lock (SYNC_LOCK) {
                        resolution = this.queue.Dequeue();
                    }

                    resolution.Execute();
                }
            } catch(Exception e) {
                // We log the error but do not end the thread
                Debug.LogError(e.Message);
            }
        }
    }
}
```
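The fixed version still reads queue.Count outside the lock and spins in a tight loop while the queue is empty. As a minimal sketch (not the code we shipped), both the check and the dequeue can happen under the same lock, with Monitor.Wait and Monitor.Pulse putting the thread to sleep until work arrives. The names BlockingWorkQueue, onError, and Stop are hypothetical, and plain Action delegates stand in for AStarResolution so the snippet compiles outside Unity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for AStarThreadQueue. Work items are plain Actions
// instead of AStarResolution so the sketch does not depend on Unity.
public class BlockingWorkQueue {
    private readonly Queue<Action> queue = new Queue<Action>();
    private readonly object syncLock = new object();
    private readonly Action<Exception> onError; // assumed error callback
    private bool running = true;

    public BlockingWorkQueue(Action<Exception> onError) {
        this.onError = onError;
    }

    public Thread RunThread() {
        Thread thread = new Thread(Process);
        thread.IsBackground = true; // do not keep the process alive on exit
        thread.Start();
        return thread;
    }

    public void Enqueue(Action work) {
        if (work == null) {
            throw new ArgumentNullException("work");
        }

        lock (this.syncLock) {
            this.queue.Enqueue(work);
            Monitor.Pulse(this.syncLock); // wake the worker if it is waiting
        }
    }

    // Lets the worker finish what is already queued, then exit
    public void Stop() {
        lock (this.syncLock) {
            this.running = false;
            Monitor.Pulse(this.syncLock);
        }
    }

    private void Process() {
        while (true) {
            Action work;
            lock (this.syncLock) {
                // Count is only ever read while holding the lock
                while (this.queue.Count == 0 && this.running) {
                    Monitor.Wait(this.syncLock); // releases the lock while asleep
                }

                if (this.queue.Count == 0) {
                    return; // stopped and nothing left to do
                }

                work = this.queue.Dequeue();
            }

            try {
                work(); // run outside the lock so producers are not blocked
            } catch (Exception e) {
                this.onError(e); // report the error but keep the thread alive
            }
        }
    }
}
```

In a Unity project, the onError callback could simply forward to Debug.LogException, which also keeps the stack trace that logging only e.Message throws away.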

With this, Ryan finally said, “OK, the fix seems to work!” I cried inside. The fixed code looks somewhat easy now, but the journey to this fix was not. I tried a lot of other fixes before this. I even turned off Unity’s Graphics Jobs feature because it might have messed with our thread. It was a stupid theory, but I was desperate. Might as well try it.

While multithreading is useful, I realized that it could also be ruthless if you don’t know what you are doing. Treat this as a cautionary tale if you use threads in your projects. Test your build on a variety of machines and with the most complex state of the game. Avoid this kind of nightmare on your release day.

Our game Academia: School Simulator is now available on Steam Early Access. It’s currently 20% off. We still have a long way to go and lots of features to implement. Buy it now while it’s cheap.