The great thing about programming on Windows is that it is the only commercially viable platform where you can ship software to users without getting approval from a giant bureaucracy (well, perhaps I should say it used to be). The not-so-great thing about programming on Windows is that, well, the Windows API is a horrific nightmare.

Granted, it is not uniformly horrific. It ranges from only slightly scary (xinput, comdlg32) to full psycho (DirectShow, Event Tracing, TabletPC, etc.). But no matter which part of the spectrum you encounter, programming on Windows tends to be about doing a lot of unnecessary research, experimentation, and debugging due to a confluence of poorly designed APIs.

Since time spent debugging Windows issues isn’t often chronicled, I thought I’d go ahead and describe my experience this Saturday of tracking down a Windows bug in The Witness.

Apparently, not everyone on The Witness team was seeing the symptom. But trust me, it was there.

On both Jon’s machine and mine, there was a period during the initial loading of the game where you couldn’t do things like ALT-TAB back to the debugger. If you set a breakpoint and then ran the game, when the breakpoint eventually hit and the debugger came to the front, it would wait about five seconds before it allowed you to actually interact with it. Sometimes, in rare cases, it would go into some sort of psycho mode whereby mouse movements stuttered, and the computer would literally beep at you as the mouse moved, for several seconds. Then everything would return to normal.

While not technically a showstopper, this is the kind of bug that’s worth fixing if you can, because it costs you real development time every day. Five seconds wasted every breakpoint is bad news, and people often underestimate the effects of frustration that can build up over time due to flaky and unpleasant development environments. So I decided to try to track it down.

Because Windows is a closed-source platform, there’s no great way to track down problems of this nature. However, having programmed on Windows for over twenty years now, I have a nose for it, so I felt like I knew the best place to start: since there was definitely a UI blackout somewhere during startup, there had to be something occurring early in the execution of the program that was causing at least that particular symptom. So before doing anything else, I started stepping through the entire startup sequence of the program to see if anything out of the ordinary struck me.

I very quickly came upon this code:

Now, I don’t want to make it sound like the “find the source of the problem” phase of Windows debugging is straightforward. It usually takes a very long time, sometimes multiple days. But as luck would have it, this was literally the first piece of code I suspected, and it turned out to be the culprit.

I commented out the SetWindowsHookEx, ran the program, and poof! All the symptoms were gone.

The Problem

From experience, I’ve learned that it’s always best to fully understand a problem before you fix it. If you just patch over its symptoms but never figure out what the problem really was, it will often come back to haunt you.

In this context, that meant two things: first, I should figure out specifically why the keyboard hook was causing the UI stalls; second, I should determine why the code had been trying to call SetWindowsHookEx in the first place, even though apparently it never quite worked (according to the “is never invoked” comment).

The first part turns out to be obvious in hindsight, but it took me a little while to put all the pieces together and be sure I had a solid explanation for what we were seeing. As with any puzzle, looking at the finished picture is easy, but figuring out what that picture might be when all you have are little pieces coming in one at a time is much more difficult. Here’s the best picture I could come up with, but of course there’s no way to completely verify it external to Microsoft:

Normal Windows hooks are pieces of code which execute in the address space of the eventual message recipient, not the installer of the hook. It is for this reason that, normally, if you want to install a global hook, you must put the hook code in a DLL, because that DLL must be mapped into the address space of every running executable on the machine that can receive Windows messages.

But low-level hooks are completely different. A low-level hook like WH_KEYBOARD_LL is a piece of code that resides in only the process that registers the hook. Windows remembers which thread registered the hook, and when any other executable is about to have a hooked message posted to its message queue, Windows actually waits until it can switch contexts back to the hook thread, run the hook in that context, and then finally deliver the message.

This leads directly to the symptom that we were experiencing. When the game is loading, its main thread isn’t yielding to Windows frequently like it is during normal play. Since the main thread is the one that registered the hook, Windows must wait for that thread to finish all the load-time work it was doing and call a function which yields to Windows in order for Windows to actually call the hook procedure. Since no keyboard processing can occur until the hook procedure has run, the keyboard becomes unresponsive as messages back up waiting to be processed through the hook. This completely explains the UI problems during startup, and fits with the oddity that the mouse never seemed to be affected, just the keyboard.

When the game hit a breakpoint, the keyboard hook couldn’t execute at all because the debugger had halted the thread required to execute it. This would have meant that keyboard input would be completely halted, and the debugger would never be able to receive keystrokes, if it weren’t for the fact that Windows actually uses a timeout on hook calls. If the timeout value is exceeded waiting for the hook thread to become available, the hooking executable is assumed to have crashed, and Windows will silently remove that hook and continue with normal operation.

That timeout value? Five seconds, right in line with our observed pause (technically, it’s whatever is set in the registry under HKEY_CURRENT_USER\Control Panel\Desktop\LowLevelHooksTimeout, but the default is five seconds).

So, no mystery as to the pathology. But what about the second part? What was the code actually trying to do?
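To make the timing argument concrete, here is a minimal sketch of the behavior described above, written as a plain C++ model rather than real Windows code. The names `KeyDelivery` and `model_key_delivery` are hypothetical, and the model assumes only what the text states: a keystroke waits until the hook thread next yields to Windows, unless the timeout expires first, in which case the hook is skipped.

```cpp
#include <cstdint>

// Outcome of one keystroke in this simplified model.
struct KeyDelivery {
    std::uint32_t delay_ms;  // how long the keystroke waited before delivery
    bool hook_ran;           // false if Windows gave up waiting for the hook thread
};

// stall_ms:   time until the thread that installed the hook next yields to Windows
//             (assumption: a single number stands in for "busy loading" or "halted")
// timeout_ms: the low-level hook timeout (five seconds in the case described)
KeyDelivery model_key_delivery(std::uint32_t stall_ms, std::uint32_t timeout_ms) {
    if (stall_ms < timeout_ms) {
        // The hook thread became available in time: the key is delayed by the stall.
        return {stall_ms, true};
    }
    // The hook thread never became available: the key is delivered after the
    // timeout, without the hook having run.
    return {timeout_ms, false};
}
```

Under this model, a thread that yields every 16 ms delays keys by at most 16 ms, while a thread halted at a breakpoint delays the first key by the full timeout, which matches the five-second pause in the debugger.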

The Windows Logo Keys

Although SetWindowsHookEx might seem like an odd call to make in a game executable, it’s actually quite common. It’s there to prevent the Windows logo keys on modern keyboards from ruining full-screen games. Because the keys are placed in a location that’s easy to hit by mistake, many gamers would accidentally hit them, causing their games to be instantly deactivated by Windows in favor of bringing up the start menu.

So widely acknowledged was this problem that, when these keys were first introduced, Microsoft itself published a recommended work-around. From Disabling Shortcut Keys in Games:

“This article describes how to temporarily disable keyboard shortcuts in Microsoft Windows to prevent disruption of game play for full screen games… Use a low-level keyboard hook to filter out the Windows key from being processed.”

The article goes on to show sample code that is basically the same code that was being used in The Witness. It’s a simple SetWindowsHookEx call with WH_KEYBOARD_LL whose hook routine does nothing but block VK_LWIN and VK_RWIN keys from being processed. The only difference between the code in The Witness and the code in the sample is that The Witness would conscientiously remove the hook when it no longer had focus, and then reinstall the hook when focus was regained. The sample code, by contrast, just set a global variable.

So if this was what the code was doing, why the frowny face? If you look back at the comment, it states very clearly that the hook was never being invoked, even though no error message came back from Windows. And, although I normally remap my Windows keys via the registry, I went ahead and mapped them back to see if The Witness was properly blocking their effects. Much as the frowny face foretold, it wasn’t.
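For readers who have not seen this kind of hook, the following is a rough sketch of what such code generally looks like. It is not the code from The Witness or from the Microsoft article. The names `is_windows_logo_key`, `g_keyboard_hook`, `install_logo_key_hook`, and `remove_logo_key_hook` are hypothetical, and the Windows-only part is guarded so the filtering decision can also be compiled elsewhere.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the filter logic compiles off Windows; the values match the
// documented virtual-key codes for the left and right Windows logo keys.
constexpr std::uint32_t VK_LWIN = 0x5B;
constexpr std::uint32_t VK_RWIN = 0x5C;
#endif

// True if the given virtual-key code is one of the Windows logo keys.
bool is_windows_logo_key(std::uint32_t vk_code) {
    return vk_code == VK_LWIN || vk_code == VK_RWIN;
}

#ifdef _WIN32
static HHOOK g_keyboard_hook = nullptr;  // hypothetical global holding the hook handle

// Low-level keyboard hook: swallow the logo keys, pass everything else along.
LRESULT CALLBACK LowLevelKeyboardProc(int code, WPARAM wparam, LPARAM lparam) {
    if (code == HC_ACTION) {
        const KBDLLHOOKSTRUCT* info = reinterpret_cast<const KBDLLHOOKSTRUCT*>(lparam);
        if (is_windows_logo_key(info->vkCode)) {
            return 1;  // a nonzero result stops the key from being processed further
        }
    }
    return CallNextHookEx(g_keyboard_hook, code, wparam, lparam);
}

// The installing thread must keep pumping messages, or every keystroke on the
// machine waits on it.
bool install_logo_key_hook() {
    g_keyboard_hook = SetWindowsHookExW(WH_KEYBOARD_LL, LowLevelKeyboardProc,
                                        GetModuleHandleW(nullptr), 0);
    return g_keyboard_hook != nullptr;
}

void remove_logo_key_hook() {
    if (g_keyboard_hook != nullptr) {
        UnhookWindowsHookEx(g_keyboard_hook);
        g_keyboard_hook = nullptr;
    }
}
#endif
```

Installing on focus gain and removing on focus loss, as The Witness did, would correspond to calling the two hypothetical functions from the window’s activation handling.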

Obeying the Static Discipline

I don’t know anything about digital circuit design, but I once watched a lecture by Gill Pratt where he explained something called “the static discipline”. In essence, it is the requirement that any component in a circuit must accept voltages within a certain range for “1” and “0”, and then must output its own results in a certain range. These ranges are defined so as to require that components always produce equal or better conditioned voltages than their inputs, thus ensuring that the digital signals being propagated through the circuit don’t degrade into noise due to lots of tiny losses.

For some reason, I took this concept to heart in a programming sense and have found it is a good rule to code by. My version of the static discipline, adapted for software, is that whenever you are making a modification to a piece of code, you should always leave it in a state of stability equal to or better than how you found it. And preferably the latter.

All too often, people go in to fix a problem or add a feature to a piece of code, and they just hammer on it until it does that one new thing. The resulting code is then usually more fragile, less well designed, more unnecessarily complex, etc. To try to prevent myself from having this effect, I try to observe the static discipline.

And so it was with this Windows key situation. I could see what the code was supposed to be doing, and although I could reasonably fix my bug by just commenting out the hook (because it didn’t work), I felt it was the more disciplined thing to do to figure out why the hook wasn’t working, and to implement something that did what it was trying to do. Since there was no obvious bug, I had to start experimenting.

After adding some debug outputs and running Spy++ to keep an eye on the window messages, the first thing I tried was removing the part of the code that unregistered the hook when the game’s window lost focus. Since I didn’t know exactly how the game’s windowing code worked at all levels, I figured it wasn’t entirely out of the question that the hook could be getting disabled prematurely.

Although that turned out not to be the case, this experiment did pay off unexpectedly (as many do): although the keyboard hook still never got called when The Witness window had focus, the hook did start getting called when it lost focus. Yes, oddly enough, the hook was working just fine for everyone else’s window, but on the one window where it actually needed to work, it wasn’t working.

This led to the obvious question: what was The Witness doing differently?

Raw Input

This was a case where having lots of Windows programming experience probably saved a lot of time. Instead of having to experiment blindly, I knew right off the bat that Windows Raw Input probably had something to do with the situation. I’ve debugged lots of Windows front end code in the past, and any time Raw Input, DirectInput, XInput, or anything else with the word “input” in it is involved, you know that things are going to be going a little haywire.

I commented out the device registration call for Raw Input, and lo and behold, the hook started working for The Witness window, no problem. So, although I was completely unable to find any documentation on MSDN that discussed what was going on under the hood, it must be the case that Raw Input processing happens before low-level keyboard hook processing in such a way as to prevent the keyboard hook processing from ever seeing keyboard messages on a Raw Input-registered window. Normally, this wouldn’t be a problem, but in our case, the Microsoft-recommended fix for the Windows logo key problem relies on keyboard hooks. The Witness needs Raw Input to get good low-level relative mouse movements, but it also needs to disable the Windows logo keys; sadly, this is the kind of situation that is all too routine for anyone who programs Windows professionally.

I couldn’t think of any plausible ways that I could convince Raw Input to start using the Windows keyboard hook, since I don’t have access to any of the Windows source code and I have no idea how it works internally. So I focused my efforts on Raw Input.

I experimented with just about everything I could think of. I tried not calling DefWindowProc on WM_INPUT messages, to see if it was the default window processing that was using the Windows logo key events to switch to the start menu. No such luck. I tried using RIDEV_NOLEGACY during Raw Input initialization to see if preventing keyboard messages from being generated would solve the problem. It didn’t.

After many failed attempts, and finding no relevant information on the web, I decided to call it a night and sleep on it. When I woke the next morning, for some reason I remembered the most important thing anyone programming Windows must remember: never, ever trust the documentation. It’s always either inaccurate or incomplete.

So when I sat down at my keyboard, the first thing I did was try using the RIDEV_NOHOTKEYS flag when initializing Raw Input. You see, the previous day I hadn’t bothered to try this flag, because it explicitly says in the documentation that it does not block system-level hotkeys, only application-level hotkeys (like the kind you create yourself with RegisterHotKey). But why was I believing that? That was just something somebody wrote down when they needed to create a documentation page. They probably never even saw the actual code for Raw Input in the first place.

Surprise surprise, RIDEV_NOHOTKEYS worked. After all the fussing, all that had to be done to make The Witness impervious to Windows Logo keys was one little flag during Raw Input initialization. No keyboard hook necessary.
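As an illustration of where such a flag goes, here is a hedged sketch of a Raw Input registration that passes RIDEV_NOHOTKEYS for the keyboard device. It is not the actual initialization code from The Witness: the function names `keyboard_raw_input_flags` and `register_raw_input` are hypothetical, and registering both mouse and keyboard on one window is an assumption made for the example. The Windows-only call is guarded so the flag selection can be compiled elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
// Stand-in so the flag helper compiles off Windows; the value matches the
// documented RIDEV_NOHOTKEYS constant.
constexpr unsigned long RIDEV_NOHOTKEYS = 0x00000200;
#endif

// Chooses the registration flags for the keyboard device.
unsigned long keyboard_raw_input_flags(bool suppress_hotkeys) {
    return suppress_hotkeys ? RIDEV_NOHOTKEYS : 0;
}

#ifdef _WIN32
// Registers `window` for raw mouse and keyboard input.
bool register_raw_input(HWND window, bool suppress_hotkeys) {
    RAWINPUTDEVICE devices[2] = {};

    // Generic desktop page (0x01), mouse usage (0x02).
    devices[0].usUsagePage = 0x01;
    devices[0].usUsage     = 0x02;
    devices[0].dwFlags     = 0;
    devices[0].hwndTarget  = window;

    // Generic desktop page (0x01), keyboard usage (0x06).
    devices[1].usUsagePage = 0x01;
    devices[1].usUsage     = 0x06;
    devices[1].dwFlags     = keyboard_raw_input_flags(suppress_hotkeys);
    devices[1].hwndTarget  = window;

    return RegisterRawInputDevices(devices, 2, sizeof(RAWINPUTDEVICE)) != FALSE;
}
#endif
```

Whether the flag suppresses the Windows logo keys is the behavior observed in the text above, not something the documentation promises, so it is worth re-testing on each Windows version you ship on.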

Zero-Sum Game