A new face for Win32

The runtime infrastructure used to provide WinRT is only half the story. The other half is the WinRT software library: the collection of APIs used to actually build WinRT applications. Just as the runtime infrastructure is a new spin on an old technology, so too is the software library. But while Microsoft is happy to talk about how WinRT is built on the time-honored COM technology, you get the feeling that it doesn't want us to think of the library in quite the same way, hence the inaccurate diagram it bandies about to explain its APIs.

Microsoft's diagram places the WinRT APIs and application model directly above "Windows Kernel Services," as if WinRT's API was some alternative that was independent of Win32. C and C++ desktop applications leverage kernel services via Win32; Metro style apps access kernel facilities via WinRT.

It's a nice idea. It just isn't true.

Let's be very clear about this. There are two important ways in which it isn't true, and a third minor way. First, the WinRT library is a layer on top of Win32. It uses some new bits of Win32, such as Direct2D and DirectWrite. It uses some much older bits of Win32, such as the shell library (a disparate set of utility functions for, among other things, manipulating files and paths) and Winsock (the network API). But either way, it uses Win32. It is not a sibling of Win32, it is not an alternative to Win32; it is a client of Win32, a consumer of Win32, just like every other application.

Second, Metro-style applications do not use WinRT exclusively. WinRT is very important, and I think that any reasonable Metro-style application will end up using WinRT at least a little bit. But there are also some important APIs that Metro-style applications can use that don't form a part of the WinRT COM world at all. Probably the most important of these is Direct3D 11. Games in the Metro world can use Direct3D 11, but they'll have to be written in C++ to do so, and they'll have to use Direct3D's traditional COM-like design without projections or metadata or any of the other pieces of WinRT infrastructure.

There are also lots of portions of Win32 available to Metro apps that have partial WinRT alternatives, but which are also exposed directly for when the WinRT alternatives aren't flexible enough. For example, WinRT has APIs for opening, reading, and writing files, but also provides access to low-level Win32 functions (some new, some old) that perform the same functions.

This is where there are some terminology differences. One could pretend that the diagram is accurate and hence decree that any API that Metro applications can use (whether using WinRT COM, traditional COM, or no COM at all) is a "WinRT API," but I think that this is not especially helpful, and such usage is not supported by most of the documentation. Rather, there is the Windows Runtime API, using the new kind of COM, and there's a load of Win32 APIs that are also permitted.

Whether this is important is a matter of opinion, but there are reasons to be disappointed that WinRT isn't true to the diagram. For a start, there are a few long-standing annoyances of Win32, such as the inability to elegantly handle paths and filenames longer than 260 characters, that are inherited into WinRT. Once baked into an API, these things are really hard to remove, which is why Win32 is lumbered with this limit even though the Windows NT kernel is fine with paths up to 2^15 characters long (or thereabouts).

More subtly, this design means that Microsoft can't readily discard the backwards compatibility features that are built in to Windows and Win32. Over the years, Windows applications have come to expect Win32 to operate just so, depending not just on the official, documented behavior, but on aspects of the particular implementation. For example, some functions still work when passed values that the documentation says are prohibited. Sloppy developers accidentally use these prohibited values, see that everything works OK, and then ship their application. That's all fine, until Microsoft then wants to make the function stricter, or more efficient, or add new permitted values; the company then discovers that making this change breaks extant, shipping applications.

As a result, there are all manner of workarounds and fixes scattered throughout Windows. With WinRT being built on Win32, Microsoft is fostering a whole new generation of applications that, implicitly or explicitly, depend on these undocumented Win32 behaviors, with the result that changing them will continue to be a liability. If WinRT were built directly on top of the Windows NT kernel, as Microsoft's diagram implies, then it would no longer have to preserve these Win32 behaviors behind the scenes. As it is, future versions of WinRT will not only have to accumulate their own compatibility cruft and baggage (so that Windows 9 does not break compatibility with Windows 8, for example); they'll also have to retain and preserve all the Win32 detritus as well, ensuring long-term complexity and maintenance burden.

This is true even of the ARM-based Windows RT. Though third-party desktop applications are banned on Windows RT, this hasn't allowed Microsoft to strip out unnecessary Win32 cruft. It's all still there, powering WinRT.

The third way in which the diagram isn't true is that desktop applications (written in C++, .NET, or whatever else a developer prefers) have access to some parts of the WinRT API. While most parts of WinRT are off-limits to desktop developers (including, unfortunately, the toolkit for building GUIs), a few bits and pieces are available to developers of both desktop and Metro applications. There's no real policy or consistency here; it was apparently a decision each development team could make for itself.

What's in the box?

WinRT is, at the moment, a fairly narrow API. It's built for producing Metro-style tablet apps: touch-capable GUI apps that are typically Web-connected, that make use of sensors like cameras, GPS, and accelerometers, and that tend to be used for content consumption rather than content creation. Accordingly, WinRT can't be used to create Windows services, or desktop applications, or extensions for things like Explorer. Particular emphasis is placed on being highly responsive to the user ("fast and fluid" is Microsoft's catchphrase). WinRT applications are run in secure sandboxes, with limited access to the rest of the system, and they can be suspended or terminated at any time, with only limited support for multitasking.

These factors all influence the design of the WinRT API: in Windows 8, the first iteration of WinRT, it's not a general purpose API, and it's not supposed to be. Much of the API is unexceptional and will feel reasonably familiar to Windows and .NET developers, but a couple of parts deserve special attention: the features it has for GUIs and the general approach to both disk and network I/O.

Once again, Microsoft looked at the technologies it had already developed and reinvented them for WinRT's GUI API. WinRT borrows heavily from WPF and Silverlight. It uses a new dialect of XAML, the same XML-based markup language for GUI design, and uses essentially the same techniques for plumbing application data into the GUI, and responding to user input. There is an arguably important difference, however: WinRT's XAML toolkit is written in native code, whereas all the XAML toolkits that went before it (WPF, Silverlight, and Windows Phone—the three are, regrettably, similar, but not entirely compatible) were predominantly written in .NET code.

Though WPF itself was managed code, it had an important native code part to handle actual drawing to screen. That's because it used Direct3D for drawing, to enable it to take advantage of hardware acceleration. Realizing that that native code portion was generally useful, Microsoft refined it to create Direct2D, a hardware-accelerated API for 2D applications. However, WPF was never switched to use Direct2D; it still uses its own, private library. WinRT's XAML, however, does use Direct2D.

This is a logical evolution. However, the decision to rewrite the XAML stack in native code has come at some cost; WinRT XAML lacks many of the capabilities of WPF, and even Silverlight, its closest relative, provides a richer, more capable library than WinRT right now. This is not to say that WinRT is weaker than Silverlight across the board—its media and image handling capabilities are better than those of Silverlight, for example—but as a GUI toolkit, it's a small step backwards for Silverlight developers and a large one for WPF users.

Keeping the user interface fast and fluid

As is the case with every mainstream GUI library, WinRT's XAML is single-threaded. On the face of it, this might seem an odd decision, given today's proliferation of multicore, multithreaded processors, but it isn't. By "single-threaded" I mean that user input is all delivered on a single thread, and updates to the UI must all be made on that same single thread. This is because these things have a clear ordering to them: if I type 'c' 'a' 't' on the keyboard, it's imperative that the application sees those keystrokes in that order and handles them sequentially. If user input were delivered on multiple threads, it would be possible for input that was generated later to be processed sooner.
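To make the ordering argument concrete, here is a minimal sketch in Python of a single input thread draining a queue. It is an illustration only, not how WinRT or Win32 actually dispatch input; the names run_input_loop, pending, and handled are made up for the example.

```python
import queue
import threading

def run_input_loop(events):
    """Deliver events to one handler thread, in the order they were queued."""
    pending = queue.Queue()  # FIFO: first event in is the first one handled
    handled = []

    def input_thread():
        while True:
            event = pending.get()
            if event is None:      # sentinel meaning "no more input"
                break
            handled.append(event)  # stand-in for real keystroke handling

    worker = threading.Thread(target=input_thread)
    worker.start()
    for event in events:
        pending.put(event)
    pending.put(None)
    worker.join()
    return handled
```

Calling run_input_loop(["c", "a", "t"]) hands the three keystrokes to the handler in the order they were typed, because only one thread ever takes items off the queue. With several handler threads pulling from the same queue, that guarantee would be lost.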

This single-threaded approach has a well-known downside, however. If the thread handling input ever pauses for some reason, it means that the application can't handle any other input. The input just gets queued up. If the thread pauses for long enough, that queue can fill up completely, at which point Windows will beep each time you try to add new input (by pressing the mouse button or typing), and the new input events are simply discarded.
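The queue-filling behavior can be sketched with a bounded queue whose consumer never runs. This is a toy model in Python: the capacity used here is an arbitrary assumption (the real size of the Windows input queue isn't stated above), and post_input and simulate_blocked_thread are hypothetical names.

```python
import queue

def post_input(pending, event):
    """Try to queue an event; return False if the queue is full and it is dropped."""
    try:
        pending.put_nowait(event)
        return True
    except queue.Full:
        return False  # this is the point at which the system would beep

def simulate_blocked_thread(capacity, events):
    """Post events while nothing drains the queue, as if the input thread were stuck."""
    pending = queue.Queue(maxsize=capacity)
    dropped = [event for event in events if not post_input(pending, event)]
    return pending.qsize(), dropped
```

With a capacity of three and five incoming events, the first three sit in the queue waiting for the stalled thread and the last two are thrown away.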

To avoid this problem, then, you have to make sure that the input thread never pauses. Unfortunately, while most developers know that they're not supposed to make their input thread block, they often do it anyway. It's just too easy to do. The traditional culprits are I/O—either doing something with the disk, or with the network. The thing about I/O is that it can be really slow. A network connection to an unavailable server can wait several seconds before failing, for example. Attempt this connection in your input thread, and you'll end up blocking input for several seconds, creating an application that's unresponsive and slow.

In spite of knowing about this problem, developers continue to attempt I/O from the input thread. Part of the reason is that it's insidious. Imagine an application that communicates with some network server. In development and testing, that server could be local to the developer; perhaps on the developer's own machine, or at least on his own LAN. In this situation, the network connections will always seem to be fast enough that they're safe to perform in the UI thread. They may still take a few milliseconds, but the delay probably won't be noticed, so it's easy for the developer to think it's good enough. It's only when the application is used in the real world that the problem emerges: the real world has crappy hotel Wi-Fi and slow 3G connections where each network connection can introduce hundreds or thousands of milliseconds of delay. All of a sudden, the application that was fast and fluid for the developer is slow and jerky for the user.

Part of the reason that developers do this is that it's just plain easier. Many I/O APIs are exclusively synchronous and blocking, which is to say, that they make the thread calling the API stop in its tracks while waiting for the I/O operation to occur ("synchronous" and "blocking" are often used synonymously; though there are subtleties that make this not quite accurate, it's probably good enough for our purposes). Many more I/O APIs are synchronous and blocking by default, but also offer non-default modes that are asynchronous and non-blocking; that is, they allow the thread calling the API to keep on working even while the I/O operation is occurring in the background.
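As a small illustration of "blocking by default, non-blocking on request," the following Python sketch uses an ordinary pipe on a POSIX-like system (an assumption; it has nothing to do with WinRT itself). The function name try_read_nonblocking is made up for the example.

```python
import os

def try_read_nonblocking():
    """Read from an empty pipe in non-blocking mode, then again once data exists."""
    read_end, write_end = os.pipe()
    try:
        # Pipes block by default; non-blocking mode has to be asked for.
        os.set_blocking(read_end, False)
        try:
            os.read(read_end, 1)
            first = "data"
        except BlockingIOError:
            # In blocking mode this call would have stopped the thread instead.
            first = "would block"
        os.write(write_end, b"x")
        second = os.read(read_end, 1)
        return first, second
    finally:
        os.close(read_end)
        os.close(write_end)
```

The first read returns control immediately rather than stalling the caller, which is the essential property; what the caller is supposed to do next is the hard part.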

The reason that synchronous APIs are so often the default is that they're much easier to work with and think about. A lot of what programs have to do is sequential in nature. You can't read a file until you've first opened it, and you can't interpret the data until you've first read it from the file. Necessarily, when the user clicks a button to load a file, you have to open the file, then read the file, and then interpret the data. Synchronous APIs, where the thread stops what it's doing until the file is opened, and then stops again until the data has been read, make this sequential programming natural and easy.
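The open, read, interpret sequence looks something like this when written against a synchronous API. This is a sketch in Python; the JSON file format and the names load_document and demo_sync_load are assumptions chosen purely for illustration.

```python
import json
import os
import tempfile

def load_document(path):
    # Each step blocks the calling thread until it has finished.
    with open(path, "r", encoding="utf-8") as handle:  # 1. open the file
        raw = handle.read()                             # 2. read its contents
    return json.loads(raw)                              # 3. interpret the data

def demo_sync_load():
    # Write a small throwaway file so the sketch is self-contained.
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write('{"title": "hello"}')
        return load_document(path)
    finally:
        os.remove(path)
```

The code reads top to bottom in exactly the order the work happens, which is what makes it so tempting to call something like load_document straight from a button handler.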

Asynchronous APIs aren't so straightforward. The virtue of an asynchronous API is that the thread that started the I/O operation can do something else while it's waiting for the I/O to finish. The difficulty with an asynchronous API is figuring out just what to do once the I/O operation is done, and the data is available, ready to be used. The thread that started the I/O operation has moved on with its life and is now busy responding to the next bit of user input; as far as that thread is concerned, the fact that the user clicked a button to load a file has been dealt with.

Providing a neat way for the programmer to specify what happens next is one of the biggest challenges for asynchronous APIs. They all need a way of letting the developer say "once the data has been read, then do this," taking the sequential ordering that's implicit with synchronous APIs and making it explicit.
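One common way of saying "then do this" is to pass the next step in as a function. The sketch below, in Python, uses a plain background thread to stand in for an asynchronous read; read_file_async, on_done, and demo_async_read are hypothetical names, and a real asynchronous API would not necessarily use a thread at all.

```python
import os
import tempfile
import threading

def read_file_async(path, on_done):
    """Start reading `path` in the background; call on_done(data) when finished."""
    def work():
        with open(path, "rb") as handle:
            data = handle.read()
        on_done(data)  # the explicit "once the data has been read, then do this"

    worker = threading.Thread(target=work)
    worker.start()
    return worker  # returned only so the demo below can wait for it

def demo_async_read():
    fd, path = tempfile.mkstemp()
    received = []
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"hello")
        worker = read_file_async(path, received.append)
        # The calling thread is free to get on with other things here.
        worker.join()  # demo only: wait so the result can be inspected
    finally:
        os.remove(path)
    return received
```

Note that on_done runs on the background thread, not on the thread that started the read, which matters as soon as the next step involves the user interface.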

The approaches that asynchronous APIs use to handle this vary. Windows has always had asynchronous APIs. The Windows jargon is "Overlapped I/O," because you can perform multiple I/O operations on a file simultaneously, such that the operations overlap with one another. The way in which it handled this particular issue was very primitive and low-level. The operating system would tell the application when each overlapped operation was complete, but it was entirely up to the application to work out which I/O operation was which, and what the next processing step should be.
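The bookkeeping burden of this completion-notification style can be sketched as follows. This is an analogy written with Python's concurrent.futures, not the Win32 Overlapped I/O API; run_overlapped, in_flight, and the read parameter are names invented for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_overlapped(requests, read):
    """Start every read at once, then handle completions in whatever order they land.

    `requests` is a list of names and `read(name)` returns that item's bytes.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The application keeps its own table of what each pending operation is.
        in_flight = {pool.submit(read, name): name for name in requests}
        while in_flight:
            # All the system says is "something finished."
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for operation in done:
                name = in_flight.pop(operation)           # work out which one it was
                results[name] = len(operation.result())   # and decide the next step
    return results
```

Everything interesting here, the in_flight table and the decision about what to do with each result, is the application's problem rather than the API's.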

Awkward as it may seem, way back in the OS/2 days Windows was originally designed to work this way exclusively, and behind the scenes, all I/O is overlapped I/O. But because it's so awkward to use, the kernel team added the ability to have the kernel take care of the asynchrony and act as if it had synchronous APIs.

Traditional UNIX systems have had a vaguely similar approach for network I/O (the system would tell you which operation was done and the application would then determine how to proceed) but, unlike Windows, offered nothing comparable for file operations. Modern UNIX systems use a variety of (incompatible) mechanisms for handling asynchronous disk and network I/O (though again, this tends to be quite low level), and some also support standard POSIX asynchronous I/O (AIO). POSIX AIO is notable because it has two modes of operation. It can work similarly to the other APIs, merely informing an application that an I/O operation is complete, but it has a second mode: the application can specify a particular function to call when an I/O operation is complete, and the operating system will call that function directly when it's the right time to do so. This kind of function, a function in the application's code that the operating system calls, is known as a "callback."

Not everything that can hold up the input thread is an I/O operation, of course. Sometimes computation itself is the slow part. So as well as a way of doing I/O asynchronously, there needs to be a way of doing computations asynchronously; of pawning them off onto a separate thread, so that the input thread can keep on trucking. Over the years, many ways of doing this have developed. The time-honored UNIX approach was to use a separate process for each separate computation. In Windows, the preference has been to create separate threads for these background tasks.
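Handing a slow computation to a separate thread might look like the sketch below. It is written in Python with hypothetical names (slow_sum_of_squares, start_background, box), and the "slow" function is a trivial stand-in for real work.

```python
import threading

def slow_sum_of_squares(n):
    # Stand-in for an expensive computation.
    return sum(i * i for i in range(n))

def start_background(func, *args):
    """Run func(*args) on a new thread; the result appears in box['result']."""
    box = {}

    def work():
        box["result"] = func(*args)

    worker = threading.Thread(target=work)
    worker.start()
    return worker, box
```

The thread that calls start_background gets control back straight away and can carry on handling input; it only needs to look in box once the worker has finished, for example after worker.join().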

There's also a converse problem. Updating the UI is also single-threaded, and it has to be done from the input thread. This means that there needs to be a way of switching back to the input thread to make changes to the user interface, after any I/O (or slow computation) has been performed.
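A common shape for this switching back is for the background work to post a small "update the UI" task to a queue that only the input thread drains. The Python sketch below assumes that design; it is not WinRT's actual mechanism, and run_ui_with_background_work, ui_queue, and log are invented names. The calling thread plays the part of the input thread.

```python
import queue
import threading

def run_ui_with_background_work(compute):
    """Run compute() on a worker, then apply its result on the calling (UI) thread."""
    ui_queue = queue.Queue()
    ui_thread_id = threading.get_ident()
    log = []  # stand-in for UI state that only the UI thread may touch

    def background():
        value = compute()
        # Do not touch the UI from here; post the update to the UI thread instead.
        ui_queue.put(
            lambda: log.append((value, threading.get_ident() == ui_thread_id))
        )
        ui_queue.put(None)  # sentinel so this demo loop can stop

    threading.Thread(target=background).start()
    while True:  # the UI thread's message loop
        task = ui_queue.get()
        if task is None:
            break
        task()
    return log
```

Each log entry records the computed value together with a flag saying whether the update ran on the UI thread, so run_ui_with_background_work(lambda: 42) returns [(42, True)].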