November 2002/Inside NT's Asynchronous Procedure Call

Asynchronous Procedure Calls (APCs) are a fundamental building block in NT's asynchronous processing architecture. An understanding of this mechanism is essential to better understand how NT works and performs several core system operations.

Basically, APCs allow user programs and system components to execute code in the context of a particular thread and, therefore, within the address space of a particular process. The I/O Manager executive subsystem, for example, uses APCs to complete I/O operations initiated asynchronously. When a device driver calls IoCompleteRequest to notify the I/O Manager that it is through processing an asynchronous I/O request, the I/O subsystem queues an APC to the thread that initiated the I/O request. The next time the thread runs at low IRQL, the pending APC is delivered. This APC's role is copying the results of the I/O operation and status information from system memory into a buffer in the thread's virtual address space. Other interesting ways that APCs are used in NT are to get or set a thread's context and to suspend a thread's execution. The POSIX environment subsystem also uses APCs to simulate the delivery of POSIX signals to POSIX processes.

Although APCs are extensively used throughout NT, documentation on how to use them is completely lacking. In this article, I'll detail how NT processes APCs and document the exported NT functions available to device-driver writers to use APCs in their programs. I'll also show a most-likely implementation of the NT's APC dispatcher subroutine KiDeliverApc to allow you to better grasp the inners of APC delivery.

APC Objects

In NT there are two possible kinds of APCs: user mode and kernel mode. User APCs execute in user mode in the target thread's current process context and require "permission" from that thread to run. Specifically, user-mode APCs require the target thread to be in an alertable wait state for being successfully delivered. A thread enters such a state by calling one of the system functions KeWaitForSingleObject, KeWaitForMultipleObjects, KeWaitForMutexObject, or KeDelayExecutionThread and specifying the wait as "alertable." Alternatively, a user thread can cause user-mode APCs to be delivered to it by calling the undocumented alert-test service KeTestAlertThread.

When a user-mode APC is delivered to a thread waiting alertably, the wait state is satisfied with the completion status STATUS_USER_APC. On return to user mode, the kernel transfers control to the APC routine and resumes the thread's execution when the APC routine completes.

In contrast to user APCs, kernel APCs execute in kernel mode and can be classified as regular or special. As will become evident later in this article, when APCs are delivered to a particular thread, special kernel-mode APCs don't require permission from that thread to run, whereas regular kernel-mode APCs require that certain conditions hold before they are successfully executed. Besides, special kernel-mode APCs can only be blocked when running at raised IRQL, and they can preempt the execution of a regular kernel-mode APC.

Every APC waiting to execute resides in a thread-specific, kernel-managed queue and every thread in the system contains two APC queues, one for user-mode APCs and another for kernel-mode APCs.

NT represents an APC by a kernel control object named "KAPC." Although the DDK doesn't explicitly document APCs, the KAPC object is clearly declared in NTDDK.H as shown in Listing 1. From this declaration of the KAPC object, the self-explanatory fields are Type and Size. The Type field identifies this kernel object as an APC. In NT, every kernel or executive object is tagged with a type so that functions can ensure they are handed the appropriate object type. Size contains a word-aligned value that indicates the length in bytes that this structure takes up in memory. Spare0 may seem a little obscure, but it's not used for anything meaningful except for memory-alignment reasons. The other field's descriptions are less obvious and I'll explain them throughout the following sections.

APC Environments

At any time during its execution, provided that the current IRQL is at Passive Level, a thread may need to temporarily execute code in another process context. To perform this operation, a thread typically calls the system function KeAttachProcess. On return from this call, the thread is executing within the address space of another process. Any APCs previously waiting to execute in the thread's owning-process context cannot be delivered at this time because the owning-process address space is not currently available. However, new APCs can be perfectly targeted at this thread to execute in the new process address space. Even new APCs can be targeted at this thread to execute in the thread's owning-process context when it finally detaches itself from the new process.

To achieve this degree of control over APC delivery, NT maintains two APC environments or states per thread. Each APC environment contains the APC queues for user-mode and kernel-mode APCs, a pointer to the current process object, and three control variables that indicate: whether there is any pending kernel-mode APCs (KernelApcPending), whether there are any regular kernel-mode APC in progress (KernelApcInProgress), and whether there is a user-mode APC pending (UserApcPending). The locations of these APC environments are kept in an array of two pointers at the ApcStatePointer field in a thread object.

The main APC environment is the one located at the ApcState field in a thread object. APCs waiting to execute in the thread's current process context (whatever it is) reside in the ApcState's queues. Whenever the NT's APC dispatcher and other system components query a thread for pending APCs, they check the main APC environment and, in case there are any, they are delivered at this moment or at some later time, modifying its control variables accordingly. The second APC environment is located at the SavedApcState field in a thread object and is used as a backup place for the main APC environment while a thread is temporarily attached to another process.

The first element in ApcStatePointer always points to the APC environment used for the thread's owning-process context, while the second element points to the APC environment used for the thread's new process context, in case the thread is attached to another process. For example, if a thread is running in its owning-process address space, the first element of the ApcStatePointer array contains the address of ApcState and the second element the address of SavedApcState (in this case SavedApcState is empty). The ApcStateIndex field of the thread object contains the value OriginalApcEnvironment, indicating the thread is running in its owning-process context.

When a thread calls KeAttachProcess to execute subsequent code in another process context, the content of ApcState is copied to SavedApcState. Next, ApcState is cleaned up, its APC queues are reinitialized, its control variables set to zero, and the current process field is set to the new process. These steps successfully guarantee that any APCs previously waiting to execute within the thread's owning-process address space are not delivered while the thread is running in a different process context. Next, the thread's ApcStateIndex field is set to AttachedApcEnvironment, indicating the thread is executing in a different process context. Later, the ApcStatePointer array content is updated to reflect the new state, having its first element point to SavedApcState and its second element to ApcState, indicating that the APC environment for the thread's owning-process context is now located at SavedApcState, and the one for the thread's new process context at ApcState. Finally, the current process context is switched for that of the new process.

What determines the target APC environment for an APC object is the ApcStateIndex field. The value of ApcStateIndex is taken as an index into ApcStatePointer to obtain a pointer to the target APC environment. Later, this pointer is used to place the APC object in its corresponding queue within the APC environment.

When the thread is detaching from the new process (KeDetachProcess), any pending kernel APCs waiting to execute in the new process address space are delivered. Next, the content of SavedApcState is copied back to ApcState, SavedApcState is cleaned up, the thread's ApcStateIndex field is set to OriginalApcEnvironment, and ApcStatePointer is also updated; then the current process context is switched for that of the thread's owning process.

Using APCs

Device drivers use two major functions to utilize APCs. The first, KeInitializeApc (see Listing 2), is used to initialize an APC object. This function takes a driver-allocated APC object, a pointer to the target thread object, the APC environment index (which APC environment to place the APC object in), the APC's kernel, rundown and normal routines pointers, the kind of APC (user mode or kernel mode), and a context parameter.

KeInitializeApc first sets the Type and Size fields to the appropriate values for an APC object. Then it checks the value of the Environment argument. If it is CurrentApcEnvironment, the ApcStateIndex field is set to the target thread's ApcStateIndex field; otherwise, the ApcStateIndex field is set to the value of Environment. Next, this function sets the Thread, RundownRoutine, and KernelRoutine fields directly from the function arguments. To accurately determine the kind of APC, KeInitializeApc checks the value of the NormalRoutine parameter. If it is NULL, the ApcMode field is set to KernelMode and NormalContext is set to NULL. If NormalRoutine is nonNULL, in which case it must point to a valid routine, the ApcMode and NormalContext fields are obtained from the function arguments. Finally, KeInitializeApc sets the Inserted field to FALSE, indicating the APC object has not been placed in its corresponding APC queue yet.

From this explanation, you can easily realize that APC objects missing a valid NormalRoutine are considered as kernel APCs. Specifically, they are considered as special kernel-mode APCs. Indeed, this is the kind of APC the I/O Manager uses to perform asynchronous I/O completion. Conversely, APC objects that define a valid NormalRoutine are considered regular kernel-mode APCs, given that ApcMode is KernelMode; otherwise, they are considered user-mode APCs. The prototypes for KernelRoutine, RundownRoutine, and NormalRoutine are defined in NTDDK.H as shown in Listing 3.

In general, every APC object must contain a valid KernelRoutine function pointer, whatever its kind. This driver-defined routine will be the first one to run when the APC is successfully delivered and executed by the NT's APC dispatcher. User-mode APCs must also contain a valid NormalRoutine function pointer, which must reside in user memory. Likewise, regular kernel-mode APCs contain a valid NormalRoutine, which runs in kernel mode just like KernelRoutine. Optionally, either kind of APC may define a valid RundownRoutine. This routine must reside in kernel memory and is only called when the system needs to discard the contents of the APC queues, such as when the thread exits. In this case, neither KernelRoutine nor NormalRoutine are executed, just the RundownRoutine. An APC without such a routine will be deleted.

Keep in mind that the action of delivering APCs to a thread only involves calling the APC dispatcher subroutine KiDeliverApc at operating system well-defined points, while executing an APC involves actually calling the APC routines.

Once the APC object is completely initialized, device drivers typically call KeInsertQueueApc (see Listing 4) to place the APC object in the target thread's corresponding APC queue. This function takes a pointer to the APC object initialized with KeInitializeApc, two system arguments, and a priority increment. As well as the context parameter passed to KeInitializeApc, these system arguments are simply passed to the APC's routines when they are executed (see Listing 3).

Before KeInsertQueueApc places the APC object in the target thread's corresponding APC queue, it first checks whether the target thread is APC queueable. If it isn't, the function returns immediately with a FALSE result. If APCs can be queued to the thread, the function sets the SystemArgument1 and SystemArgument2 fields directly from the function arguments. Next, the function calls KiInsertQueueApc to actually place the APC object in its corresponding APC queue.

KiInsertQueueApc only takes an APC object and a priority increment, which is applied in case the APC causes the target thread's wait state to be satisfied. This function first obtains the thread's APC queue spinlock and holds it while it is running to prevent other threads from simultaneously modifying the thread's APC structures (note that on uniprocessor systems, KiAcquireSpinLock just returns to the caller). Next, it checks the Inserted field. If it is TRUE, this indicates the APC object has already been placed in one APC queue and the function returns immediately with a FALSE result. If it is FALSE, the function determines the target APC environment from the ApcStateIndex field as explained earlier and proceeds to place the APC object in its corresponding queue by linking it through the ApcListEntry field. The position in which an APC object is placed is determined by the kind of APC. Specifically, regular kernel-mode and user-mode APCs are placed at the end of their corresponding APC queues. Conversely, a special kernel-mode APC is placed at the front of its particular queue before the first regular kernel-mode APC, if there is any already in the queue. If the APC is a kernel-defined user APC used when a thread exits, it's also placed at the front of its corresponding queue and the thread's main APC environment's UserApcPending control variable is set to TRUE. Now, KiInsertQueueApc sets the Inserted field to TRUE, indicating the APC object is already placed in its corresponding queue. Next, a check is performed to see whether the APC was queued to the APC environment for the thread's current process context. If not, the function immediately returns with a TRUE result. If this is a kernel APC (whatever its kind), the KernelApcPending control variable in the thread's main APC environment is set to TRUE.

In the couple-of-sentences allusion to APCs in the Win32 SDK documentation, it's stated that after an APC has been successfully placed in its queue, a software interrupt is issued and the APC is executed the next time the thread is scheduled to run. However, this is not entirely true. Such a software interrupt is only issued if the APC is directed at the calling thread and it's a kernel-mode APC, whether regular or special. Later the function returns with a TRUE result. If the APC is not directed at the calling thread, the target thread is in a wait state at Passive Level; this is a regular kernel-mode APC; the thread is not inside a critical region; and no other regular kernel-mode APC is still in progress, then the thread is awakened with the completion status STATUS_KERNEL_APC, but the wait state is not aborted. If this is a user-mode APC, KiInsertQueueApc checks to see whether the target thread is in an alertable wait state with WaitMode equal to UserMode. If it is, the main APC state's UserApcPending control variable is set to TRUE and the wait state is satisfied with the completion status STATUS_USER_APC. Finally, the function releases the spinlock and returns a TRUE result to indicate the APC object has been successfully queued.

As a supplement to the APC management functions described earlier, device drivers can use the undocumented system service NtQueueApcThread (see Listing 5) to directly queue a user-mode APC to a particular thread. Internally, this function calls KeInitializeApc and KeInsertQueueApc to accomplish this task.

NT's APC Dispatcher

At well-defined points, NT checks whether a thread has pending APCs. Then, the APC dispatcher subroutine KiDeliverApc is executed in the thread's context to initiate APC delivery to the thread. Note that this behavior interrupts the thread's normal execution flow, giving control to the APC dispatcher first and later resuming the thread execution when KiDeliverApc completes.

For example, whenever a thread is scheduled to run, the last step of the context swapping function SwapContext is to inspect whether the new thread has "pending kernel APCs." If so, SwapContext either (1) requests an APC Level software interrupt to initiate APC delivery as soon as the new thread runs at low IRQL (Passive Level) or (2) returns with a TRUE result indicating that the new thread has "pending kernel APCs." This decision is based upon the IRQL at which the new thread will ultimately run when control passes to its restored program counter. If it is higher than Passive Level, SwapContext makes decision (1), and if it is Passive Level, the function makes decision (2).

The return value of SwapContext is only usable by certain system functions that explicitly call SwapContext to force a context switch to another thread. Then, when these system functions are resumed at some later time (when they are rescheduled again), they usually check the return value of SwapContext and if it is TRUE, they call the APC dispatcher to deliver kernel APCs to the current thread. For example, the system function KiSwapThread is used by wait services to relinquish the processor until the wait is satisfied. This function calls SwapContext internally and, when its execution is resumed at the point after the call to SwapContext (when the wait is satisfied), a check is performed on SwapContext's return value. If it is TRUE, KiSwapThread lowers the IRQL to APC Level and calls KiDeliverApc to deliver kernel APCs to the current thread.

For user APCs, the kernel invokes the APC dispatcher only "whenever" a thread is returning to user mode and the thread's main APC environment's UserApcPending control variable is TRUE. For example, when the system service dispatcher KiSystemService is about to return to user mode after completing a system service request, it checks whether there are pending user APCs. If so, it raises the IRQL to APC Level and invokes the APC dispatcher, indicating it should deliver any pending user APCs. Upon execution, KiDeliverApc calls this user APC's KernelRoutine. Later, the helper function KiInitializeUserApc is called to set up the thread's trap frame so that on exit from kernel mode, the thread starts executing in the user-mode APC dispatcher subroutine KiUserApcDispatcher in Ntdll.dll. The helper function's job is to copy the thread's previous execution state (which is stored in the trap frame created in the thread's kernel stack when it entered kernel mode) to the thread's user-mode stack, as well as the APC's normal routine address, normal context, and both system arguments, modifying the trap frame's ESP register accordingly. Finally, it loads the trap frame's EIP register with the address of KiUserApcDispatcher in Ntdll.dll. When the trap frame is eventually dismissed and the kernel transfers control to KiUserApcDispatcher, this function invokes the APC's NormalRoutine, which address and arguments reside in the stack and, when the routine completes, it calls NtContinue to resume the thread execution as if nothing had happened using the previous execution context also in the stack.

Listing 6 shows the pseudocode for the NT's APC dispatcher subroutine KiDeliverApc. As you can observe, its implementation is quite trivial. When the kernel invokes KiDeliverApc to deliver a user-mode APC, PreviousMode is passed as UserMode. TrapFrame points to the thread's trap frame and when it's invoked for kernel APCs, PreviousMode is passed as KernelMode and TrapFrame as NULL.

Note that whenever KernelRoutine is called (whatever the kind of APC), the pointers passed to it are from local copies of the APC's attributes and, because the APC object has already been pulled off its queue, it's safe to free the memory allocated for the APC in KernelRoutine. Besides, this routine has a last chance to modify its arguments before they are passed on to other routines.

Conclusion

APCs provide a very useful mechanism to execute code asynchronously in a particular thread context. As a device driver writer, you can rely on APCs to execute a routine in a particular thread context without that thread's intervention or consent whenever no guarantee of its address space's availability can be made. In this case, special kernel-mode APCs should be used given the restrictions imposed on regular kernel-mode APC delivery. For user application programmers, user-mode APCs can effectively be used to implement some sort of callback notification mechanism.

[email protected]