Ever since shofEL2 was released earlier this year, it’s been interesting to watch how different custom firmwares have tackled the prospect of modifying Nintendo’s firmware for both homebrew and piracy applications. As someone who hasn’t really had much stake in that race, I’ve mostly enjoyed seeing how different solutions tackle different problems. At the same time, I do have a stake in a few places (namely Smash Bros modding, vulnerability hunting, and personal projects), so I ended up in a situation where I needed to sort of ‘set up camp’ with Nintendo’s Horizon microkernel and have a comfortable working environment for modification.



Binary Patching is Weird, and Horizon makes it weirder.

Probably the biggest difficulty in Switch development, I would say, is iteration time, followed by a general difficulty in actually modifying anything. Even just booting modified core services (ie filesystem management, service management, spl which talks to the EL3 TrustZone SPL [commonly misnomered as the security processor liaison…?], the boot service which shows boot logos and battery indications, …) requires, at a minimum, reimplementing Nintendo’s package1 implementation which boots TrustZone, plus patches for TrustZone to disable signature checks on those services and the kernel. Beyond the core services, modifying executables loaded from eMMC requires either patching Loader, patching FS, reimplementing Loader, or something else.

Unfortunately, with binary patching there generally isn’t a silver bullet. Generally speaking, the three methods of modification are userland replacement, userland patching, and kernel patching. The first two are currently used by Atmosphere, but the solution I felt would be the most robust and extensible for the Nintendo Switch was kernel patching. Here’s a quick rundown of the pros and cons of each method:

Userland Replacement

- Requires rewriting an entire functionally identical executable

- Often not feasible for larger services such as FS

- Can easily break between firmware updates, especially if new functionality is added or services split. This makes it difficult to maintain when the OS is in active development.

- Added processes can potentially leave detectable differences in execution (different PIDs, different order of init, etc)

+ Easier to add functionality, since you control all code

+ Can operate easily on multiple firmwares

+ Can serve as an open-source reference for closed-source code

Userland Patching

- Adding additional code and functionality can be difficult, since expanding/adding code pages isn’t always feasible without good control of Loader

- Finding good, searchable signatures can often be difficult

- Can easily break between firmware updates, especially if functionality or compilers are tweaked

+ With good signatures, can withstand firmware updates which add functionality

+ Often has less maintenance between updates when functionality does change; patching is usually easier than writing new code

+ Harder to detect unless the application checks itself or others check patched applications

Kernel Patching

- Greater chance of literally anything going wrong (concurrency, cache issues, …), harder to debug than userland

- Minimal (formerly no) tooling for emulating the kernel, vs userland where Mephisto, yuzu, etc can offer assistance

- Can easily break between firmware updates, and is more difficult (but not impossible) to develop a one-size-fits-all-versions patch since kernel objects change often

- Easier to have adverse performance impacts with syscall hooks

+ Harder to detect modifications from userland; userland cannot read kernel and checking if kernel has tampered with execution state can be trickier

+ Updating kernel object structures can take less time than updating several rewritten services, since changes are generally only added/removed fields

+ Direct access to kernel objects makes more direct alterations easier (permission modification, handle injection, handle object swapping).

+ Direct access to hardware registers allows for UART printf regardless of initialization state and without IPC

+ Hooking for specific IPC commands avoids issues with userland functionality changes, and in most cases IPC commands moving to different IDs only disables functionality vs creating an unbootable system.

mooooooo, a barebones Tegra X1 emulator for Horizon

Obviously the largest hangup with kernel patching is debugging. The Switch has RAM to spare (unlike 3DS), and setting up an intercept for userland exceptions isn’t impossible to do by trial and error using SMC panics/UART and a lot of patience, but for ease of use and future research I really, really wanted an emulator to experiment with the Switch kernel. I ended up building a quick-n-dirty emulator in Unicorn. With a few processes it works rather well, though it currently still struggles with loading Nintendo’s core processes. For a small and contained test environment (two processes talking to each other and kernel patches watching them), though, I would say I had reached my goal, and it was enough to at least be able to work quickly and sanely on my intercept.

For the most part, the Switch Horizon microkernel doesn’t actually use much of the Tegra MMIO; it uses some of the more obvious ARM components like GIC for interrupts, and it also has a large initialization sequence for the memory controller, but as long as interrupts are functional, timers work, MC returns some correct values and SMC return values are all correct, it boots into userland without issue.

I found that emulating multiple cores in Unicorn actually isn’t all that difficult, provided you’re using a compiled language where uc_mem_map_ptr works. Rather than messing with threads, I opted for a round-robin scheduling scheme where I run each Unicorn instance for a set number of instructions, with memory being mapped and unmapped from the running core so that any cached data gets written out before the next core has its turn. A lot of modifications/cherry-picking to Unicorn did have to be made in order to properly support interrupts, certain coprocessor registers (ie core indexes), and translation tables (for uc_mem_read/uc_mem_write and vaddr->paddr translation), and in general there were some odd issues to work around.
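To make the round-robin idea concrete, here is a minimal sketch in plain Python. It does not use Unicorn at all: the `Core` class is a hypothetical stand-in for one emulator instance, `QUANTUM` is an arbitrary slice size, and the `has_memory` flag only models the map/unmap step around each turn.

```python
from dataclasses import dataclass

QUANTUM = 1000  # instructions per turn; an arbitrary value for illustration


@dataclass
class Core:
    """Hypothetical stand-in for one emulated core (no real emulation)."""
    index: int
    remaining: int            # instructions left before this core goes idle
    executed: int = 0
    has_memory: bool = False  # models whether shared RAM is mapped right now

    def run(self, count):
        """Pretend to execute up to `count` instructions; return how many ran."""
        assert self.has_memory, "core ran without RAM mapped"
        ran = min(count, self.remaining)
        self.remaining -= ran
        self.executed += ran
        return ran


def run_round_robin(cores, quantum=QUANTUM):
    """Give each busy core a fixed slice in turn until every core is idle.

    Returns the order in which cores were scheduled.
    """
    order = []
    while any(core.remaining for core in cores):
        for core in cores:
            if core.remaining == 0:
                continue
            core.has_memory = True    # map shared RAM into this core
            core.run(quantum)
            core.has_memory = False   # unmap so its writes are visible to the next core
            order.append(core.index)
    return order


cores = [Core(0, 2500), Core(1, 1000), Core(2, 1500)]
schedule = run_round_robin(cores)
print(schedule)
```

Only one core ever has memory mapped at a time, which is what sidesteps the need for threads or locking in this scheme.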

Patching Horizon for Syscall MiTM

With a decent environment for modifying the kernel, the next issue really just became actually bootstrapping an SVC intercept. Figuring out where exception vectors are located isn’t difficult with the emulator handy, but really the issue becomes:

1. Extra code has to be loaded and copied by TrustZone, along with kernel

2. New code needs to be placed in a safe location and then given memory mappings

3. Existing code needs to be modified with hooks pointing to the new code

To guide the kernel towards salvation I ended up hooking just before translation table addresses are written into coprocessor registers. This way, the payload can allocate pages and copy code from less-safe soon-to-be-condemned .bss memory for the bulk of the SVC interception code, set up those pages in the translation tables, and then patch the ARM64 SVC handler to actually jump to the new mapping. For ease of development, the mapping is given a constant address along with any hardware registers which it needs to access, as opposed to being randomized like the rest of the kernel.
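As a rough illustration of the last step, patching a hook into existing code, the sketch below encodes an AArch64 unconditional `B` instruction and writes it over one instruction in a byte image. The addresses and the `install_hook` helper are hypothetical, and it assumes the target is within the ±128 MiB range of a plain `B`; a mapping placed further away would need a different sequence, and this is not necessarily how the actual payload does it.

```python
import struct

B_OPCODE = 0x14000000   # AArch64 "B <imm26>" with a zero offset
NOP = 0xD503201F        # AArch64 NOP, used here as filler code


def encode_b(pc, target):
    """Return the little-endian bytes of `B target` when placed at `pc`."""
    offset = target - pc
    if offset % 4:
        raise ValueError("branch offset must be a multiple of 4")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target is out of range for a single B instruction")
    imm26 = (offset >> 2) & 0x3FFFFFF
    return struct.pack("<I", B_OPCODE | imm26)


def install_hook(image, image_base, hook_address, target):
    """Overwrite one instruction with a branch; return the bytes it replaced."""
    offset = hook_address - image_base
    original = bytes(image[offset:offset + 4])
    image[offset:offset + 4] = encode_b(hook_address, target)
    return original


KERNEL_BASE = 0x80060000    # hypothetical load address
PAYLOAD_BASE = 0x80100000   # hypothetical address of the added code

kernel = bytearray(struct.pack("<4I", NOP, NOP, NOP, NOP))
saved = install_hook(kernel, KERNEL_BASE, KERNEL_BASE + 8, PAYLOAD_BASE)
print(kernel.hex(), saved.hex())
```

Returning the replaced bytes matters because the new code usually has to execute the displaced instruction itself before jumping back.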

In the end, the patched kernel executes as follows:

Since hashtagblessed is able to map UART, CAR and PINMUX registers into translation tables, getting communication from the right Joy-Con rail using existing BPMP-based drivers was fairly straightforward, and even without any source code to reference there’s a fairly basic driver in TrustZone. During the transition from emulation to hardware, however, I had kept an SMC to print information to the console, but I ultimately ended up using UART even in emulation. On hardware, I got by using UART-B (the right Joy-Con rail) for a while, but had to switch to UART-A (internal UART lines for debugging) due to GPIO issues once HOS tried to initialize Joy-Con.

Identifying IPC Packets, Accurate Results With Simple Tools

With therainsdowninafrica loaded, hooked in and blessed, the next step is actually being able to identify specific IPC requests sent through svcSendSyncRequest, and doing this requires getting our hands dirty with kernel objects. Userland is able to utilize different kernel objects indirectly through the use of handles and system calls. Each KProcess has a handle table which maps handles to the underlying object structures, so translating handles to KObjects is simply a matter of checking the table for a given handle passed to a syscall. To access the current KProcess object which has the handle table, we can use the per-core context stored in register X18 as of 5.0.0 (prior to Horizon implementing ASLR, it was stored in a hardcoded address per-CPU) and the handle table can be accessed through the current KProcess object. Printf debugging was extremely useful while figuring out exactly how KProcess has its fields laid out since the structure changed slightly between versions, and with a bit of reversing added in it’s not particularly difficult to figure out exactly where the KProcessHandleTable is and how handles are translated into objects.
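The lookup path described above (per-core context, then current process, then handle table, then object) can be modelled with a small Python sketch. Everything here is illustrative: the class names mirror the text but the fields are made up, and the handle encoding (a table index in the low bits with a generation counter above it) is an assumption for the example rather than a confirmed kernel layout.

```python
INDEX_BITS = 15                       # assumed width of the table-index field
INDEX_MASK = (1 << INDEX_BITS) - 1


class HandleTable:
    """Toy handle table: each slot holds (generation, object) or None."""

    def __init__(self, size=4):
        self.entries = [None] * size
        self.next_generation = 1

    def add(self, obj):
        for index, entry in enumerate(self.entries):
            if entry is None:
                generation = self.next_generation
                self.next_generation += 1
                self.entries[index] = (generation, obj)
                return (generation << INDEX_BITS) | index
        raise RuntimeError("handle table exhausted")

    def lookup(self, handle):
        index = handle & INDEX_MASK
        if index >= len(self.entries):
            return None
        entry = self.entries[index]
        # A stale handle has the right index but the wrong generation.
        if entry is None or entry[0] != handle >> INDEX_BITS:
            return None
        return entry[1]

    def close(self, handle):
        if self.lookup(handle) is None:
            return False
        self.entries[handle & INDEX_MASK] = None
        return True


class KProcess:
    def __init__(self, name):
        self.name = name
        self.handle_table = HandleTable()


class CoreContext:
    """Stand-in for the per-core context the text says X18 points at."""

    def __init__(self, current_process):
        self.current_process = current_process


def object_for_handle(context, handle):
    """Translate a handle passed to a syscall into the underlying object."""
    return context.current_process.handle_table.lookup(handle)


loader = KProcess("Loader")
context = CoreContext(loader)
session = object()   # stands in for a KClientSession
handle = loader.handle_table.add(session)
print(hex(handle), object_for_handle(context, handle) is session)
```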

Probably the most useful fields in KProcess/KThread, in our case, are the process name and title ID, the handle table, and the active thread’s local storage, where all IPC packets are read from and written to. To give a quick overview on how Switch IPC generally works, services are able to register a named port to wait for communications on, which other processes can connect to via svcConnectToNamedPort. In practice, Nintendo only uses globally-accessible named ports for their service manager, `sm:`. On a successful call to svcConnectToNamedPort, processes receive a KClientSession handle for sm: which allows those processes to send IPC requests to the service manager to register, unregister, or connect to ‘private’ named ports managed by the service manager, with sm checking whether the requesting process actually has access to said service.

From a practicality standpoint, since so much communication relies on IPC the most obvious mechanism to establish is a system for hooking requests going into specific named ports, both globally accessible ones such as sm: and private ones provided by sm itself. This kinda leads into why it’s important to have access to the underlying KClientSession objects as opposed to trying to track handles; mapping out exactly which KProcess’ handles go to what, while also tracking where handles might be copied and moved to is an almost impossible task, however mapping specific KClientSessions to specific handlers avoids the handle copying/moving issue since the KClientSession object pointer does not change in those cases.

Additionally, many interfaces aren’t actually accessible from either svcConnectToNamedPort or sm, as is the case with fsp-srv which gives out IFileSystem handles for different storage mediums. However, by providing a generic means for mapping KClientSession objects to specific intercept handlers, you can set up a chain of handlers registering handlers. For example, intercepting a specific eMMC partition’s IFile’s commands would involve setting up a handler for the sm global port, and then from that handler setting up a handler for any returned fsp-srv handles, and then from the fsp-srv handler checking for the OpenBisFileSystem command for a specific partition to hook the IFileSystem to a handler, which can have its OpenFile command hooked to hook any outgoing IFile handles to a specific handler for IFiles from that eMMC partition. From that point all incoming and outgoing data from that IFile’s KClientSession can be modified and tweaked.
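The chain of handlers registering handlers can be sketched as a table keyed by the session object itself rather than by any handle. This is only a model: the `Intercept` class, the `on_reply` callback shape, the `GetService` command name and the partition names are all hypothetical, and Python object identity stands in for the KClientSession pointer.

```python
class KClientSession:
    """Stand-in for a kernel session object; its identity plays the pointer."""

    def __init__(self, label):
        self.label = label


class Intercept:
    def __init__(self):
        self.handlers = {}   # session object -> handler function
        self.log = []

    def hook(self, session, handler):
        self.handlers[session] = handler

    def on_reply(self, session, command, arg, returned=None):
        """Called once a request completes; `returned` is any new session."""
        handler = self.handlers.get(session)
        if handler is not None:
            handler(self, command, arg, returned)


TARGET_PARTITION = "SYSTEM"   # hypothetical partition name


def sm_handler(icpt, command, arg, returned):
    if command == "GetService" and arg == "fsp-srv":
        icpt.hook(returned, fsp_srv_handler)


def fsp_srv_handler(icpt, command, arg, returned):
    if command == "OpenBisFileSystem" and arg == TARGET_PARTITION:
        icpt.hook(returned, filesystem_handler)


def filesystem_handler(icpt, command, arg, returned):
    if command == "OpenFile":
        icpt.hook(returned, file_handler)


def file_handler(icpt, command, arg, returned):
    icpt.log.append((command, arg))   # the point where data could be altered


icpt = Intercept()
sm = KClientSession("sm:")
icpt.hook(sm, sm_handler)

# Walk the chain for the partition of interest.
fsp = KClientSession("fsp-srv")
icpt.on_reply(sm, "GetService", "fsp-srv", fsp)
fs = KClientSession("IFileSystem")
icpt.on_reply(fsp, "OpenBisFileSystem", TARGET_PARTITION, fs)
hooked_file = KClientSession("IFile")
icpt.on_reply(fs, "OpenFile", "/save.bin", hooked_file)
icpt.on_reply(hooked_file, "Read", 0x100)

# A different partition never gets hooked, so its files are ignored.
other_fs = KClientSession("IFileSystem")
icpt.on_reply(fsp, "OpenBisFileSystem", "USER", other_fs)
other_file = KClientSession("IFile")
icpt.on_reply(other_fs, "OpenFile", "/x", other_file)
icpt.on_reply(other_file, "Read", 0)

print(icpt.log)
```

Because the table is keyed by the object and not by a per-process handle value, copying or moving the handle elsewhere would not lose the hook.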

Finally, in order to prevent issues with KProcess handle tables being exhausted, Nintendo provided a virtual handle system implemented in userland for services which manage large amounts of sessions. Effectively, a central KClientSession is able to give out multiple virtual handles (with virtual handles given out by virtual interfaces) only accessible through that KClientSession handle. As such, a process can take a service such as fsp-srv and with a single handle can manage hundreds of virtual interfaces and sub-interfaces, easing handle table pressure on both the client and server ends. These handles can be accommodated by watching for KClientSession->virtual handle conversion, and then keeping mappings for KClientSession-virtual ID pairs. And again, since copied/moved KClientSessions keep their same pointer, in the event that somehow the central handle and a bunch of domain IDs were copied to another process, they would still function correctly.
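A minimal sketch of the bookkeeping this implies, keyed on (session, virtual ID) pairs. The `DomainTracker` class, its method names, and the handle values in the two example tables are hypothetical; string labels stand in for real handler functions.

```python
class KClientSession:
    """Stand-in for the central session object shared by all virtual handles."""


class DomainTracker:
    def __init__(self):
        self.domain_sessions = set()   # sessions seen converting to virtual handles
        self.handlers = {}             # (session, virtual_id) -> handler

    def on_convert_to_domain(self, session):
        self.domain_sessions.add(session)

    def hook(self, session, virtual_id, handler):
        if session not in self.domain_sessions:
            raise ValueError("session has not been converted to virtual handles")
        self.handlers[(session, virtual_id)] = handler

    def find(self, session, virtual_id):
        return self.handlers.get((session, virtual_id))

    def on_close(self, session, virtual_id):
        """Forget a virtual ID once it is closed; return what was hooked."""
        return self.handlers.pop((session, virtual_id), None)


tracker = DomainTracker()
fsp_ldr = KClientSession()
tracker.on_convert_to_domain(fsp_ldr)
tracker.hook(fsp_ldr, 2, "code filesystem handler")
tracker.hook(fsp_ldr, 3, "code file handler")

# Two processes holding different handle values for the same session object.
process_a_handles = {0x8000: fsp_ldr}
process_b_handles = {0x18003: fsp_ldr}

print(tracker.find(process_a_handles[0x8000], 3))
print(tracker.find(process_b_handles[0x18003], 3))
```

Both lookups land on the same entry, which is the property the text relies on when handles get copied or moved between processes.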

Tying it All Together

Let’s take a look at what it would take to boot homebrew via hbloader utilizing only SVC interception. The key interface of interest is fsp-ldr, which offers command 0 OpenCodeFileSystem taking a TID argument and an NCA path. From a userland replacement standpoint, booting homebrew involves redirecting the returned IFileSystem to be one from the SD card rather than one from fsp-ldr, since Loader (the process accessing fsp-ldr) doesn’t really do any authentication on NSOs/NPDMs, only FS does. From a kernel standpoint, we just need to watch for an IPC packet sent to fsp-ldr for that command, hook the resulting handle, and then for each OpenFile request check if an SD card file can better override it. From there, swap handles and now Loader is reading from an IFile on the SD card rather than an NCA.

Taking a few steps back, there’s obviously a few things to keep in mind: Loader never actually accesses the SD card, in fact it doesn’t even ask for a fsp-srv handle. Since it is a builtin service it has permissions for everything, but the issue still remains of actually making sure handles can be gotten and swapped in. As it turns out, however, calling SVC handlers from SVCs is more than possible, so if Loader sends a request to sm asking for fsp-ldr, we can send out a request for fsp-srv, initialize it, and then send out the original request without Loader even knowing.

Interestingly, the first thing Loader does with its fsp-ldr handle is it converts it into a virtual domain handle, so that all OpenCodeFileSystem handles are virtual handles. This does make working with it a little more tricky since the code filesystem and code files all operate under the same KClientSession object, but it was an issue which needed resolving anyhow. For SD card IFile sessions, it also means that we have to convert them to virtual handles and then swap both the file KClientSession handle and the file virtual handle ID, while also watching for their virtual ID to close so that we can close our handles at the same time and avoid leakage.

A few other tricks are required to properly emulate the SD redirection experience: swapping in handles isn’t the only concern, it’s also important to ensure that if the SD card *doesn’t* have a file then that error should be returned instead, and if the SD card has a file which doesn’t exist in the original IFileSystem, we still need a file handle to replace. To accommodate this, the original OpenFile request is altered to always open “/main”; if the SD card errors, that virtual handle is closed, and otherwise the SD handles are swapped in.
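The decision logic in that last step can be written out as a small model. The `FakeFileSystem` class, the result values and the example paths are all made up for illustration, and the text does not say what becomes of the “/main” placeholder after a successful swap, so releasing it here is an assumption.

```python
RESULT_OK = 0
RESULT_PATH_NOT_FOUND = 1   # hypothetical error value, not a real result code


class FakeFileSystem:
    """Toy filesystem that hands out unique handles and tracks open ones."""

    def __init__(self, name, paths):
        self.name = name
        self.paths = set(paths)
        self.open_handles = set()
        self.next_id = 1

    def open_file(self, path):
        if path not in self.paths:
            return RESULT_PATH_NOT_FOUND, None
        handle = (self.name, path, self.next_id)
        self.next_id += 1
        self.open_handles.add(handle)
        return RESULT_OK, handle

    def close(self, handle):
        self.open_handles.discard(handle)


def open_redirected(path, original, sd_card):
    """Return (result, handle) as the requesting process would see it."""
    # Always open "/main" on the original so there is a handle to replace,
    # even when `path` only exists on the SD card.
    result, placeholder = original.open_file("/main")
    if result != RESULT_OK:
        return result, None

    sd_result, sd_handle = sd_card.open_file(path)
    if sd_result != RESULT_OK:
        # The SD card's error wins; drop the placeholder so nothing leaks.
        original.close(placeholder)
        return sd_result, None

    # Assumption: the placeholder is released once the SD handle replaces it.
    original.close(placeholder)
    return RESULT_OK, sd_handle


original = FakeFileSystem("nca", ["/main", "/main.npdm"])
sd_card = FakeFileSystem("sd", ["/main", "/rtld"])

only_on_sd = open_redirected("/rtld", original, sd_card)
missing_on_sd = open_redirected("/main.npdm", original, sd_card)
print(only_on_sd, missing_on_sd)
```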

The end result is, of course, the homebrew launcher being accessible off boot from the album applet:

Other Potential Thoughts for Kernel Stuff

* Process patching is as easy as hooking svcUnmapProcessMemory and patching the memory before it’s unmapped from Loader. Same goes for NROs but with different SVCs, all .text ultimately passes through kernel.

* Reply spoofing. IPC requests can simply return without ever calling the SVC, by having kernel write in a reply it wants the process to see.

* SVC additions. I’m not personally a fan of this because it starts to introduce ABIs specific to the custom firmware, but it’s possible. One of the things I’m not personally a fan of with Luma3DS was that they added a bunch of system calls in order to access things which, quite frankly, were better left managed in kernel. The kernel patches for fs_mitm also violate this. Userland processes shouldn’t be messing with handle information and translation tables, imo. That’s hacky.

* Virtual services and handles. Since the intercept is able to spoof anything and everything a userland process knows, it can provide fake handles which can map to a service which lies entirely in kernel space.

* IPC augmentation: Since any IPC request can be hooked, it can be possible to insert requests in between expected ones. One interesting application might be watching for outgoing HID requests and then, on the fly, translating these requests to the protocol of another controller which also operates using HID.

* IPC forwarding: similar to augmentation, packets can be forwarded to a userland process to be better handled. Unfortunately, kernel presents a lot of concurrency issues which can get really weird, especially since calling SVC handlers can schedule more threads that will run through the same code.

* As currently implemented, A32 SVCs are not hooked, however this is really more an issue if you want to hook outgoing requests from A32 games like MK8, since services such as Loader will generally only operate in a 64-bit mode.

Source

Horizon emulator, https://github.com/shinyquagsire23/moooooooo

therainsdowninafrica, https://github.com/shinyquagsire23/therainsdowninafrica