Introduction

The Bug(s)

/dev/nvmap

Open a service session with nvdrv:a (a variation of nvdrv, available only to applets such as the browser).

Call the IPC command Initialize, which supplies memory allocated by the browser to the nvservices system module.

Call the IPC command Open on the /dev/nvmap interface.

Submit ioctl commands by using the IPC command Ioctl.

Close the interface with the IPC command Close.
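The ordering of the steps above can be illustrated with a toy model. This is only a sketch: `ToyDriverSession` and its methods are hypothetical names invented for illustration, it does not talk to any real service, and the ioctl request number and payload are placeholders.

```python
class ToyDriverSession:
    """Toy stand-in for a driver service session (hypothetical, not the real IPC interface)."""

    def __init__(self):
        self.initialized = False
        self.transfer_memory = None
        self.open_fds = {}
        self._next_fd = 1

    def initialize(self, transfer_memory):
        # Initialize: the client hands memory it allocated over to the service.
        self.transfer_memory = transfer_memory
        self.initialized = True

    def open(self, path):
        # Open: only valid once the session has been initialized.
        if not self.initialized:
            raise RuntimeError("Initialize must be called before Open")
        fd = self._next_fd
        self._next_fd += 1
        self.open_fds[fd] = path
        return fd

    def ioctl(self, fd, request, payload):
        # Ioctl: only valid on an interface that is currently open.
        if fd not in self.open_fds:
            raise RuntimeError("Ioctl on an interface that is not open")
        # A real driver would dispatch on `request`; the toy just echoes the payload.
        return payload

    def close(self, fd):
        # Close: releases the interface handle.
        self.open_fds.pop(fd)


session = ToyDriverSession()
session.initialize(bytearray(0x1000))      # placeholder size
fd = session.open("/dev/nvmap")
reply = session.ioctl(fd, 0, bytes(8))     # placeholder request and payload
session.close(fd)
```

When hijacking the browser mid-execution, the first two calls have already happened, so only the `open`, `ioctl` and `close` equivalents need to be sent.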

The GM20B has 1 GPC (Graphics Processing Cluster).

Each GPC has 2 TPCs (Texture Processing Clusters or Thread Processing Clusters, depending on context).

Each TPC has 2 SMs (Streaming Multiprocessors) and each contains 8 processor cores.

Each SM can run up to 128 "warps".

A "warp" is a group of 32 parallel threads that runs inside an SM.

The Exploit

The Aftermath

Scan nvservices' memory space using svcQueryMemory and find all memory blocks with the attribute IsDeviceMapped set. Odds are that such a block is mapped to an I/O device and thus to the GPU.

Locate the 0xFACE magic inside these blocks. This magic word is used as a signature for GPU channels so, if we find it, we have found a GPU channel structure.

Find the GPU channel's page table. Each GPU channel contains a page directory that we can traverse to locate and replace page table entries. Remember that these entries are used by the GMMU and not the SMMU, which means not only that they follow a different format but also that the addresses used by the GMMU represent virtual GPU memory addresses.

Patch the GPU channel's page table entries with any memory address we want. The important part here is setting bit 31 in each page table entry, as this tells the GMMU to access physical memory directly instead of going through the SMMU as usual.

Get the GPU to access our target memory address via the GMMU. I decided to use a rather obscure engine inside the GPU that envytools/nouveau calls the PEEPHOLE (see https://envytools.readthedocs.io/en/latest/hw/memory/peephole.html). This engine can be programmed by poking some GPU MMIO registers and provides a small, single-word window to read/write virtual memory covered by a particular GPU channel. Since we control the channel's page table and we've set bit 31 on each entry, any read or write going through the PEEPHOLE engine will access any DRAM address we want!
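The scanning and patching steps above can be sketched offline on plain byte buffers. This is a minimal illustration under assumptions, not the code used in the exploit: `MemoryBlock`, `find_magic`, `scan_blocks` and `force_physical` are hypothetical names, and treating the magic as a 16-bit little-endian value at 2-byte alignment is an assumption about the layout. Only the 0xFACE value, the IsDeviceMapped filter and bit 31 come from the text.

```python
import struct
from dataclasses import dataclass
from typing import List

CHANNEL_MAGIC = 0xFACE   # signature mentioned in the text
PHYS_BIT = 1 << 31       # bit 31, as described in the text


@dataclass
class MemoryBlock:
    """Hypothetical snapshot of one memory block reported by a memory query."""
    base: int
    data: bytes
    is_device_mapped: bool


def find_magic(block: MemoryBlock, align: int = 2) -> List[int]:
    """Return addresses where a little-endian 16-bit word equals the magic (assumed layout)."""
    hits = []
    for off in range(0, len(block.data) - 1, align):
        (half,) = struct.unpack_from("<H", block.data, off)
        if half == CHANNEL_MAGIC:
            hits.append(block.base + off)
    return hits


def scan_blocks(blocks: List[MemoryBlock]) -> List[int]:
    """Only device-mapped blocks are candidates for holding GPU channel structures."""
    hits = []
    for block in blocks:
        if block.is_device_mapped:
            hits.extend(find_magic(block))
    return hits


def force_physical(entry: int) -> int:
    """Set bit 31 in an entry value; which word of a real entry holds this bit is not modeled."""
    return entry | PHYS_BIT
```

For example, a device-mapped block based at 0x2000 with the magic at offset 8 yields a single hit at 0x2008, while the same bytes in a block without the IsDeviceMapped attribute are ignored.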

Use smhax to load the creport system module in waiting state.

Use gmmuhax to find and patch creport directly in DRAM.

Launch a patched creport system module that would call svcDebugActiveProcess, svcGetDebugEvent and svcReadDebugProcessMemory with arguments controlled by us.

Use the debug SVCs to dump all running built-in system modules.

The Fixes

Conclusion

Welcome to a new write-up! Almost a year after my last one on the Wii U... Now, as we are getting close to the dawn of a new year, I will finally present one of the most fun exploit chains I have ever had the privilege to discover and develop.

As you know, the Switch has seen many developments on the hacking front since its release and I'm proud to have taken part in a large number of them alongside many others.

But before we reached the comfortable plateau we are on now, there were many complex attempts to defeat the Switch's security, and today we will be looking into one of the earliest successful attempts to exploit the system from a bottom-up approach.

To fully understand the context of this write-up we must go back to April 2017, a mere month after the Switch's international release.

Back then, everybody was trying to push the limits of the not-so-secret browser that came bundled with the Switch's OS. You may recall that this browser was vulnerable to a public and very popular bug known as CVE-2016-4657, notorious for being part of the Pegasus exploit chain. This allowed us to take over the browser's process with minimal effort in less than a week after the Switch's release.

The next logical step would be to escalate outside the browser's sandbox and attempt to exploit other, more privileged, parts of the system and this is how our story begins...

Exploiting the browser became a very pleasant task after the release of pegaswitch ( https://github.com/reswitched/pegaswitch ), a JS based toolkit that leverages vulnerabilities in the Switch's browser to achieve complex ROP chains.

Shortly after dumping the browser's binary from memory, documentation on the IPC system began to take form on the switchbrew wiki ( https://switchbrew.org/wiki/Main_Page ).

Before a generic IPC marshalling system was implemented, plenty of us began writing our own ways of talking with the Switch.
One such example is in rop-rpc ( https://github.com/xyzz/rop-rpc ), another toolkit designed for running ROP chains on the Switch's browser.

I decided to write my own functions on top of the very first release of the toolkit, as you can see below (please note that this is severely outdated):

Using this and the "bridge" system (the toolkit's way of calling arbitrary functions within the browser's memory space) I could now talk with other services accessible to the browser.

At that point I didn't know I could just bypass the restrictions imposed by the browser's NPDM (the bug that allows this had not been discovered yet), so I focused exclusively on the services that the browser itself would normally use.

From these, one immediately caught my attention due to the large number of symbols left inside the browser's binary which, in turn, made black box analysis of the corresponding system module much easier. This also allowed me to document everything on the wiki fairly quickly (see https://switchbrew.org/wiki/NV_services ).

The nvservices system module provides a high level interface for the GPU (and a few other engines), abstracting away all the low level stuff that a regular application doesn't need to bother with. Its most important part is the nvdrv service family which, as the name implies, provides a communication channel for the NVIDIA drivers inside the nvservices system module.

You can easily see the parallelism with the L4T (Linux for Tegra) source code but, for obvious reasons, in the Switch's OS the graphics drivers are isolated in a system module instead of being implemented in the kernel.

So, with a combination of reverse engineering and studying Tegra's source code I could steadily document the command interface and, more importantly, how to reach the ioctl system that the driver revolves around.
There are many ioctl commands for each device interface, so this sounded like the perfect attack surface.

Over several weeks I did nothing but fuzz as many ioctl commands as I could reach and, eventually, I found the bugs that would form the core of what would become the exploit.

The very first bug I found was in the /dev/nvmap device interface. This interface's purpose is to provide a way of creating and managing memory containers that serve as backing memory for many other parts of the GPU system.

From the browser's perspective, accessing this device interface consists of the following steps:

Since we are hijacking the browser mid-execution, a service session with nvdrv:a is already created and the Initialize IPC command has already been invoked. After finding the service handle for this session we can simply send commands to any interface we like.

In this case, while messing around with the /dev/nvmap interface I found a bug in one of its ioctl commands that would leak back a memory pointer from nvservices' memory space:

I later found out that a few others had also stumbled upon this bug, so I stashed it away for a while.

A few days later I was messing around with one of the device interfaces and got a weird crash in one of its ioctl commands:

Reverse engineering the browser's code revealed that this ioctl was indeed present, but no code path could be taken to call it under normal circumstances.
Furthermore, I was able to observe that this particular ioctl command would only take a struct with a single u64 as its argument.

After finding it in Tegra's source code (see https://github.com/arter97/android_kernel_nvidia_shieldtablet/blob/master/include/uapi/linux/nvgpu.h#L315 ) I was able to deduce that it was expecting a struct which contains a single field: a u64.

Turns out, that field is a pointer to a struct which contains 3 u64 fields.

Without going into much detail on how the GM20B (Tegra X1's GPU) works:

So, basically, this ioctl signals the GPU to pause and tries to return information on the "warps" running on each SM.

You can find it in Tegra's source code here: https://github.com/arter97/android_kernel_nvidia_shieldtablet/blob/master/drivers/gpu/nvgpu/gk20a/ctrl_gk20a.c#L398

As you can see, it builds a struct with the information and copies it out using the pointer from the argument struct as the destination address.

Since dealing directly with memory pointers from other processes is incompatible with the Switch's OS design, most ioctl commands were modified to take "in-line" arguments instead. However, this one was somehow forgotten and kept the original memory pointer based approach.

What this means in practice is that we now have an ioctl command that is trying to copy data directly using a memory pointer provided by the browser! However, since any pointer we pass from the browser is only valid in the browser's memory space, we need to leak memory from the nvservices system module to turn this into something even remotely useful.

Upon realizing this, I recalled the other bug I had found and tried to pass the pointer it leaked to this ioctl. As expected, I no longer got a crash; instead, the command completed successfully!

Unfortunately, the first bug leaks a pointer to a memory container allocated using transfer memory.
This means that, while the leaked address is valid in nvservices' memory space, it is impossible to find out where the actual system module's sections are located because transfer memory is also subject to ASLR.

At this point, I decided to share the bug and, together, we began discussing potential ways to use it for further exploitation.

As I mentioned before, every time a session is initiated with the nvdrv service family, the client must call the IPC command Initialize and this command requires the client to allocate and submit a kind of memory container that the Switch calls transfer memory.

Transfer memory is allocated with the SVC 0x15, which returns a handle that the client process can send over to other processes which, in turn, can use it to map that memory in their own memory space. When this is done, the memory range that backs the transfer memory becomes inaccessible to the client process until the other process releases it.

A few days passed and an idea came up: what if you destroy the service session and dump the memory range that backs the transfer memory sent over with the Initialize command?

And that's how the info leak bug was found. You can find a more detailed write-up on this bug, by someone who also found it independently, here: https://daeken.svbtle.com/nintendo-switch-nvservices-info-leak

With this new memory leak in hand I tried to blindly pass some pointers to the ioctl and got mixed results (crashes, device interfaces not working anymore, etc.). But when I tried to pass the pointer leaked by the first bug and dump the transfer memory afterwards, I could see that some data had changed.

As it turns out, the pointer leaked by the first bug belongs to the transfer memory region and we can now use this to find out exactly what is being written by the ioctl and where.

As expected, a total of 48 bytes were being written which, if you recall, make up the total space used by 2 of these structs (3 u64 fields, 24 bytes each).
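The 48-byte figure follows directly from the layout described earlier: 2 structs of 3 u64 fields each. As a rough sketch, a dumped 48-byte region could be split like this; `WARP_STATE`, `STRUCT_COUNT` and `split_leak` are hypothetical names, little-endian byte order is an assumption, and the meaning of each field is deliberately not modeled.

```python
import struct

WARP_STATE = struct.Struct("<3Q")   # three u64 fields, little-endian assumed
STRUCT_COUNT = 2                    # the text mentions 2 structs


def split_leak(buf):
    """Split a dumped region into one tuple of three u64 values per struct."""
    expected = WARP_STATE.size * STRUCT_COUNT
    if len(buf) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(buf)))
    return [WARP_STATE.unpack_from(buf, i * WARP_STATE.size)
            for i in range(STRUCT_COUNT)]
```

Comparing the tuples returned for dumps taken after successive calls is a simple way to see whether the written values stay the same between calls.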
However, to my surprise, the contents had nothing to do with the "warps" and they kept changing on subsequent calls.

That's right, the structs were not initialized on nvservices' side so now we have a 48 byte stack leak as well!

While this may sound convenient, it ended up being a massive pain in the ass due to how unstable the stack contents could be. But, of course, where there's a will there's a way...

Exploiting these bugs was very tricky...

The first idea I came up with was to try coping with the unreliable contents of the semi-arbitrary write and just corrupt different objects that I could see inside the leaked transfer memory region. This had very limited results and led nowhere.

Luckily, by now, others had succeeded in exploiting the Switch using a top-down approach: the famous glitch attack that went on to be presented at 34C3. So, with the actual nvservices binary now in hand, we could finally plan a proper exploit chain.

Working together, we found out that other ioctl commands could change the stack contents semi-predictably and we came up with this:

As you can see, we use the ioctl bug as-is (due to the last byte being almost always 0) to overwrite a flag. This allows the browser to access debug only device interfaces and use previously inaccessible ioctl commands.

This was particularly useful to gain access to functionality which could be used to plant a nice ROP chain inside the transfer memory.
However, pivoting the stack still required some level of control over the stack contents for the bug to work.

Many other similar methods can be used as well, but this always ended up with the same issue: the bug was just too unreliable.

Some weeks passed and an insane yet brilliant way of exploiting this came up:

With a combination of mass memory allocations to manipulate the transfer memory's base address, some clever null byte writes and object overlaps we are now able to build very powerful read and write primitives using the transfer memory region and thus gain the ability to copy memory between the browser process and nvservices. Achieving ROP is now way easier and surprisingly stable, with the exploit chain working practically 9 out of 10 times.

By now, another bug had already been discovered, which is why the code takes advantage of it. However, this was purely experimental (to understand the different services' access levels) and is not a requirement. The exploit chain works without taking advantage of it or any other already patched bugs.

We finally escaped the browser and have full control over nvservices so, what should we do next? What about taking the entire system down? ;)

You may recall from earlier exploits on the 3DS (see https://www.3dbrew.org/wiki/3DS_System_Flaws#Standalone_Sysmodules ) that GPUs are often a great place to look into when exploiting a system. Having this in mind since the beginning motivated me to attack nvservices in the first place and, fortunately, the Switch was no exception when it comes to properly securing a GPU.

After the incredible work done on maturing ROP for nvservices, I began looking into what could be done with the GPU inside the Switch.

A large amount of research took place over the following weeks, combining the publicly available Tegra source code and TRM with code from open source projects and my own reverse engineering of the nvservices system module. It was at this time that this incredibly enlightening quote from the Tegra X1's TRM was found:

If you watched the 34C3 talk you probably remember this.
If not, then I highly recommend at least re-reading the slides over here: https://switchbrew.github.io/34c3-slides/

I/O devices inside the Tegra X1's SoC are subjected to what ARM calls the SMMU (System Memory Management Unit). The SMMU is simply a memory management unit that stands between a DMA capable input/output bus and the main memory chip. In the Tegra X1's and, consequently, the Switch's case, it is the SMMU that is responsible for translating accesses from the APB (Advanced Peripheral Bus) to the DRAM chips. By properly configuring the Memory Controller and locking out the SMMU's page table behind the kernel, this effectively prevents any peripheral device from accessing more memory than it should.

Side note: on firmware version 1.0.0 it was actually possible to access the Memory Controller's MMIO region and thus completely disable the SMMU. This attack was presented at 34C3.

So, we now know that the GPU has its own MMU (accordingly named GMMU) and that it is capable of entirely bypassing the SMMU. How do we even access it?

There are many different ways to achieve this but, at the time and using the limited documentation available for the GPU family, this is what I came up with:

After all this, we now have a way to dump the entire DRAM... well, sort of.

An additional layer of protection is enforced at the Memory Controller level: memory carveouts. These are physical memory ranges that can be completely isolated from direct memory access.

Since I first implemented all this in firmware version 2.0.0, dumping the entire DRAM only gave me every non-built-in system module and applet that was loaded into memory. However, it was later tried on 1.0.0 and it turned out we could dump the built-in system modules there!

Turns out, while the kernel has always been protected by a generalized memory carveout, the built-in system modules were loaded outside of this carveout in firmware 1.0.0.

Additionally, the kernel itself would start allocating memory outside of the carveout region if necessary.
So, by exhausting some kernel resource (such as service handles) up to the point where kernel objects would start showing up outside of the carveout region, it was possible to corrupt an object and take over the kernel as well.

From that point on, we could use jamais vu (see https://www.reddit.com/r/SwitchHacks/comments/7rq0cu/jamais_vu_a_100_trustzone_code_execution_exploit/ ) to defeat TrustZone and later defeat the entire boot chain. But, that wasn't the end...

I wanted to reach the built-in system modules on recent firmware versions as well, so we cooked up a plan:

This worked for getting all built-in system modules (except one, which only runs once and ends up overwritten in DRAM)! Naturally, there was another bug we didn't know about back then, so all this was moot.

However, firmware version 5.0.0 fixed that bug and suddenly all this was relevant again. We had to work around the fact that it was no longer available, which required some very convoluted tricks using DRAM access to hijack other system modules.

To make matters worse, Switch units with new fuse patches for the well known RCM exploit were being shipped, so the need for this exploit chain was now very real.

While the gmmuhax attack itself cannot be fixed (since it's a hardware flaw), some mitigations have been implemented, such as creating a separate memory pool.

As for the rest, all 3 bugs mentioned in this write-up have now been fixed across firmware versions 6.0.0 and 6.2.0.

The fix for the transfer memory leak consists of simply tracking the address and size of the transfer memory region inside nvservices and, every time a service handle is closed, clearing the entire region before it becomes available to the client again.

The ioctl pointer bug was fixed by changing its implementation to match every other ioctl command: have the command take "in-line" parameters instead of a memory pointer.
The command now returns the 2 structs directly, which are now also properly initialized.

As for the /dev/nvmap bug, it now takes into account the case where the client doesn't supply its own memory and prevents nvservices' transfer memory pointer from being leaked back.

This was definitely one of the most fun exploits I ever worked on, but the absolute best part was having the opportunity to develop it alongside such talented individuals.

Working on this was a blast and knowing how long it managed to remain unpatched was surprisingly amusing.

As promised, an updated version of this exploit chain written for firmware versions 4.1.0, 5.x and 6.0.0 will be progressively merged into the project.

As usual, have fun!