Sometimes, Hopper’s development is a little bit… tricky 😊

Some parts of the UI are written in Swift, and I recently decided to test Swift 4. There are a lot of interesting things in this latest version of the language, and I wanted to give it a try.

The migration was not very difficult; it only took me a few hours. After some tests, I decided to keep this version and continue the development in Swift 4.

Then, Apple published several versions of the Swift runtime libraries, and one of them was broken on Mavericks… After all, this was still beta software…

So, after a few weeks, I received a message from a customer saying that the latest version of Hopper v4 no longer ran on Mavericks. This was very bad news: I had tested the application on 10.10+, but forgotten 10.9. The crash was caused by a missing symbol in the embedded Swift libraries… How could that be possible? Swift is supposed to be compatible with macOS 10.9+! The libraries were using the “object_isClass” function, which is not available on Mavericks…

Fortunately enough, the Swift libraries are open source, and I quickly found where the function was used (https://github.com/apple/swift/commit/79e33808f1c3afc8dfe8d2dd9f8bb6cadf0d243a).

At this point, there were three solutions:

return to Swift 3, and wait for the Swift 4 libraries to be fixed,

fix the function in the Swift libraries, recompile it, and ship it with Hopper,

do some kind of voodoo magic to fix the issue in the funniest possible way

I chose the last solution… 😬

Injecting the Missing Code

Method swizzling is a well-known technique in Objective-C. It makes use of the runtime to swap method implementations, or inject new methods. But here, the missing symbol is not a method of an Objective-C class, just a plain C function. How could I achieve a similar result?

Let’s explain how macOS glues the dylibs you are using to your code…

Let’s say you are using the printf function from libSystem.

    #include <stdio.h>

    int main (int argc, char const *argv[])
    {
        printf("Hello world\n");
        return 0;
    }

If you compile this tiny program, the resulting executable will have a load command which will ask the dynamic linker (dyld) to load libSystem.B.dylib, and some additional information to bind the printf symbol to some address in the program.

    $ clang hello.c
    $ otool -l a.out
    …
    Load command 12
              cmd LC_LOAD_DYLIB
          cmdsize 56
             name /usr/lib/libSystem.B.dylib (offset 24)
       time stamp 2 Thu Jan 1 01:00:02 1970
          current version 1252.0.0
    compatibility version 1.0.0
    …
    $ xcrun dyldinfo -lazy_bind a.out
    lazy binding information (from lazy_bind part of dyld info):
    segment section          address     index  dylib      symbol
    __DATA  __la_symbol_ptr  0x100001010 0x0000 libSystem  _printf

The important thing here is that most of the functions you’ll use are lazily bound. It means that instead of resolving the address of every function at load time, the linker writes the address of a resolution routine, and the actual resolution happens the first time the function is called.
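To make the idea concrete, here is a toy model of a lazily bound call written in plain C. It is only an illustration: every name in it (greet, greet_ptr, greet_stub_helper, real_greet, resolver_calls) is made up for this sketch, and the real binding is done by dyld, not by application code.

```c
#include <stdio.h>

/* Toy model of lazy binding. All names are hypothetical stand-ins. */

typedef int (*greet_fn)(const char *name);

int resolver_calls = 0;                 /* how many times the "binder" ran */

/* Stands in for the real code living in the library. */
int real_greet(const char *name)
{
    return printf("Hello %s\n", name);
}

int greet_stub_helper(const char *name);

/* Stands in for the slot in __la_symbol_ptr: it starts out pointing
 * at the helper, not at the real function. */
greet_fn greet_ptr = greet_stub_helper;

/* Stands in for the stub helper plus the binder:
 * resolve the symbol, patch the slot, then forward the call. */
int greet_stub_helper(const char *name)
{
    resolver_calls++;
    greet_ptr = real_greet;             /* the slot is patched only once */
    return greet_ptr(name);
}

/* Stands in for the stub: it always jumps through the slot. */
int greet(const char *name)
{
    return greet_ptr(name);
}
```

Calling greet twice runs the helper only on the first call; the second call goes straight to real_greet through the patched pointer. Overwriting greet_ptr before the first call would bypass the helper entirely, which is the property the rest of this post relies on.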

Let’s take a look at the compiled program in Hopper. First, if you want to reproduce the steps, please uncheck the “Resolve Lazy Bindings” option; otherwise Hopper will resolve all the steps for you, and you’ll only see the final state, as if all the functions had already been called and resolved.

After a short prologue that sets up the stack, you can find the part of the code initializing the arguments, and right after that, the “call printf” instruction.

If you double-click on the imp__stubs__printf symbol, you’ll find a jmp instruction, which loads the address of the printf function and jumps to where printf’s code should be located.

The printf_ptr symbol resides in the __la_symbol_ptr section, where all the lazily bound pointers are stored.

But, at this time, instead of a pointer to the printf code, there is a pointer to some weird code in the __stub_helper section:

It pushes a value on the stack (here, the value “0”, which is the index of the symbol to be bound, and which you can find in the output of the dyldinfo command), and calls a routine located at 0x100000f8c, which in turn calls the dyld_stub_binder function. This last function resolves the address of printf and replaces the pointer at 0x100001010 with the “printf” address. So, the second time you call printf, the “jmp [printf_ptr]” instruction jumps directly to the printf code (if you haven’t unchecked the “Resolve Lazy Bindings” option in Hopper, the address at 0x100001010 is already a pointer to printf).

It means that if a function is missing, the program will not crash at load time, but the first time the function is called, which gives us a chance to patch the executable at runtime before the crashing code is reached!

Doing the Lazy Binding by Ourselves…

So, if a symbol is missing, how about resolving it by ourselves?

This is simpler than it sounds, and here is how we’ll proceed: we’ll parse the Mach-O header of the library, find where the lazy binding information is stored, find the symbol table, and replace the pointer in the __la_symbol_ptr section with a pointer to our replacement code.

In the case of Hopper, the missing symbol is object_isClass. This is a function which appeared in Yosemite. It is located in the libobjc.A.dylib library. The code which makes use of this function is located in libswiftCore.dylib.

To make sure that the symbol is really missing (i.e. we are running on Mavericks), and that I need to inject my own version of the code, I run a small test:

    if (dlsym(dlopen("/usr/lib/libobjc.A.dylib", RTLD_NOW),
              "object_isClass") == NULL)
    {
        …
    }
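The same check can be wrapped in a small helper that also handles a failed dlopen. This is only a sketch: symbol_is_available is a hypothetical name, and passing NULL as the path (to search the images already loaded in the process) is a choice made for this example, not something the original code does.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `symbol` can be resolved in the image at `path`.
 * A NULL path searches the images already loaded in the process.
 * Hypothetical helper, written for this illustration only. */
bool symbol_is_available(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        return false;                   /* the library could not be loaded */
    }
    bool found = dlsym(handle, symbol) != NULL;
    dlclose(handle);                    /* drop the reference we just took */
    return found;
}
```

With such a helper, the test above would read `if (!symbol_is_available("/usr/lib/libobjc.A.dylib", "object_isClass")) { … }`. Note that dlsym expects the C name, without the leading underscore that appears in the Mach-O symbol table.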

Now, we need to find where the libswiftCore.dylib library has been loaded, and where its Mach-O header is located (I removed most of the sanity checks to keep the code short in this post):

    void *libHandle = dlopen("@rpath/libswiftCore.dylib", RTLD_NOW);
    void *sym = dlsym(libHandle, "swift_getObjectType");

    struct dl_info dli;
    dladdr(sym, &dli);

    struct mach_header_64 *header = (struct mach_header_64 *) dli.dli_fbase;

The idea is to load the library using dlopen, look up a function we know to be present in the library, and use “dladdr” to find the base address of the library, which is where its Mach-O header is loaded.

Now, we scan all the load commands, and search for the command which defines the __la_symbol_ptr section (remember, this is the section where all the lazy pointers are stored). We also use the same loop to find the __LINKEDIT segment, which holds the information about lazy binding.

    uint64_t linkedit_addr = 0;
    uint64_t linkedit_offset = 0;
    uint64_t *la_ptrs = NULL;
    uint32_t firstIndirectIdx = 0;
    uint32_t nbPointers = 0;

    struct load_command *lc = (struct load_command *) (header + 1);
    for (uint32_t i=0; i<header->ncmds; i++) {
        struct load_command *next_lc = (struct load_command *) ((uintptr_t) lc + lc->cmdsize);

        if (lc->cmd == LC_SEGMENT_64) {
            struct segment_command_64 *segCmd = (struct segment_command_64 *) lc;

            // Is this the DATA segment?
            if (strcmp(segCmd->segname, "__DATA") == 0) {
                struct section_64 *sects = (struct section_64 *) (segCmd + 1);
                for (uint32_t j=0; j<segCmd->nsects; j++) {
                    if (strcmp(sects[j].sectname, "__la_symbol_ptr") == 0) {
                        la_ptrs = (uint64_t *) ((uintptr_t) header + sects[j].addr);
                        firstIndirectIdx = sects[j].reserved1;
                        nbPointers = (uint32_t) (sects[j].size / sizeof(uint64_t));
                        break;
                    }
                }
            }

            // Keep information about the LINKEDIT segment
            if (strcmp(segCmd->segname, "__LINKEDIT") == 0) {
                linkedit_addr = (uint64_t) header + segCmd->vmaddr;
                linkedit_offset = segCmd->fileoff;
            }
        }

        // Next command
        lc = next_lc;
    }
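The loop above trusts every cmdsize it reads. As a rough idea of what the omitted sanity checks could look like, here is a bounds-checked walk over a list of variable-sized commands. It is a sketch under assumptions: toy_load_command only mirrors the two leading fields of struct load_command so that the example compiles on any platform, and find_command is a hypothetical helper, not part of any Mach-O API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same two leading fields as struct load_command in <mach-o/loader.h>,
 * redefined here only so the sketch is portable. */
struct toy_load_command {
    uint32_t cmd;       /* command type */
    uint32_t cmdsize;   /* total size of this command, header included */
};

/* Returns the byte offset of the first command of type `wanted` inside
 * `cmds`, or -1 if it is absent or the command list is malformed. */
long find_command(const uint8_t *cmds, size_t total_size,
                  uint32_t ncmds, uint32_t wanted)
{
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct toy_load_command lc;
        if (total_size - offset < sizeof(lc)) {
            return -1;                  /* header would run past the buffer */
        }
        memcpy(&lc, cmds + offset, sizeof(lc));   /* avoids unaligned reads */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > total_size - offset) {
            return -1;                  /* zero-sized or overlong command */
        }
        if (lc.cmd == wanted) {
            return (long) offset;
        }
        offset += lc.cmdsize;
    }
    return -1;
}
```

In a real Mach-O image, the total size of the command area is given by the sizeofcmds field of the header, which is the natural value to pass as total_size.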

Lazy symbol resolution is a little bit tricky. The executable contains a list of symbols, declared by the LC_SYMTAB command. There are various kinds of symbols, used for different purposes. In the declaration of the __la_symbol_ptr section, the “reserved1” field holds the index, in the indirect symbol table, of the first symbol to be bound in the section. The number of pointers in the section is the size of the section divided by the size of a pointer. So, if the reserved1 value of the section is N and its size is S bytes, the dynamic linker will eventually bind the symbols whose indexes are N to N + S / sizeof(uintptr_t) - 1 in the indirect symbol table. Be careful: these are not indexes into the symbol table, but into another table, located in the __LINKEDIT segment, whose entries are themselves indexes into the symbol table… Yes, this is a little bit complicated… Here is a summary: pointer number i of the __la_symbol_ptr section corresponds to entry reserved1 + i of the indirect symbol table, and that entry holds the index of the symbol in the symbol table.

So, our next step is to find the index of the missing symbol in the symbol table, and from there, the position of its pointer in the __la_symbol_ptr section.

During the scan of the load commands, we can add this code:

    uint32_t symIdx = 0;
    …
    if (lc->cmd == LC_SYMTAB) {
        struct symtab_command *symTab = (struct symtab_command *) lc;
        const struct nlist_64 *symbols = (const struct nlist_64 *) (linkedit_addr + symTab->symoff - linkedit_offset);
        const char *strings = (const char *) (linkedit_addr + symTab->stroff - linkedit_offset);

        for (uint32_t j=0; j < symTab->nsyms; j++) {
            const char *name = (const char *) strings + symbols[j].n_un.n_strx;

            if (strcmp(name, "_object_isClass") == 0) {
                symIdx = j;
                break;
            }
        }
    }

When the loop exits, the symIdx variable contains the index, in the symbol table, of the symbol we are looking for. Now, let’s find it in the indirect symbol table:

    if (lc->cmd == LC_DYSYMTAB) {
        struct dysymtab_command *symtab = (struct dysymtab_command *) lc;
        uint32_t *indirect = (uint32_t *) (linkedit_addr + symtab->indirectsymoff - linkedit_offset);

        for (uint32_t j=firstIndirectIdx; j < firstIndirectIdx + nbPointers; j++) {
            if (indirect[j] == symIdx) {
                // Symbol has been found in the indirect table
                // at index "j"
            }
        }
    }

We now have the index, in the indirect table, of the missing symbol (in the “j” variable). We know that the __la_symbol_ptr section contains symbols starting at the offset reserved1 in the indirect table, and this index has been saved in the firstIndirectIdx variable when we found the section in the LC_SEGMENT_64 command.

So, we know where the pointer for the missing symbol lives in the section:

    la_ptrs[j - firstIndirectIdx]
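The two-level lookup is easier to follow on a toy model where the tables are plain arrays. The sketch below is only an illustration: lazy_slot_for is a hypothetical helper, names[] stands in for the symbol table, indirect[] for the indirect symbol table, and it walks the indirect entries first and compares names, which is the reverse order of the code above but gives the same slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the slot index inside the lazy pointer section for `name`,
 * or -1 if the symbol is not bound through that section.
 *   names[]       stands in for the symbol table (LC_SYMTAB)
 *   indirect[]    stands in for the indirect symbol table (LC_DYSYMTAB)
 *   first, count  are reserved1 and size / sizeof(pointer) of the section */
long lazy_slot_for(const char *const *names, uint32_t nsyms,
                   const uint32_t *indirect, uint32_t first, uint32_t count,
                   const char *name)
{
    for (uint32_t k = first; k < first + count; k++) {
        uint32_t symIdx = indirect[k];            /* index into names[] */
        if (symIdx < nsyms && strcmp(names[symIdx], name) == 0) {
            return (long) (k - first);            /* slot in the section */
        }
    }
    return -1;
}
```

For example, with names = {"_malloc", "_free", "_object_isClass", "_printf"}, indirect = {3, 0, 1, 3, 2, 0}, reserved1 = 3 and three pointers in the section, the section covers indirect entries 3 to 5, so "_object_isClass" (symbol index 2, found at indirect entry 4) lands in slot 1.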

At this point, that slot of the __la_symbol_ptr section still contains a pointer to the dyld stub. We just need to replace it with a pointer to some code. In my case, I decided to dump the missing function from the Yosemite version of the libobjc dylib:

    static unsigned char static_code[] = {
        0x55, 0x48, 0x89, 0xE5, 0x48, 0x85, 0xFF, 0x74, 0x24, 0x40, 0xF6, 0xC7, 0x01,
        0x75, 0x1E, 0x48, 0xB8, 0xF8, 0xFF, 0xFF, 0xFF, 0xFF, 0x7F, 0x00, 0x00, 0x48,
        0x8B, 0x0F, 0x48, 0x21, 0xC1, 0x48, 0x23, 0x41, 0x20, 0x48, 0x8B, 0x40, 0x08,
        0x8A, 0x00, 0x24, 0x01, 0xEB, 0x02, 0x31, 0xC0, 0x0F, 0xB6, 0xC0, 0x5D, 0xC3
    };

    void *code = (void *) valloc(PAGE_SIZE);
    memcpy(code, static_code, sizeof(static_code));
    mprotect(code, PAGE_SIZE, PROT_EXEC);

    la_ptrs[j - firstIndirectIdx] = (uint64_t) code;

Since the pointer now looks resolved and points to regular code, the dynamic linker is skipped for this symbol, and our replacement code is executed instead.

Here is the final version: