This blog post introduces new features of LIEF as well as some uses cases.

Tl;DR: LIEF v0.8.3 is out. The main changelog is available here and packages can be downloaded on the official website.

To install the Python package:

To launch the fuzzer, one can run the following commands:

Thus, we integrated LibFuzzer in the project. Fuzzing the LIEF ELF, PE, Mach-O parser is as simple as:

Fuzzing our own library is a good way to detect bugs, memory leak, unsanitized inputs ...

The SDK package LIEF-0.8.3-Android_API21_armeabi-v7a.tar.gz is automatically pulled from the Docker to the current directory.

To cross compile LIEF for Android ARM, one can run:

Among Dockerfiles, we provide a Dockerfile to cross compile LIEF for Android ( ARM , AARCH64 , x86 , x86-64 )

To facilitate the compilation and the use of LIEF, we created the Dockerlief repo which includes various Dockerfiles as well as the dockerlief utility. dockerlief is basically a wrapper on docker build .

For tagged version, packages are uploaded on the Github release page: https://github.com/lief-project/LIEF/releases .

If tests succeeds packages are automatically uploaded on the https://github.com/lief-project/packages repository.

Each commits is tested on

We attach a great importance to the automation of some development tasks like testing, distributing, packaging, etc. Here is a summary of these processes:

ELF

Play with ELF symbols - Part 2 In the tutorial #03 we demonstrated how to swap dynamic symbols between a binary and a library. In this part, we will see how we can rename these symbols. Changing symbol names is not a trivial modification, since modifying the string table of the PT_DYNAMIC segment has side effects: It requires to update the hash table (GNU Hash / SYSV).

It usually requires to extend the DYNAMIC part of the ELF format. The previous version of LIEF already implements the rebuilding of the hash table but not the extending of the DYNAMIC part. With the v0.8.3 we can extend the DYNAMIC part. Therefore: We can add new entries in the .dynamic section

section We can change dynamic symbols names

We can change DT_RUNPATH and DT_RPATH without length restriction We will rename all imported functions of gpg that are imported from libgcrypt.so.20 into a_very_long_name_of_function_XX and all exported functions of libgcrypt.so.20 into the same name (XX is the symbol index). import lief # Load targets gpg = lief . parse ( "/usr/bin/gpg" ) libgcrypt = lief . parse ( "/usr/lib/libgcrypt.so.20" ) # Change names for idx , lsym in enumerate ( filter ( lambda e : e . exported , libgcrypt . dynamic_symbols )): new_name = 'a_very_long_name_of_function_{:d}' . format ( idx ) print ( "New name for '{}': {}" . format ( lsym . name , new_name )) for bsym in filter ( lambda e : e . name == lsym . name , gpg . dynamic_symbols ): bsym . name = new_name lsym . name = new_name # Write back binary . write ( gpg . name ) libgcrypt . write ( libgcrypt . name ) By using readelf we can check that function names have been modified: $ readelf -s ./gpg | grep "a_very_long_name" 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 13: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) ... $ readelf -s ./libgcrypt.so.20 | grep "a_very_long_name" 88: 000000000000d050 6 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 89: 000000000000dcd0 69 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 90: 000000000000d310 34 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 91: 000000000000de70 81 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 ... Now if we run the new gpg binary, we get the following error: $ ./gpg --output bar.txt --symmetric ./foo.txt relocation error: ./gpg: symbol a_very_long_name_of_function_8, version GCRYPT_1.6 not defined in file libgcrypt.so.20 with link time reference Because the Linux loader tries to resolve the function a_very_long_name_of_function_8 against /usr/lib/libgcrypt.so.20 and that library doesn't include the updated names we get the error. One way to fix this error is to set the environment variable LD_LIBRARY_PATH to the current directory: $ LD_LIBRARY_PATH = . ./gpg --output bar.txt --symmetric ./foo.txt $ xxd ./bar.txt | head -n1 00000000: 8c0d 0407 0302 c5af 9fba cab1 9545 ebd2 .............E.. $ LD_LIBRARY_PATH = . ./gpg --output foo_decrypted.txt --decrypt ./bar.txt $ xxd ./foo_decrypted.txt | head -n1 00000000: 4865 6c6c 6f20 576f 726c 640a Hello World. Another way to fix it is to add a new entry in .dynamic section. As mentioned at the beginning, we can now add new entries in the .dynamic so let's add a DT_RUNPATH entry with the $ORIGIN value so that the Linux loader resolves the modified libgcrypt.so.20 instead of the system one: ... # Add a DT_RUNPATH entry gpg += lief . ELF . DynamicEntryRunPath ( "$ORIGIN" ) # Write back binary . write ( gpg . name ) libgcrypt . write ( libgcrypt . name ) And we don't need the LD_LIBRARY_PATH anymore: $ readelf -d ./gpg | grep RUNPATH 0x000000000000001d (RUNPATH) Library runpath: [$ORIGIN] $ ./gpg --decrypt ./bar.txt gpg: AES encrypted data gpg: encrypted with 1 passphrase Hello World

Hiding its symbols While IDA v7.0 has been released recently, among the changelog one can notice two changes: ELF: describe symbols using symtab from DYNAMIC section

ELF: IDA now uses the PHT by default instead of the SHT to load segments from ELF files These changes are partially true. Let's see what go wrong in IDA with the following snippet: id = lief . parse ( "/usr/bin/id" ) dynsym = id . get_section ( ".dynsym" ) dynsym . entry_size = dynsym . size // 2 id . write ( "id_test" ) This snippet defines the size of one symbol as the entire size of .dynsym section divided by 2. The normal size of ELF symbols would be: >>> print ( int ( lief . ELF . ELF32 . SIZES . SYM )) # For 32-bits 16 >>> print ( int ( lief . ELF . ELF64 . SIZES . SYM )) # For 64-bits 24 In the case of the 64-bits id binary, we set this size to 924. When opening id_test in IDA and forcing to use Segment for parsing and not Sections we get the following imports : Only one import is resolved and the others are hidden. Note that id_test is still executable: $ id_test uid=1000(romain) gid=1000(romain) ... By using readelf we can still retrieve the symbols and we have an error indicating that symbol size is corrupted. $ readelf -s id_test readelf: Error: Section 5 has invalid sh_entsize of 000000000000039c readelf: Error: (Using the expected size of 24 for the rest of this dump) Symbol table '.dynsym' contains 77 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND endgrent@GLIBC_2.2.5 (2) 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __uflow@GLIBC_2.2.5 (2) 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getenv@GLIBC_2.2.5 (2) 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2) 5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.2.5 (2) ... In LIEF the (dynamic) symbol table address is computed through the DT_SYMTAB from the PT_DYNAMIC segment. To compute the number of dynamic symbols LIEF uses three heuristics: Based on hash tables (Gnu Hash / Sysv Hash) Based on relocations Based on sections Malwares start to use this kind of corruption as we will see in the next part.

Rootnik Malware Rootnik is a malware targeting Android devices. It has been analyzed by Fortinet security researcher. A full analysis of the malware is available on the Fortinet blog. This part is focused on the ELF format analysis of one component: libshell . Actually there are two libraries libshella_2.10.3.1.so and libshellx_2.10.3.1.so . As they have the same purpose, we will use the x86 version. First if we look at the ELF sections of libshellx_2.10.3.1.so we can notice that the address, offset and size of some sections like .text , .init_array , .dynstr , .dynsym are set to 0. This kind of modification is used to disturb tools that rely on sections to parse some ELF structures (like objdump, readelf, IDA ...) $ readelf -S ./libshellx-2.10.3.1.so There are 21 section headers, starting at offset 0x2431c: Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [ 0] NULL 00000000 000000 000000 00 0 0 0 [ 1] .dynsym DYNSYM 00000114 000114 000300 10 A 2 1 4 [ 2] .dynstr STRTAB 00000414 000414 0001e2 00 A 0 0 1 [ 3] .hash HASH 00000000 000000 000000 04 A 1 0 4 [ 4] .rel.dyn REL 00000000 000000 000000 08 A 1 0 4 [ 5] .rel.plt REL 00000000 000000 000000 08 AI 1 6 4 [ 6] .plt PROGBITS 00000000 000000 000000 04 AX 0 0 16 [ 7] .text PROGBITS 00000000 000000 000000 00 AX 0 0 16 [ 8] .code PROGBITS 00000000 000000 000000 00 AX 0 0 16 [ 9] .eh_frame PROGBITS 00000000 000000 000000 00 A 0 0 4 [10] .eh_frame_hdr PROGBITS 00000000 000000 000000 00 A 0 0 4 [11] .fini_array FINI_ARRAY 00000000 000000 000000 00 WA 0 0 4 [12] .init_array INIT_ARRAY 00000000 000000 000000 00 WA 0 0 4 [13] .dynamic DYNAMIC 0000ce50 00be50 0000f8 08 WA 2 0 4 [14] .got PROGBITS 00000000 000000 000000 00 WA 0 0 4 [15] .got.plt PROGBITS 00000000 000000 000000 00 WA 0 0 4 [16] .data PROGBITS 00000000 000000 000000 00 WA 0 0 16 [17] .bss NOBITS 0000d398 00c395 000000 00 WA 0 0 4 [18] .comment PROGBITS 00000000 00c395 000045 01 MS 0 0 1 [19] .note.gnu.gold-ve NOTE 00000000 00c3dc 00001c 00 0 0 4 [20] .shstrtab STRTAB 00000000 024268 0000b1 00 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x (unknown), o (OS specific), E (exclude), p (processor specific) If we open the given library in IDA we have no exports, no imports and no sections: Based on the segments and dynamic entries we can recover most of these information: .init_array address and size are available through the DT_INIT_ARRAY and DT_INIT_ARRAYSZ entries

address and size are available through the and entries .dynstr address and size are available through the DT_STRTAB and DT_STRSZ

address and size are available through the and .dynsym address is available through the DT_SYMTAB The script recover_shellx.py recovers the missing values, patch sections and rebuild a fixed library. Now if we open the new libshellx-2.10.3.1_FIXED.so we have access to imports / exports and some sections. The .init_array section contains 2 functions: tencent652524168491435794009

sub_60C0 The tencent652524168491435794009 function basically do a stack alignment and the sub_60C0 is one of the decryption routines . This function is obfuscated with graph flattening and looks like to O-LLVM graph flattening passe : Fortunately there are few "relevant blocks" and there are not obfuscated. The function sub_60C0 basically iterates over the program headers to find the encrypted one and decrypt it using a custom algorithm (based on shift, xor, etc).