Introduction

String/pattern-matching algorithms are by far the most popular and straightforward way to detect shellcode. The principle is simple: all code has unique characteristics that can be used as signatures to identify it in memory. Even a shellcode that has never been analyzed before can contain some known value, or piece of code, that at least makes it appear suspicious and worthy of closer inspection. The known values I’m referring to are of course API strings and hashes, which have been useful in detecting malicious code for 20+ years. If we’re going to develop more advanced shellcodes that go undetected by modern AV solutions, we need to consider using an HLL like C or C++ along with unconventional programming that guarantees everything in a shellcode is permutable. It’s not just the executable code that should be permutable, but the entire thing; code, data, constants, strings... everything. What I show here today is how to create permutable API hashes using a hash function called Maru (Ma-roo), named after the Japanese cat. Maru uses a block cipher to generate a hash of a string. You don’t have to use a block cipher for this, but a cryptographic primitive can help protect the hashes against collision attacks found using Satisfiability Modulo Theories (SMT) solvers such as CryptoSMT. For more information, read Using SAT and SMT to defeat simple hashing algorithms. Readers may also wish to read the section on cryptography in Quick introduction into SAT/SMT solvers and symbolic execution.

**WARNING: Maru is not a cryptographic hash, and should not be used for applications where a secure hash algorithm is required.**

Signature detection

When viruses for MS-DOS started to appear 30 years ago, detection by signature was relatively simple. In an attempt to hide code, authors used a simple XOR operation with an 8-bit value. For example, the ‘Skism Rythem Stack Virus-808‘ from 1991 used the milliseconds value of the current time to “encrypt” each new infection, so an infection could appear in one of 100 different forms. This is the entry point of that code, and its “decryption” process.

```
        jmp     virus_start
encrypt_val db 00h
virus_start:
        call    encrypt               ; encrypt/decrypt file
        jmp     virus                 ; go to start of code
encrypt:
        push    cx
        mov     bx, offset virus_code ; start encryption at data
xor_loop:
        mov     ch, [bx]              ; read current byte
        xor     ch, encrypt_val       ; get encryption key
        mov     [bx], ch              ; switch bytes
        inc     bx                    ; move bx up a byte
        cmp     bx, offset virus_code + virus_size ; are we done with the encryption?
        jle     xor_loop              ; no? keep going
        pop     cx
        ret
```

Later in the code, the virus will update ‘encrypt_val’ using the milliseconds value of the current time.

```
        mov     ah, 2ch               ; get time
        int     21h
        mov     encrypt_val, dl       ; save m_seconds to encrypt_val so
                                      ; there's 100 mutations possible
```
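The scheme amounts to the following C sketch; this is my translation, with the key parameter standing in for the milliseconds value the virus reads from the clock:

```c
#include <stdint.h>
#include <stddef.h>

// In-place XOR "encryption" with an 8-bit key, equivalent to the
// virus's xor_loop above. Applying it a second time with the same
// key restores the original bytes.
static void xor_encode(uint8_t *buf, size_t len, uint8_t key) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

Because the key is a single byte, only 256 distinct "encrypted" bodies are possible (the virus used just 100 of them), so brute force or emulation recovers the body trivially.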

There’s some effort to hide the body of the virus here, but the decrypter is still visible and easily detected. Moreover, a scanner can emulate the decrypter routine, exposing the body of the virus in memory before performing detection by signature. When Win32 viruses started to appear in 1997, there were no longer MS-DOS interrupts. Instead there were API functions, accessible through dynamic link libraries (DLLs), and this made detecting malware even easier. Any code with a large block of API strings in it was immediately going to raise alarm bells. Authors eventually started to use 32-bit checksums, the most popular of which was CRC32. Later, LSD-PL inspired a trend of using a basic Add-Rotate-Xor routine, like the following that appeared in their WASM package.

```
i3:     mov     esi, [4*eax+ebx]      ; ptr to symbol name
        add     esi, edx              ; rva2va
        push    ebx                   ; hash: h=((h<<5)|(h>>27))+c
        push    eax
        xor     ebx, ebx
i4:     xor     eax, eax              ; hash loop
        lodsb
        rol     ebx, 5
        add     ebx, eax
        cmp     eax, 0
        jnz     i4
        ror     ebx, 5
        cmp     ebx, ecx              ; hash compare
        pop     eax
        pop     ebx
        je      i5                    ; same: symbol found
        inc     eax
        jmp     i3                    ; different: go to the next symbol
```
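The hash loop itself can be written in C; this is my translation of the rotate-and-add routine above, not LSD-PL's own code:

```c
#include <stdint.h>

// C equivalent of the hash loop above: for each character c,
// h = ((h << 5) | (h >> 27)) + c, i.e. rotate-left-5 then add.
// No key and no entropy: every tool using this produces the same
// value for the same string, so the hash itself becomes a signature.
static uint32_t lsd_hash(const char *s) {
    uint32_t h = 0;
    while (*s) {
        h = ((h << 5) | (h >> 27)) + (uint8_t)*s++;
    }
    return h;
}
```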

Most legitimate code doesn’t have to resolve an API by string or hash manually through the Import Address Table (IAT) or Export Address Table (EAT), and if some code does behave this way, it’s usually up to no good. What’s funny about API hashing is that it was supposed to prevent signature detection of strings. So, instead of detecting by string, AV just detected the hash of the API string, which still makes a perfectly valid signature. Even the shellcodes generated by many popular payload tools (BeEF, Meterpreter, Veil) all use basic hashing algorithms with no entropy or randomization. They are consistently the same, and thus just as easy to detect as the API strings themselves. Of course, these packages offer the ability to encrypt payloads with a stream or block cipher like RC4 or AES. Again though, even if that passes a simple scan by AV or IDS, a more thorough analysis using an emulator can decrypt the main code and perform pattern matching afterwards, with no problem detecting it.

Plain strings problem

Ideally, an attacker would want everything obfuscated/encrypted during compilation and decrypted at run-time without using additional tools or code. constexpr is ideal for C++ code, but for C there’s no such feature, unless of course the compiler is modified to encrypt all strings at compile time. Some assemblers such as MASM and NASM support macro-based compilation, but are cumbersome to work with on complex functions (not that you couldn’t). Z0MBiE already discussed the problem of plain strings in Data Encoding In Meta Viruses and Solving Plain Strings Problem In HLL around 2002, which did somewhat “solve” the problem, but if you look closely at the sources, there’s insufficient entropy used. At the time, he provided a library called VirStr with TASM macros that worked with Borland’s C compiler, but these do not work with mingw, msvc or clang, and the library has never been updated since. The purpose of using entropy when hashing API strings is obviously to increase the difficulty of detecting a malicious payload. It’s not enough to avoid detection forever, but it’s certainly better than existing hash algorithms with no entropy at all.

Hash function constructions

We can construct new hash algorithms from block ciphers, and these offer us the potential to use unique keys, thus generating completely different API hashes every time the key changes. However, as you will see, very few block ciphers are suitable for lightweight applications such as shellcode. Here are 3 potential constructions to use. (there are more, but only 3 are covered)

1. Davies–Meyer

Feeds each block of the message (m_i) as the key to the block cipher, and the previous hash value (H_(i-1)) as the plaintext to be encrypted. The output ciphertext is then also XORed with the previous hash value to produce the next hash value: H_i = E(m_i, H_(i-1)) ⊕ H_(i-1). In the first round, when there is no previous hash value, a constant pre-specified initial value (H_0) is used.

2. Matyas–Meyer–Oseas

Feeds each block of the message (m_i) as the plaintext to be encrypted, with the previous hash value (H_(i-1)) fed as the key to the block cipher. The output ciphertext is then also XORed with the same message block to produce the next hash value: H_i = E(H_(i-1), m_i) ⊕ m_i. In the first round, when there is no previous hash value, a constant pre-specified initial value (H_0) is used.

3. Miyaguchi–Preneel

Feeds each block of the message (m_i) as the plaintext to be encrypted, with the previous hash value (H_(i-1)) fed as the key to the block cipher. The output ciphertext is XORed with the same message block and then also XORed with the previous hash value to produce the next hash value: H_i = E(H_(i-1), m_i) ⊕ m_i ⊕ H_(i-1). In the first round, when there is no previous hash value, a constant pre-specified initial value (H_0) is used.
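Writing E(key, plaintext) for one block-cipher call, the three constructions reduce to one line each. The cipher below is a toy stand-in so the sketch is runnable; any real block cipher would take its place:

```c
#include <stdint.h>

// Toy keyed mixing function standing in for a real block cipher.
// NOT a real cipher; it only exists so the constructions run.
static uint64_t E(uint64_t key, uint64_t pt) {
    uint64_t x = pt ^ key;
    for (int i = 0; i < 4; i++) {
        x += key;
        x = (x << 13) | (x >> 51);  // rotate-left 13
        x ^= key >> 7;
    }
    return x;
}

// Davies-Meyer: H_i = E(m_i, H_(i-1)) ^ H_(i-1)
static uint64_t davies_meyer(uint64_t h, uint64_t m) {
    return E(m, h) ^ h;
}

// Matyas-Meyer-Oseas: H_i = E(H_(i-1), m_i) ^ m_i
static uint64_t mmo(uint64_t h, uint64_t m) {
    return E(h, m) ^ m;
}

// Miyaguchi-Preneel: H_i = E(H_(i-1), m_i) ^ m_i ^ H_(i-1)
static uint64_t miyaguchi_preneel(uint64_t h, uint64_t m) {
    return E(h, m) ^ m ^ h;
}
```

Note that Miyaguchi–Preneel is just Matyas–Meyer–Oseas with one extra XOR of the previous hash value.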

Choosing a block cipher

We know signature detection requires some static value, and what better static values exist than cryptographic constants? You can of course change them, but that might affect the security of the cipher. So, it’s better to choose a cipher that doesn’t use any constants at all. While looking at potential block ciphers, my top 3 picks were based on Add-Rotate-Xor (ARX) designs.

1. SPECK

Designed by the NSA and published in 2013. Supports a wide range of key and block sizes for different architectures, and is one of the smallest software-based block ciphers available for most architectures. Parameters for x86: 64-bit block, 128-bit key; for x86-64: 128-bit block, 256-bit key, implemented in 64 and 86 bytes respectively.

2. CHASKEY

Designed and published in 2014. Uses an Even-Mansour design with a permutation function derived from SipHash. 128-bit keys, 128-bit blocks. Can be implemented in approx. 90 bytes of x86 assembly.

3. XTEA

eXtended Tiny Encryption Algorithm. A Feistel cipher published in 1997 after weaknesses were found in TEA. 128-bit keys and a 64-bit block. It has a known constant, 0x9E3779B9, that makes it susceptible to signature-based detection. Can be implemented in approx. 80 bytes.
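That constant is exactly the kind of static value a scanner can match on. A standard XTEA encryption routine, shown here only to illustrate where 0x9E3779B9 appears:

```c
#include <stdint.h>

// Standard XTEA encryption, 32 rounds. The magic constant 0x9E3779B9
// (derived from the golden ratio) appears in every implementation,
// making XTEA-based code easy to flag by signature.
static void xtea_encrypt(uint32_t v[2], const uint32_t key[4]) {
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    const uint32_t delta = 0x9E3779B9;

    for (int i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0; v[1] = v1;
}
```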

SPECK is just the better design for lightweight applications and can’t be easily identified by signature. (at least not from any constants)

Padding

The padding is similar to that used in MD4 and MD5, except the string length in bits is stored as a 32-bit integer rather than a 64-bit one. Since the input will never exceed 256 bits for Maru 1 or 512 bits for Maru 2, there didn’t seem any point in using a 64-bit integer.
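As a sketch of that layout, the following pads a short string into a single 32-byte block (the Maru 2 block size). The function name and the single-block restriction are mine, for illustration only:

```c
#include <stdint.h>
#include <string.h>

#define BLK_LEN 32

// MD4/MD5-style padding, except the string length in bits is stored
// as a 32-bit integer in the last word of the block. Assumes the
// string fits in one block with room for the end bit and length field.
static void pad_block(uint8_t blk[BLK_LEN], const char *s) {
    uint32_t len  = (uint32_t)strlen(s);
    uint32_t bits = len * 8;

    memset(blk, 0, BLK_LEN);
    memcpy(blk, s, len);
    blk[len] = 0x80;                      // end-of-message bit
    memcpy(&blk[BLK_LEN - 4], &bits, 4);  // bit count in final 32-bit word
}
```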

SPECK Parameters

Version 1 is suitable for 32-bit architectures while version 2 is suitable for 64-bit.

| Ver. | Cipher | Block Size | Key Size |
|------|--------|------------|----------|
| 1    | SPECK  | 64         | 128      |
| 2    | SPECK  | 128        | 256      |

32-bit Architectures

The following is an implementation of SPECK for Ver. 1.

```c
// SPECK-64/128
uint64_t speck(void *mk, uint64_t p) {
    uint32_t k[4], i, t;
    union {
      uint32_t w[2];
      uint64_t q;
    } x;

    // copy plaintext to local buffer
    x.q = p;

    // copy master key to local buffer
    for (i = 0; i < 4; i++)
      k[i] = ((uint32_t*)mk)[i];

    for (i = 0; i < 27; i++) {
      // encrypt plaintext
      x.w[0] = (ROTR32(x.w[0],  8) + x.w[1]) ^ k[0];
      x.w[1] =  ROTR32(x.w[1], 29) ^ x.w[0];

      // create next subkey
      t    = k[3];
      k[3] = (ROTR32(k[1],  8) + k[0]) ^ i;
      k[0] =  ROTR32(k[0], 29) ^ k[3];
      k[1] =  k[2];
      k[2] =  t;
    }
    // return 64-bit ciphertext
    return x.q;
}
```

64-bit architectures

The following is an implementation for Ver. 2.

```c
// SPECK-128/256
void speck(void *in, void *mk, void *out) {
    uint64_t i, t, k[4];
    uint64_t *r = (uint64_t*)out, *h = (uint64_t*)in;

    // copy 128-bit plaintext to output buffer
    r[0] = h[0];
    r[1] = h[1];

    // copy 256-bit master key to local buffer
    for (i = 0; i < 4; i++)
      k[i] = ((uint64_t*)mk)[i];

    for (i = 0; i < 34; i++) {
      // encrypt plaintext
      r[1] = (ROTR64(r[1],  8) + r[0]) ^ k[0];
      r[0] =  ROTR64(r[0], 61) ^ r[1];

      // create next subkey
      t    = k[3];
      k[3] = (ROTR64(k[1],  8) + k[0]) ^ i;
      k[0] =  ROTR64(k[0], 61) ^ k[3];
      k[1] =  k[2];
      k[2] =  t;
    }
}
```

Maru Parameters

The numbers below (excluding Ver. and CPU) represent the number of bits.

| Ver. | CPU | Construction | Padding        | Key | Message | Max. Input | Output |
|------|-----|--------------|----------------|-----|---------|------------|--------|
| 1    | 32  | Davies-Meyer | Merkle-Damgard | 64  | 128     | 256        | 64     |
| 2    | 64  | Davies-Meyer | Merkle-Damgard | 128 | 256     | 512        | 128    |

There are two 64-bit constants used to initialize H that are not really of any significance. I just wanted something “random” without using a random number generator. Here is how to generate them using SpeedCrunch.

The 2nd value shown is used to initialize H in both versions. The n-bit seed is XORed against these values to make the hash output different. I’ve emphasized the importance of eliminating constants that are not permutable and can be used as signatures, so why use them here? You don’t have to. I’m only using them here for illustration; you can generate your own values to initialize H, or just use zero instead.
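In code, the initialization might look like the following; the two constants here are arbitrary placeholders of mine, not the actual values used in Maru:

```c
#include <stdint.h>

// Placeholder initial values: substitute your own, or use zero.
#define INIT_B 0x0123456789ABCDEFULL
#define INIT_D 0xFEDCBA9876543210ULL

// XOR the caller-supplied seed into the IV, so every distinct seed
// yields a completely different set of API hashes.
static void init_h(uint64_t h[2], uint64_t seed) {
    h[0] = INIT_B ^ seed;
    h[1] = INIT_D ^ seed;
}
```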

Maru 1

The only real difference between the two versions is the cipher parameters and the local capacity for keys. We use a 64-bit seed here to initialize H.

```c
uint64_t maru(const char *api, uint64_t iv) {
    uint64_t h;
    uint32_t len, idx, end, i;
    union {
      uint8_t  b[MARU_BLK_LEN];
      uint32_t w[MARU_BLK_LEN/4];
    } m;

    // set H to initial value
    h = iv;

    for (idx = 0, len = 0, end = 0; !end; ) {
      // end of string or max len?
      if (api[len] == 0 || len == MARU_MAX_STR) {
        // zero remainder of M
        for (i = idx; i < MARU_BLK_LEN; i++) m.b[i] = 0;
        // store the end bit
        m.b[idx] = 0x80;
        // have we space in M for api length?
        if (idx >= MARU_BLK_LEN - 4) {
          // no, update H with E
          h ^= MARU_CRYPT(&m, h);
          // zero M
          for (i = 0; i < MARU_BLK_LEN; i++) m.b[i] = 0;
        }
        // store total length in bits
        m.w[(MARU_BLK_LEN/4)-1] = (len * 8);
        idx = MARU_BLK_LEN;
        end++;
      } else {
        // store character from api string
        m.b[idx] = (uint8_t)api[len];
        idx++; len++;
      }
      if (idx == MARU_BLK_LEN) {
        // update H with E
        h ^= MARU_CRYPT(&m, h);
        // reset idx
        idx = 0;
      }
    }
    return h;
}
```

Maru 2

Here, we use a 64-bit seed. The 128-bit result is stored in the out parameter.

```c
void maru2(const char *key, uint64_t iv, void *out) {
    union {
      uint64_t q[2];
      uint32_t w[4];
      uint8_t  b[16];
    } c, h;
    union {
      uint64_t q[4];
      uint32_t w[8];
      uint8_t  b[32];
    } m;
    int len, idx, i, end;

    // initialize H with iv
    h.q[0] = MARU2_INIT_B ^ iv;
    h.q[1] = MARU2_INIT_D ^ iv;

    for (idx = 0, len = 0, end = 0; !end; ) {
      // end of string or max len?
      if (key[len] == 0 || len == MARU2_MAX_STR) {
        // zero remainder of M
        for (i = idx; i < MARU2_BLK_LEN; i++) m.b[i] = 0;
        // add end bit
        m.b[idx] = 0x80;
        // have we space in M for len?
        if (idx >= MARU2_BLK_LEN - 4) {
          // no, encrypt H
          MARU2_CRYPT(&h, &m, &c);
          // update H
          h.q[0] ^= c.q[0];
          h.q[1] ^= c.q[1];
          // zero M
          for (i = 0; i < MARU2_BLK_LEN; i++) m.b[i] = 0;
        }
        // add total len in bits
        m.w[(MARU2_BLK_LEN/4)-1] = (len * 8);
        idx = MARU2_BLK_LEN;
        end++;
      } else {
        // add byte to M
        m.b[idx] = (uint8_t)key[len];
        idx++; len++;
      }
      if (idx == MARU2_BLK_LEN) {
        // encrypt H
        MARU2_CRYPT(&h, &m, &c);
        // update H
        h.q[0] ^= c.q[0];
        h.q[1] ^= c.q[1];
        idx = 0;
      }
    }
    // copy hash to output
    for (i = 0; i < MARU2_HASH_LEN; i++)
      ((uint8_t*)out)[i] = h.b[i];
}
```

Testing

Both versions performed well when tested with Robert G. Brown’s dieharder tool. The tests involved setting a string buffer to all 1s and incrementing it sequentially to generate hashes, similar to how one might test a PRNG.

Maru 1

Maru 2

Of course, just because the hash output passed dieharder tests, that doesn’t necessarily mean we have a good hash function, but it’s at the very least a good sign.

Example Hashes

Maru 1

Maru 2

As you can see above, once the seed value is changed, the hash output also changes.
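To reproduce that behaviour, here is a condensed, self-contained take on Maru 1 (SPECK-64/128 in Davies-Meyer mode), simplified to strings that fit in one padded 16-byte block; it is a sketch for experimenting with seeds, not the reference implementation:

```c
#include <stdint.h>
#include <string.h>

#define ROTR32(v,n) (((v) >> (n)) | ((v) << (32 - (n))))

// SPECK-64/128: encrypts the 64-bit plaintext p under a 128-bit key mk.
static uint64_t speck(const void *mk, uint64_t p) {
    uint32_t k[4], t;
    union { uint32_t w[2]; uint64_t q; } x;

    x.q = p;
    memcpy(k, mk, 16);
    for (uint32_t i = 0; i < 27; i++) {
        x.w[0] = (ROTR32(x.w[0],  8) + x.w[1]) ^ k[0];
        x.w[1] =  ROTR32(x.w[1], 29) ^ x.w[0];
        t    = k[3];
        k[3] = (ROTR32(k[1],  8) + k[0]) ^ i;
        k[0] =  ROTR32(k[0], 29) ^ k[3];
        k[1] =  k[2];
        k[2] =  t;
    }
    return x.q;
}

// Condensed Maru 1 for strings of up to 11 bytes (one padded block):
// the padded block is fed to SPECK as the key and H as the plaintext,
// i.e. the Davies-Meyer construction H = E(m, H) ^ H.
static uint64_t maru1(const char *api, uint64_t iv) {
    union { uint8_t b[16]; uint32_t w[4]; } m;
    uint32_t len = (uint32_t)strlen(api);
    uint64_t h = iv;

    memset(m.b, 0, 16);
    memcpy(m.b, api, len);
    m.b[len] = 0x80;       // end bit
    m.w[3]   = len * 8;    // string length in bits
    h ^= speck(&m, h);     // Davies-Meyer compression
    return h;
}
```

The same API string hashed under two different seeds produces unrelated values, so there is no single hash value for a scanner to match.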

Summary

We can construct better hashing algorithms for API strings using block ciphers. Maru isn’t intended for digital signatures or mission-critical applications that require robust collision-resistant hash functions, but it is good enough for hashing short strings like API names. Maru is really an idea about how to introduce entropy into the string-hashing process, while trying to increase the difficulty of finding string collisions. You can see the full source for Maru hash here.

About Maru the cat