There isn’t much when it comes to egg hunters, and even less when it comes to x64 ones. And the ones out there on exploit-db and shell-storm do leave a lot to be explained, and some… let’s just say I can’t imagine the authors even bothered to test them, let alone read Skape’s paper on Safely Searching Process Virtual Address Space. Fortunately, Skape’s paper exists and, together with a lot of reading … and some more reading, on memory models, virtual address space, etc, I hope to be able to explain this in the simplest of ways.

So, what’s an egg hunter? It’s basically a very short shellcode which has a single purpose: locate a longer shellcode somewhere else in memory and execute it. There are instances of buffer overflow exploitation, where you’ll be extremely limited in terms of buffer space. In these specific cases, you may need an egg hunter if you can also put in the longer shellcode elsewhere to be found and executed.

Searching the memory

There are a lot of details in terms of memory implementation by the Operating System and the CPU architecture. That is beyond scope, and I’ll be addressing only the parts that will impact the code.

From the egg hunting perspective, the important thing to know is how the memory is presented to the process. And that’s through what’s called Virtual Address Space (VAS). By default, the 64-bit VAS has the following structure:

If you want to go into the details of why is it organised as such, how is the mapping done to real addresses, paging, etc, look into section 3.3 of the Intel’s manual. Suffice to say that, for performance purposes, current (and foreseeable future) address space implementations only use 48 bits (realistically more than enough in our times) out of the 64. The 48th bit (position 47) is extended to the rest of the left bits, hence creating a range of unused positions, the so-called “canonical hole” in figure 1.

Now this is great news actually, because we now realise we don’t have to look from position zero (first byte in memory – VAS) to position 0xffffffffffffffff (64-bit all 1’s). We just have to search for the egg inside the user space. Now the questions are: do I have to look for it in all user space? Is there a faster way to do it?

The user space (green section in figure 1) memory structure goes as follows:

0x00007fffffffffff

User stack

|

v

Memory mapped region for shared libraries or anything else

^

|

Heap

Uninitialised data (.bss)

Initialised data (.data)

Program text (.text)

0x0000000000000000

It’s made out of regions/sections with different access permissions. Let’s look into this in an example.

I wrote the following code:

After compiling (# nasm -f elf64 test.nasm -o test.o && ld -o test test.o) and executing it, it’ll hang on the read syscall, keeping it running while waiting for me to attach to it with GDB.

To attach to it, I need the PID, which I’m getting at figure 3, and I’m also executing “cat proc/<PID>/maps” so I can see the sections. Now, there are other ways (“pmap”, and “info proc mappings” inside GDB) with which I can get the sections listing, but only the “cat proc/…” method gives me the permissions (read, write, execute, private, shared) the process has on the sections, which I want to show you for the purpose of this exercise.

This is a small assembly application (no external libraries), so it has few sections, and we won’t find any non-readable ones. I could integrate a glibc call (ex: printf, scanf, exit) into the application and compile it with gcc instead of ld, to show a non-readable section (.so files) example, but the exception I want to show, is also triggered when trying to write to a section you don’t have write permissions to, so for the sake of simplicity, I’ll stick to this one.

We now attach to the PID using GDB, set RSP with a value belonging to the range of section VDSO, as seen in figure 3, and try to write into it.

So figure 4’s write attempt results in a segmentation fault. This happens whenever the process tries to access (read/write/execute) positions in memory it does not have the corresponding permission to.

And that is why you cannot simply iterate through the whole user space. This interrupt signal (SIGSEGV) would break your egg hunter and render it useless.

Note that there are a few egg hunters that do exactly that, but I’ll go through this choice and why it (sometimes) works, at the end of this post.

One last detail about memory: pages. Every section/region is made out of multiple pages, which are continuous fixed-length blocks of memory. Memory is allocated in chunks of these page units, and this is important for us because if you can’t access a memory address in a page, there’s no point on continuing testing the following positions in the same page, since they’ll either be not allocated, or you’ll have the same permission issues because all permissions are the same in a page. This will make the search algorithm faster.

If you look at figure 3, you’ll realise that the start and end addresses (in hex) are all multiples of 4096, as the lower 12 bits are always zeroed. For example, even though the application’s code is less then 4096 bytes, the section it’s in, still is exactly 4096 bytes in size (from 0x00400000 to 0x00401000).

But how can I be sure the page size is actually 4096?

The egg hunter, first attempt

Usually, at this point, I’d go for the obvious solution to the SIGSEGV problem: setting up an interrupt handler because that’s what SIGSEGV is: an interrupt signal. But it’s pretty obvious it would result in a long (relatively speaking) code, which ruins the whole purpose of the egg hunter. Besides, Skape tried it and the same conclusion was reached.

So, we’ll definitely have to search through memory, but we still have the SIGSEGV issue to deal with. To address it, Skape shows us three solutions, by basically using syscalls that will return a clear indication (EFAULT = -14 = 0xfffffffffffffff2) that we do not have permissions to access the given memory position.

I then thought about trying a syscall other than the ones proposed by Skape (__NR_access and __NR_rt_sigaction). And the write (__NR_write) syscall went through my mind. It’s used to print text into the screen and for it to know what we want to print, it requires a buffer as the 2nd parameter (RSI). So I wrote the following test code:

And when testing it, for address position 0x1000, it returned the EFAULT (0xfffffffffffffff2) as I was hoping for.

This made me hopeful… until I actually wrote the egg hunter code, and while it did work when I positioned the first address at the section where the real payload is located, it didn’t work so well in other more generic scenarios. Specifically, it didn’t return EFAULT when it was supposed to and then “broke” (SIGSEGV) when trying to fetch the four bytes to compare with the egg. At some point, I was way into debugging this and realised that showing a new way to do egg hunting, while it’d be cool, wasn’t exactly the point of this post, so I took a step back…

Egg hunter, final attempt/code – 34 bytes

I decided to use the access syscall, shown by Skape, because it’s the most robust, as is pointed out in Skape’s paper.

I also decided to use a 4 byte egg: 0xbeefbeef. The actual size of it mostly depends on you wishing to reduce the odds of finding this sequence along the memory. The bigger the egg, the lesser the chance for it to already be there in the VAS. So I figured, if four bytes was good enough for uniqueness in 32-bit systems, it’s good enough here too.

But, because I’m not duplicating its size, there’s the (very high) risk of the hunter actually detecting itself, because the egg is present in it’s own code. That’s why I set the EAX (lowest 32 bits of RAX) register with a different value, and make it right with the increment, right next (Figure 8 – lines 19, 20).

I won’t be explaining the details of each instruction, as I already did here and here.

At first (line 4), I zero out RSI which is the second parameter to the access syscall (F_OK=0). That same zero will be put in RDI which will contain the memory address to check if we have permissions to read from. Note that the actual first addressable memory position is not 0x00. But it’s small and close enough for us to ignore the latency in reaching it because, otherwise, we’d have to add some bytes to the code to add that logic.

The next_page label contains the code to increment the address to the next multiple of 4096, which is the next page in memory.

Then, in the next_4_bytes label, we basically go through calling access syscall, validating accessibility to the memory address in RDI, and if accessible, fetch the 4 bytes in it and compare it with our egg.

After compiling it:

# nasm -felf64 egghunter.nasm -o egghunter.o && ld egghunter.o -o egghunter

extracting the hex code:

# for i in `objdump -d egghunter | tr ‘\t’ ‘ ‘ | tr ‘ ‘ ‘

’ | egrep ‘^[0-9a-f]{2}$’ ` ; do echo -n “\x$i” ; done

\x48\x31\xf6\x56\x5f\x66\x81\xcf\xff\x0f\x48\xff\xc7\x6a\x15\x58\x0f\x05\x3c\xf2\x74\xef\xb8\xbd\xef\xbe\xef\xfe\xc0\xaf\x75\xed\xff\xe7

adding it to the shellcode.c “test environment” (with a simple execve payload):

#include <stdio.h>

#include <string.h> #define EGG “\xBE\xEF\xBE\xEF” unsigned char hunter[] = \

“\x48\x31\xf6\x56\x5f\x66\x81\xcf\xff\x0f\x48\xff\xc7\x6a\x15\x58\x0f\x05\x3c\xf2\x74\xef\xb8\xbd\xef\xbe\xef\xfe\xc0\xaf\x75\xed\xff\xe7”; unsigned char payload[] = \

EGG

“\x6a\x3b\x58\x99\x52\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x53\x54\x5f\x52\x54\x5a\x57\x54\x5e\x0f\x05”; int main(void) {

printf(“Egg hunter’s size (bytes): %lu

”, strlen(hunter));

printf(“Payload’s size (bytes): %lu

”, strlen(payload));

int (*ret)() = (int(*)())hunter;

ret();

}

and compiling it:

# gcc -fno-stack-protector -z execstack shellcode.c -o shellcode

It runs successfully:

Notice how weird it is that the actual payload is smaller than the egg hunter. Well, that’s just this example because I’m using execve as a payload. But this particular payload, while being great for testing, it wouldn’t do much for me if I’m attacking a remote system. So, in a real scenario, we’d be using something like a bind shell or a reverse tcp shell.

Now this all looks good, right? Well it’s not. In order for this screenshot to be possible, I had to manipulate the application in the background:

I’m basically attaching GDB to the running process (attach 7660), checking on the position of the first code section (the second and third are .data and .bss sections) with the info proc mappings command, setting the RDI register with that value (set $RDI = …), and telling the hunter to keep hunting with the continue (c) command.

Why would I do that? You’d reasonably ask.

Because it takes sooooooo long to run through the user space in VAS in 64-bit architecture, even with all the cuts we’ve made (I left my Intel Core i7 laptop running throughout the night and nothing). This method would easily run through a 32-bit VAS, but definitely not a great option for 64-bit.

And because of memory randomisation (ASLR protection), the first section of the code doesn’t have a predictable position to start.

I ran the shellcode application a few times, and these are some of the first code section addresses it had (proving the point of figure 11):

And this is why we see pretty much all x64 egg hunters, beginning their search at memory address on RDX. Because, in the typical test code for shellcodes (shellcode.c), RDX is the register with the memory position to the egg hunter’s code in memory. Starting from there saves the hunter hours (days/weeks?) of searching time:

The advantage of using RDX as a starting point, is that, not only it’s WAY faster (pretty much instantaneous actually), it also gives you the ability to make the egg hunter’s code much smaller. This is because in this part of memory you’re sure to have read permissions, which voids the importance of the access syscall to prevent SIGSEGV interrupts.

And that’s why some of the codes out there are much smaller than the one showed here. Because they make assumptions that will sacrifice robustness and the ability to search everywhere, to favour speed (in only the very specific case of the typical C code used for testing shellcodes) and favour code shortening. Why would that sacrifice robustness? Because we have no idea of the meaning of the RDX’s value in the beginning of your hunter’s execution when actually executed in “real” applications, and the (very high) odds are that at some point it’ll run into unallocated memory positions, or pages you don’t have permissions to read from.

Bottom line: you’ll have to count on some persistence and a lot of luck for this to work in a real x64 system scenario.

You can find all the files on my gitlab account.

Thanks to Vivek Ramachandran and the Pentester Academy team, as I have enjoyed every second of this course since I’ve learned so many interesting things. Appreciate your work!

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/x8664-assembly-and-shellcoding-on-linux/index.html

Student ID: PA-2109