Null Terminated Programming 101 - x64

Preface

Everyone, class is in session! Please take your seats, as we are about to start.

Today we’re going to dive deep into a magnificent assembly language - x64.

I chose this excellent language for this class because it is so widely used in personal computers and because most researchers know it.

When speaking a language, we often use synonyms when discussing a subject; this makes the conversation more intelligent and exciting for everyone involved.

We just saw an example in the way I described x64 as magnificent and excellent: two different words that mean the same thing.

When writing shellcode, we sometimes need to do something similar, mainly because the way our exploit delivers the shellcode often imposes some sort of limitation.

Some examples:

1. No null bytes, as with strcpy, which stops copying when it encounters a null byte

2. A size limit on how much shellcode we can fit

3. Only alphanumeric characters are copied

These conditions force us to develop different ways of writing our shellcode so that it implements the same logic but still bypasses the limitations imposed by the program.

Today we will attempt to solve case #1 by learning techniques for tackling this issue. These techniques are often called Null Terminated Programming, as they allow us to write assembly code whose final shellcode does not contain any null bytes.
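To make case #1 concrete, here is a minimal Python sketch of what a strcpy-style copy does to a payload. The helper name strcpy_view is my own invention for illustration, and the model only captures the "stop at the first null byte" behaviour, not a real C library call.

```python
def strcpy_view(payload: bytes) -> bytes:
    """Return the part of payload that a C-style strcpy would copy.

    strcpy stops at the first null byte, so everything after it is lost.
    """
    end = payload.find(b"\x00")
    return payload if end == -1 else payload[:end]


# Bytes of "mov rax, 0" followed by the two-byte "syscall" instruction (0f 05).
with_nulls = bytes.fromhex("48c7c000000000" "0f05")
# Bytes of "xor rax, rax" followed by "syscall".
null_free = bytes.fromhex("4831c0" "0f05")

print(strcpy_view(with_nulls).hex())  # 48c7c0
print(strcpy_view(null_free).hex())   # 4831c00f05
```

The first payload is cut off in the middle of its first instruction, while the second one survives the copy intact.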

Please note that some of the techniques described here will probably be relevant for other assembly languages as well, so I’ll leave it as an exercise for the reader to try them on a different language.

Recommended Prerequisites:

x64/x86 assembly knowledge

Basic knowledge of building shellcode

Setup

For our setup, I’m using a script I wrote for compiling x86/x64 shellcode, which we will use to compile the examples here.

Next, we can use this repository to debug and run our shellcode nicely.

Quick Refresher on x64 syscalls

On x64 Linux systems, each syscall has a number that identifies it. To invoke a syscall, we first store its number in RAX, then pass arguments 1-6 in the registers RDI, RSI, RDX, R10, R8 and R9 respectively. (The fourth argument goes in R10 rather than RCX, as it would in the function calling convention, because the syscall instruction overwrites RCX.)

Finally, we use the syscall instruction, which performs the syscall itself and stores the return value in RAX.
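As a quick way to keep the convention straight, the following Python sketch maps a syscall number and its arguments to the registers they belong in. The function name syscall_register_plan and the placeholder string for the path address are made up for this illustration; the code only describes the register layout and does not perform any syscall.

```python
# Argument registers for the Linux x64 syscall convention, in order.
SYSCALL_ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def syscall_register_plan(number, *args):
    """Return a dict describing which register holds which value."""
    if len(args) > len(SYSCALL_ARG_REGISTERS):
        raise ValueError("a syscall takes at most 6 arguments")
    plan = {"rax": number}  # the syscall number always goes in rax
    plan.update(zip(SYSCALL_ARG_REGISTERS, args))
    return plan


SYS_OPEN = 2
O_RDONLY = 0
print(syscall_register_plan(SYS_OPEN, "address of path string", O_RDONLY))
# {'rax': 2, 'rdi': 'address of path string', 'rsi': 0}
```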

The full x64 syscall table can be found at https://filippo.io/linux-syscall-table/ (listed under Sources).

Let’s have a look at this simple assembly code I wrote, named printf_file.asm:

    SYS_READ        equ 0
    SYS_WRITE       equ 1
    SYS_OPEN        equ 2
    SYS_EXIT        equ 60
    AMOUNT_TO_READ  equ 16

    global _start

    section .text

    _start:
        jmp get_file_path

    continue:
        ; syscall to open the file
        mov eax, SYS_OPEN
        pop rdi                     ; pop address of string to rdi
        mov rsi, 0                  ; set O_RDONLY flag
        syscall

        ; syscall to read file
        sub sp, 0xff                ; reserve stack space for the read buffer
        lea rsi, [rsp]              ; rsi = address of the buffer
        mov rdi, rax                ; use the returned fd
        mov rdx, AMOUNT_TO_READ     ; amount to read
        mov rax, SYS_READ
        syscall

        ; syscall write to stdout
        mov rdi, 1                  ; set stdout fd = 1
        mov rdx, rax                ; write to stdout the amount of bytes read
        mov rax, SYS_WRITE
        syscall

        mov rax, SYS_EXIT
        syscall                     ; finish execution

    ; jump here in order to get the address of the string
    get_file_path:
        call continue
        file_path: db "/tmp/my_file", 0

This code performs a simple task: it reads 16 bytes from the file located at /tmp/my_file and outputs those bytes to stdout.

Notice the cool trick we used to obtain the address of the string that contains the path to the file.

To get that address, we first jump to a label placed right before the string, called get_file_path.

There we immediately perform a call to the continue label, which brings us back to the rest of our shellcode.

The call pushes its return address onto the stack, and that address points to the string, because the string is the first “instruction” after the call instruction. We then pop that address into RDI so that RDI (the first parameter in the x64 syscall convention) points to the path of the file we wish to open.
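A side effect of this layout that is worth checking for yourself: because the call jumps backwards, its relative displacement is negative, and a small negative 32-bit number is encoded with ff bytes rather than 00 bytes. The Python sketch below encodes a near call (opcode e8 followed by a 4-byte little-endian displacement) for a hypothetical layout; the offsets 0x02, 0x40 and 0x4a are invented for the example and are not taken from the real binary.

```python
import struct

CALL_LEN = 5  # opcode e8 plus a 4-byte relative displacement


def encode_call(call_offset: int, target_offset: int) -> bytes:
    """Encode "call target" for a call instruction located at call_offset."""
    # The displacement is relative to the address of the next instruction,
    # which is also the return address that the call pushes on the stack.
    return_address = call_offset + CALL_LEN
    displacement = target_offset - return_address
    return b"\xe8" + struct.pack("<i", displacement)


# Hypothetical layout: "continue" at offset 0x02, the call at offset 0x40.
backward = encode_call(call_offset=0x40, target_offset=0x02)
# The same distance in the forward direction, for comparison.
forward = encode_call(call_offset=0x02, target_offset=0x4a)

print(backward.hex())  # e8bdffffff
print(forward.hex())   # e843000000
```

In this made-up layout the string would start at offset 0x45, the return address of the backward call, which is exactly the value that pop rdi retrieves.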

You can compile the above assembly code by running

path_to_make_shellcode/make_shellcode_linux/make_shellcode.sh ./printf_file.asm 64

and run it using

path_to_shellrun/shellrun ./print_file.bin

Let’s check if the shellcode works properly

echo "this_is_my_data" > /tmp/my_file

So far so good.

But under the surface hides a horrible secret…

It’s full of null bytes!!!

Hexdump just showed us that this shellcode is riddled with null bytes.

Let’s begin curing this code by going over ways to rewrite the instructions that generate null bytes.
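If you prefer a scripted check over reading hexdump output, a few lines of Python can list the offsets of every null byte. This is only a sketch: the function name null_offsets is my own, and the sample bytes are the encoding of mov rax, 60 followed by syscall rather than the full shellcode from this article.

```python
def null_offsets(shellcode: bytes) -> list:
    """Return the offset of every null byte in the given shellcode."""
    return [offset for offset, byte in enumerate(shellcode) if byte == 0]


# "mov rax, 60" (48 c7 c0 3c 00 00 00) followed by "syscall" (0f 05).
sample = bytes.fromhex("48c7c03c000000" "0f05")
print(null_offsets(sample))  # [4, 5, 6]
```

To check a compiled file, read it in binary mode and pass the contents in, for example null_offsets(open(path, "rb").read()) where path is wherever your build script wrote the shellcode.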

I will show the bytes of each instruction on the left side and the instruction itself on the right side.

Note: I’ll be using the online assembler at https://defuse.ca/online-x86-assembler.htm (listed under Sources) to show the bytes generated from the instructions we’re about to see. I recommend that you all test your instruction combinations there.

Method 1: Math is awesome

I’ll start off by saying that the mov instruction is often unnecessary when you have the power of math on your side.

Loading 0 to a register

Bad way:

Let’s look at the following instruction:

48 c7 c0 00 00 00 00 mov rax, 0

It is 7 bytes long and, more importantly, contains 4 null bytes!

We can easily use the following instructions instead:

Good way:

48 31 c0 xor rax, rax

48 c7 c0 ff ff ff ff mov rax,0xffffffffffffffff

48 ff c0 inc rax

If another register already holds 0, for example rbx, we can execute the following instruction instead (this works with any register whose value is 0):

48 f7 e3 mul rbx

The mul instruction multiplies rax by the contents of rbx and stores the 128-bit result in rdx:rax. Because rbx is 0 in this case, 0 is stored in rax (and rdx is zeroed as well).
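The behaviour of the one-operand mul can be modelled in a few lines of Python. This is a simplified sketch that ignores the flags the real instruction updates; the function name mul and the sample values are only for illustration.

```python
MASK64 = (1 << 64) - 1


def mul(rax: int, operand: int):
    """Model of the one-operand x64 "mul": returns the new (rdx, rax)."""
    product = (rax & MASK64) * (operand & MASK64)
    # The high 64 bits of the product go to rdx, the low 64 bits to rax.
    return product >> 64, product & MASK64


print(mul(0xdeadbeef, 0))  # (0, 0)
print(mul(MASK64, 2))      # (1, 18446744073709551614)
```

The first call shows the trick from the text: whatever rax held before, multiplying by a zero register clears both rax and rdx with a single three-byte instruction.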

Loading large values to registers

What about putting large values in registers? For example, if I wanted to read a big file with my shellcode.

Bad way

48 c7 c2 00 00 01 00 mov rdx,0x10000

Good way

You can use the shift operations in order to load large numbers

48 31 d2 xor rdx,rdx

48 83 c2 02 add rdx,0x2

48 c1 e2 0f shl rdx,0xf

This will result in rdx having the value 0x10000 at the end of the shift operation.
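The arithmetic is easy to verify, and the same idea can be turned around to find a small value and shift count for a given constant. The helper shift_decompose below is a hypothetical name of my own; it always picks the largest possible shift, so for 0x10000 it returns 1 and 16 rather than the 2 and 15 used above. Both pairs produce the same value.

```python
def shift_decompose(value: int):
    """Split value into (small, shift) such that value == small << shift.

    The shift is the number of trailing zero bits, so small is always odd.
    """
    if value <= 0:
        raise ValueError("value must be a positive integer")
    shift = (value & -value).bit_length() - 1
    return value >> shift, shift


print((0 + 2) << 0xf == 0x10000)  # True
print(shift_decompose(0x10000))   # (1, 16)
print(shift_decompose(0x30000))   # (3, 16)
```

Keep in mind that the small value still has to fit in a one-byte immediate for the add to stay short and null free, so this only helps for constants with many trailing zero bits.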

Method 2: Using your lower parts

Before you start thinking dirty, different parts of each register in x64 can be accessed as an operand.

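These parts are mapped in the following way, using rax as an example: rax is the full 64 bits, eax is the lower 32 bits, ax is the lower 16 bits, and al is the lower 8 bits.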

This allows us to use the al operand for example instead of the rax operand when we want to perform reading and writing actions on the lower 8 bits of the rax register.

When we do so, the instruction that is executed is much smaller and can also aid us when trying to avoid null bytes.

Bad way:

48 c7 c0 02 00 00 00 mov rax,0x2

48 c7 c3 ff 0f 00 00 mov rbx,0xfff

Good way

48 31 db xor rbx,rbx

48 31 c0 xor rax,rax

b0 02 mov al,0x2

66 bb ff 0f mov bx,0xfff
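The xor instructions in the good version are not decoration: a write to an 8-bit or 16-bit part of a register leaves the remaining upper bits untouched, so the register has to be cleared first if we want the full 64-bit value to be predictable. The Python sketch below models that behaviour; write_low and the stale register value are made up for illustration. It deliberately covers only 8-bit and 16-bit writes, because a write to a 32-bit part such as eax zeroes the upper half of the register instead.

```python
MASK64 = (1 << 64) - 1


def write_low(register: int, value: int, bits: int) -> int:
    """Model writing `value` into the low 8 or 16 bits of a 64-bit register."""
    if bits not in (8, 16):
        raise ValueError("this model only covers 8-bit and 16-bit writes")
    low_mask = (1 << bits) - 1
    # Keep the upper bits of the register and replace only the low part.
    return (register & MASK64 & ~low_mask) | (value & low_mask)


stale = 0x1122334455667788  # leftover value from earlier code
print(hex(write_low(stale, 0x02, 8)))  # 0x1122334455667702
print(hex(write_low(0, 0x02, 8)))      # 0x2
print(hex(write_low(0, 0xfff, 16)))    # 0xfff
```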

Field Test

After learning these two methods, let’s apply them to the assembly code we saw at the beginning of the article:

    SYS_READ        equ 0
    SYS_WRITE       equ 1
    SYS_OPEN        equ 2
    SYS_EXIT        equ 60
    AMOUNT_TO_READ  equ 16

    global _start

    section .text

    _start:
        jmp get_file_path

    continue:
        ; syscall to open the file
        xor rax, rax
        add al, SYS_OPEN
        pop rdi                     ; pop address of string to rdi
        xor rsi, rsi                ; set O_RDONLY flag
        syscall

        ; syscall read file
        sub sp, 0xfff
        lea rsi, [rsp]
        mov rdi, rax                ; use the returned fd
        xor rdx, rdx
        add dl, AMOUNT_TO_READ      ; amount to read
        xor rax, rax                ; SYS_READ is 0
        syscall

        ; syscall write to stdout
        xor rdi, rdi
        add dil, 1                  ; set fd to point to stdout
        mov rdx, rax                ; write the amount of bytes read
        xor rax, rax
        add al, SYS_WRITE
        syscall

        mov al, SYS_EXIT            ; rax holds the small byte count returned by write
        syscall                     ; finish execution

    ; jump here in order to get the address of the string
    get_file_path:
        call continue
        flag: db "/tmp/my_file", 0

After we compile this code, we can run it and see that it works exactly the same as the previous code:

Let’s see if hexdump finds any null bytes…

Awesome!

Note: Don’t be confused by the one null byte that hexdump found. That null byte belongs to the string in our shellcode and is the very last byte of the shellcode.

It doesn’t look like it is at the end because hexdump’s default output prints the data as 16-bit little-endian words, so the two bytes of each pair appear swapped.
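The swapped appearance is easy to reproduce. The sketch below formats bytes as 16-bit little-endian words, similar to what hexdump prints without any flags. The function name hexdump_words is my own, and the example assumes that the final "e" of the path and the terminating null byte happen to fall into the same word.

```python
import struct


def hexdump_words(data: bytes) -> str:
    """Format data as 16-bit little-endian words, as hexdump does by default."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte for display
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % word for word in words)


# The last four bytes of the shellcode: "ile" followed by the null terminator.
print(hexdump_words(b"ile\x00"))  # 6c69 0065
```

In the last word the null byte (00) is printed before the "e" (65), even though it comes after it in the file. Running hexdump with the -C flag prints the bytes in their actual order.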

Conclusions

Today we learned how to write our shellcode so that it is free of null bytes. Along the way, we learned different ways to achieve the same result using different, and sometimes shorter (in terms of encoded bytes), instructions in x64.

Finally, we used this knowledge to transform shellcode that was riddled with null bytes into one that is ready to tackle any strcpy in its path.

I hope you all enjoyed this article and learned more about the x64 instruction set along the way. There are many more methods and techniques yet to learn, and I urge you all to keep learning what you don’t know and to teach what you do know.

Spread the good word,

x24whoamix24

Sources

https://filippo.io/linux-syscall-table/

https://defuse.ca/online-x86-assembler.htm#disassembly