What? You can do that in Linux? It turns out you can!

First, let’s see it in action. Here I retrieve an ARM binary from my Raspberry Pi and execute it transparently on my x86_64 machine.

If you try to do this… it won’t work right away.

$ ./echo
zsh: exec format error: ./echo

First we have a couple of things to set up. We will be using QEMU in a slightly unconventional way, in combination with a kernel feature called binfmt_misc.

QEMU user mode

Obviously our CPU is not able to run foreign machine code instructions. We said we would be using QEMU, but in a slightly unconventional way.

We all know QEMU as a virtual machine, where we load a virtual (fake) hard drive with an operating system and set up fake hardware to interface with it: a fake CPU, fake keyboard, fake network adapter and so on. This looks like this

But there is also another mode of use in QEMU, called user emulation.

When we write a program, we interact with the system through system calls. We need to do this in order to interact with the keyboard, terminal, screen, filesystem and so on. This means that when we execute a program, the code that we write is executed in user space, and then the kernel does the interacting with the system part for us. We just request things from the kernel such as writing to a file.

In QEMU system emulation this looks like this

In user mode, QEMU doesn’t emulate all the hardware, only the CPU. It executes foreign code in the emulated CPU, and then it captures the syscalls and forwards them to the host kernel. This way, we are interfacing with the native kernel in the same way as any native piece of software. This looks like this

This has many benefits: we are not emulating all the hardware, which is slow, and we are not emulating the kernel either, which accounts for a decent part of the computation that takes place. Actually, we don’t even need a guest kernel. We can now understand why this runs much faster than full system emulation.
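To make the idea concrete, here is a toy sketch in Python of what “emulate the CPU, forward the syscalls” means. Everything in it is invented for illustration: the two-instruction “guest CPU”, the register layout and the guest syscall numbers are assumptions, not real ARM or QEMU details. The only real part is that the guest’s write request ends up as a genuine write on the host.

```python
import os

# Guest syscall numbers invented for this sketch (not the real ARM ABI values)
GUEST_SYS_WRITE = 1
GUEST_SYS_EXIT = 2


def run_guest(program, out_fd=1):
    """Interpret a made-up instruction list, forwarding 'svc' to the host kernel."""
    regs = [0] * 4
    for op, *args in program:
        if op == "mov":                      # mov <register>, <value>
            regs[args[0]] = args[1]
        elif op == "svc":                    # the guest asks the kernel for a service
            if regs[0] == GUEST_SYS_WRITE:
                # Forwarded as a real write on the host through os.write
                regs[0] = os.write(out_fd, regs[1])
            elif regs[0] == GUEST_SYS_EXIT:
                return regs[1]               # exit code requested by the guest
            else:
                raise NotImplementedError(f"Unsupported syscall: {regs[0]}")
    return None


# A guest "binary" that writes a message and exits with code 0
hello = [
    ("mov", 0, GUEST_SYS_WRITE),
    ("mov", 1, b"hello world\n"),
    ("svc",),
    ("mov", 0, GUEST_SYS_EXIT),
    ("mov", 1, 0),
    ("svc",),
]

if __name__ == "__main__":
    exit_code = run_guest(hello)
```

Running it prints hello world, and the only kernel involved is the host one. No fake hardware and no guest kernel were needed, which is the whole point of user mode.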

As an example, let’s cross-compile a static ARM binary

#include <stdio.h>

int main(int argc, char** argv)
{
    printf("hello world\n");
    return 0;
}

We need to install the toolchain to cross-compile from x86 to armhf, for instance

# apt-get install gcc-arm-linux-gnueabihf

, or in Arch Linux

$ pacaur -S aur/arm-linux-gnueabihf-gcc

Then we generate the binary

$ arm-linux-gnueabihf-gcc hello.c -o hello_arm_static -static
$ file hello_arm_static
hello_arm_static: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=69ff53a55d64975f87b9ea3543d26bcbae31de9f, with debug_info, not stripped

Now we can run it with qemu-arm. We need to install the package qemu-user

# apt-get install qemu-user

, and now we can run

$ qemu-arm hello_arm_static
hello world

This isn’t yet very useful because most programs are dynamically linked. We still have some work to do.

Running ARM executables transparently

Recall from the last post on Linux executables what happens when we execute a file and how we can use binfmt_misc to set up our own interpreters. Now we have all the pieces and we want to put them together. We need to set up binfmt_misc in order to use QEMU user mode as an interpreter for our binary format.

We can do it ourselves manually, or install the qemu-user-binfmt package, normally installed automatically with qemu-user. We end up with the binfmt_misc entries

$ ls -l /proc/sys/fs/binfmt_misc
total 0
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-aarch64
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-alpha
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-arm
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-armeb
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-cris
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-m68k
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-microblaze
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-mips
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-mipsel
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-ppc
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-ppc64
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-ppc64abi32
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-s390x
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-sh4
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-sh4eb
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-sparc
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-sparc32plus
-rw-r--r-- 1 root root 0 Jun 7 11:36 qemu-sparc64
--w------- 1 root root 0 Jun 7 11:36 register
-rw-r--r-- 1 root root 0 Jun 7 11:36 status
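For the manual route, entries are created by writing a single line to the register file listed above. According to the kernel’s binfmt_misc documentation, the line has the form :name:type:offset:magic:mask:interpreter:flags, with M as the type for magic matching and arbitrary bytes written as \xNN escapes. The Python sketch below only builds that line and does not touch /proc. The helper name binfmt_register_line is made up, and the magic, mask, interpreter and flags are the values reported for the qemu-arm entry in this post.

```python
def binfmt_register_line(name, magic_hex, mask_hex, interpreter, flags=""):
    """Build a binfmt_misc registration line of type M (match by magic)."""
    def escape(hex_string):
        # Each byte is written as a \xNN escape
        return "".join(f"\\x{byte:02x}" for byte in bytes.fromhex(hex_string))

    # The offset field is left empty, which means offset 0
    return f":{name}:M::{escape(magic_hex)}:{escape(mask_hex)}:{interpreter}:{flags}"


line = binfmt_register_line(
    "qemu-arm",
    "7f454c4601010100000000000000000002002800",
    "ffffffffffffff00fffffffffffffffffeffffff",
    "/usr/bin/qemu-arm-static",
    flags="OC",
)
print(line)
# As root, this line would then be written to /proc/sys/fs/binfmt_misc/register
```

Keeping the write to the register file out of the sketch is deliberate: it needs root, and registering the same name twice fails, so it is safer to inspect the line first.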

Now we can substitute

$ qemu-arm hello_arm_static
hello world

for

$ ./hello_arm_static
hello world

, because we have an active entry in binfmt_misc

$ cat /proc/sys/fs/binfmt_misc/qemu-arm
enabled
interpreter /usr/bin/qemu-arm-static
flags: OC
offset 0
magic 7f454c4601010100000000000000000002002800
mask ffffffffffffff00fffffffffffffffffeffffff

The kernel recognizes the ARM ELF magic and uses the interpreter /usr/bin/qemu-arm-static, which is the correct QEMU binary for the architecture. 0x7F ‘ELF’ in hexadecimal is 7f 45 4c 46, so we can see how the magic and the mask work together, considering the structure of the ELF header

typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* 0x7F 'ELF' four byte ELF magic for any architecture */
    uint16_t      e_type;
    uint16_t      e_machine;          /* architecture code, 40=0x28 in the case of ARM */
    uint32_t      e_version;
    ElfN_Addr     e_entry;
    ElfN_Off      e_phoff;
    ElfN_Off      e_shoff;
    uint32_t      e_flags;
    uint16_t      e_ehsize;
    uint16_t      e_phentsize;
    uint16_t      e_phnum;
    uint16_t      e_shentsize;
    uint16_t      e_shnum;
    uint16_t      e_shstrndx;
} ElfN_Ehdr;
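A minimal sketch of that comparison, in Python: a header matches when every byte agrees with the magic after both are ANDed with the mask. The helper names matches and elf_header_prefix are hypothetical, and this is an illustration of the check rather than the kernel’s code.

```python
import struct

# Values reported by /proc/sys/fs/binfmt_misc/qemu-arm
MAGIC = bytes.fromhex("7f454c4601010100000000000000000002002800")
MASK = bytes.fromhex("ffffffffffffff00fffffffffffffffffeffffff")


def matches(header, magic=MAGIC, mask=MASK):
    """True if header equals magic on every bit that is set in mask."""
    if len(header) < len(magic):
        return False
    return all((h & m) == (g & m) for h, g, m in zip(header, magic, mask))


def elf_header_prefix(ei_class, ei_osabi, e_type, e_machine):
    """First 20 bytes of a little-endian ELF header (hypothetical helper)."""
    # e_ident: magic, class, data (1 = LSB), version, OS ABI, 8 padding bytes
    e_ident = b"\x7fELF" + bytes([ei_class, 1, 1, ei_osabi]) + bytes(8)
    return e_ident + struct.pack("<HH", e_type, e_machine)


print(matches(elf_header_prefix(1, 0, 2, 0x28)))  # 32-bit ARM executable
print(matches(elf_header_prefix(1, 3, 3, 0x28)))  # 32-bit ARM PIE, other OS ABI byte
print(matches(elf_header_prefix(2, 0, 3, 0x3E)))  # x86-64 binary
```

This prints True, True and False. The 00 in the mask makes byte 7 of e_ident (the OS ABI) irrelevant, and the fe on the low byte of e_type lets both type 2 (executable) and type 3 (shared object, which is how PIE executables appear) through. The x86-64 header already fails on the class byte.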

At the end of the day, we want our code to tell the kernel to print hello world. Let’s compare the kernel interactions of the native

$ strace ./hello_static 2>&1 | grep -e execve -e readlink -e write
execve("./hello_static", ["./hello_static"], 0x7ffd3d83b2b0 /* 41 vars */) = 0
readlink("/proc/self/exe", "/home/nacho/srctest/hello_static", 4096) = 32
write(1, "hello world\n", 12hello world

and the emulated code

$ strace ./hello_arm_static 2>&1 | grep -e execve -e readlink -e write
execve("./hello_arm_static", ["./hello_arm_static"], 0x7ffd4b19b5c0 /* 41 vars */) = 0
readlink("/proc/self/exe", "/usr/bin/qemu-arm-static", 4096) = 24
write(1, "hello world\n", 12hello world

The execve() syscall is the same, and so is the write() call, so we get the same behaviour. We can also see that the readlink() on /proc/self/exe reveals that the binary being run natively is in fact qemu-arm-static, the interpreter.



Again, most of the work is being done natively by the kernel, so this runs much faster than QEMU full system emulation, where the kernel’s share of the execution would need to be emulated too, as well as the virtual hardware. It is also much easier to set up.



This is still not that useful yet, because very few programs are statically linked. Let’s create x86 and armhf versions of hello.c

$ gcc hello.c -o hello
$ gcc hello.c -o hello_static -static
$ arm-linux-gnueabihf-gcc hello.c -o hello_arm
$ arm-linux-gnueabihf-gcc hello.c -o hello_arm_static -static

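The ARM binaries take much more space. Part of the reason is that ARM, being a RISC architecture, has a smaller instruction set and so needs more machine code to perform many common operations; code density can be improved by using the Thumb instruction set. Note also that file reported hello_arm_static as “with debug_info”, and debug information adds to the size as well.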

$ dutree
[ crosshello 4.64 MiB ]
├─ hello_arm_static │ ███████████████████████████████████████████████│ 84% 3.91 MiB
├─ hello_static │ ██████│ 15% 724.89 KiB
├─ hello_arm │ │ 0% 15.62 KiB
├─ hello │ │ 0% 8.16 KiB
└─ hello.c │ │ 0% 97 B

Let’s try this

$ ./hello_arm
/lib/ld-linux-armhf.so.3: No such file or directory

Dynamically linked executables carry the path of the runtime linker (a.k.a. the ELF interpreter), hardcoded at compile time.

$ file hello_arm
hello_arm: ELF 32-bit LSB pie executable ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=2be332452ae4987fa763b6e75c359e08793572aa, with debug_info, not stripped
$ file hello
hello: ELF 64-bit LSB pie executable x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d11d6a23094a98009919746b13a1c064450aa944, not stripped

So the code fails because it cannot find the linker that it requires, /lib/ld-linux-armhf.so.3. This normally comes with the cross-toolchain.
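The interpreter path that file prints lives in a program header of type PT_INTERP. As a rough sketch of where it comes from, the Python below walks the program header table of an ELF image and returns that string. The function name elf_interpreter is made up, the field offsets follow the standard ELF32 and ELF64 layouts, and error handling is kept to a minimum.

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the runtime linker path


def elf_interpreter(data):
    """Return the runtime linker path stored in an ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian

    # Locate the program header table (e_phoff, e_phentsize, e_phnum)
    if is_64bit:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)

    for index in range(e_phnum):
        entry = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, entry)
        if p_type != PT_INTERP:
            continue
        # p_offset and p_filesz sit at different offsets in ELF32 and ELF64
        if is_64bit:
            (p_offset,) = struct.unpack_from(endian + "Q", data, entry + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, entry + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, entry + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, entry + 16)
        path = data[p_offset:p_offset + p_filesz]
        return path.rstrip(b"\0").decode()
    return None  # no PT_INTERP entry: statically linked


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as elf_file:
        print(elf_interpreter(elf_file.read()))
```

Saved under a name of your choice and given hello_arm as its argument, it should report the same /lib/ld-linux-armhf.so.3 path that file shows, and None for the static binaries.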

We could be tempted to do something really dirty like

# ln -s /usr/arm-linux-gnueabihf/lib/ld-linux-armhf.so.3 /lib/ld-linux-armhf.so.3

We would need to do this not only for ld-linux-armhf.so, but also for libc.so and everything else our binary might need, and we don’t want to have a mix of libraries of different architectures in the same place, right?

We can tell QEMU where to look for the linker and libraries with

$ qemu-arm -L /usr/arm-linux-gnueabihf hello_arm
hello world

but we want transparent execution, so we can add this to .bashrc or .zshrc

export QEMU_LD_PREFIX=/usr/arm-linux-gnueabihf

, or configure it system wide at /etc/qemu-binfmt.conf

EXTRA_OPTS="-L /usr/arm-linux-gnueabihf"

Now it works transparently!

$ ./hello_arm
hello world
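Conceptually, the prefix works as an alternative root for the guest’s files: when the ARM program asks for /lib/ld-linux-armhf.so.3, QEMU looks under the prefix first. The Python sketch below is a rough model of that lookup, under the assumption that paths missing from the prefix fall back to the host path. The function name guest_path is made up and this is not QEMU’s actual code.

```python
import os
import tempfile


def guest_path(path, prefix):
    """Map an absolute guest path under prefix if such a file exists there."""
    candidate = os.path.join(prefix, path.lstrip("/"))
    return candidate if os.path.exists(candidate) else path


# Build a throwaway prefix that contains only the ARM runtime linker
with tempfile.TemporaryDirectory() as prefix:
    os.makedirs(os.path.join(prefix, "lib"))
    open(os.path.join(prefix, "lib", "ld-linux-armhf.so.3"), "w").close()

    # Found under the prefix, so the remapped path is used
    print(guest_path("/lib/ld-linux-armhf.so.3", prefix))
    # Not present under the prefix, so the path is left alone
    print(guest_path("/no/such/file", prefix))
```

This also shows why only the ARM-specific files need to live under /usr/arm-linux-gnueabihf: anything else the program opens is simply served from the regular filesystem.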

This is still not that useful. The reason is that we now need to have a copy of all the ARM libraries required by our ARM binaries.

Our example works because everything hello.c needs is so basic that it comes with the toolchain.

$ ldd hello
    linux-vdso.so.1 (0x00007ffd71ab5000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f536fe48000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f5370406000)

The situation is not too bad in Debian, where you can install libraries from other architectures, for instance

# apt-get install libstdc++6:armhf

Emulating full ARM rootfs

Most often in real situations we need to work in the final system where the binary is supposed to run. It makes more sense to have the whole ARM environment with its ARM libraries and all. Enter chroot.

chroot, short for change root, is a system call and corresponding command wrapper that changes the root directory of a process and its children. Given a directory with a different root filesystem, we can execute anything in it so that its view of the filesystem is moved to the new root directory. For this reason it is often called a chroot jail. This is the predecessor of filesystem namespaces, a key component that makes containers possible.

As an example, let’s execute echo inside an x86 jail. I have prepared a whole Debian filesystem in new_root_folder.

# chroot new_root_folder /bin/echo "hello world"
hello world

This echo, or whatever binary we run, does not see anything outside of the jail. It cannot, for instance, remove or read a file outside of the new root folder.

We can get an existing ARM rootfs to work with, or we can generate one. In Debian we can use debootstrap with the --arch switch to generate a Stretch ARM rootfs.

$ debootstrap --arch=armhf stretch new_root_folder

What we want to do now is to use chroot to make the binaries inside the jail view the filesystem just like they expect it. By using chroot we already have /etc, /bin and all the regular folders in place. Next, we need to add the virtual filesystems

# mount -t proc proc new_root_folder/proc/
# mount -t sysfs sys new_root_folder/sys/
# mount -o bind /dev new_root_folder/dev/
# mount -o bind /dev/pts new_root_folder/dev/pts

Finally, we will copy the qemu-arm-static binary, which comes from the qemu-user-static package, inside the ARM filesystem.

# cp /usr/bin/qemu-arm-static new_root_folder/usr/bin

This little intruder will be the only x86 binary in an ARM filesystem, he’s surrounded!
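The preparation so far can be collected in one place. The Python sketch below only builds and prints the commands, so it is safe to run as a regular user; actually executing them needs root. The function name chroot_plan and the /bin/bash default for the program started in the jail are assumptions made for the example.

```python
import shlex


def chroot_plan(rootfs, qemu="/usr/bin/qemu-arm-static", shell="/bin/bash"):
    """Return the commands, in order, that prepare and enter an ARM jail."""
    return [
        # Virtual filesystems the programs in the jail expect to find
        ["mount", "-t", "proc", "proc", f"{rootfs}/proc/"],
        ["mount", "-t", "sysfs", "sys", f"{rootfs}/sys/"],
        ["mount", "-o", "bind", "/dev", f"{rootfs}/dev/"],
        ["mount", "-o", "bind", "/dev/pts", f"{rootfs}/dev/pts"],
        # The static interpreter must exist at the same path inside the jail
        ["cp", qemu, f"{rootfs}/usr/bin"],
        # Finally, enter the jail
        ["chroot", rootfs, shell],
    ]


if __name__ == "__main__":
    for command in chroot_plan("new_root_folder"):
        print(shlex.join(command))
```

Printing instead of running keeps the order explicit: the mounts and the copy have to be in place before chroot is called, because after that the jail is all the ARM programs can see.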

We have everything in place! What will happen when we try to execute some ARM executable from the jail?

The chroot command will call execve() on the ARM binary

The ARM binary will be handled by the binfmt_misc binary handler, according to its configured ARM ELF magic.

The entry in binfmt_misc instructs the kernel to use /usr/bin/qemu-arm-static as an interpreter, that is why we had to copy it inside the jail. Remember that by chroot magic /usr/bin is really inside new_root_folder.

qemu-arm-static will interpret the ARM binary in user mode. We are using the static version of qemu-arm because we need the interpreter to be standalone, as it is the only x86 binary in the jail and will not have access to any x86 libraries.

Any ARM library that is expected by the programs inside the jail will be there, as provided by the ARM rootfs.

Let’s see all this in action, opening a bash shell in a Raspbian rootfs

I had to configure the PATH variable to match the one Raspbian expects. Naturally, our original environment from zsh will be inherited by chroot and arm-bash. We have talked about full system QEMU Raspbian emulation before, and this runs so much faster.

Things will work as long as the ARM binaries see what they expect to see. Binaries can execve() other executables and everything will mostly work perfectly well. An exception would be programs that use exotic system calls that QEMU user mode has not implemented yet, for instance for using the pseudo random generator. As QEMU user mode matures, it is becoming rarer to see this happen, and libraries normally have fallback options for these situations anyway. In those cases you will see something like

qemu: Unsupported syscall: 384
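The fallback idea can be sketched as follows. Assuming syscall 384 is getrandom on 32-bit ARM (which fits the random generator example), a library can try the syscall first and read /dev/urandom when the kernel, or here QEMU, answers that it is not implemented. This is an illustration in Python with a made-up function name, not the code of any particular library.

```python
import os


def random_bytes(count):
    """Return count random bytes, preferring the getrandom syscall."""
    try:
        # Raises OSError where the syscall is not implemented; os.getrandom
        # itself is absent on platforms without it (AttributeError)
        data = os.getrandom(count)
        if len(data) == count:
            return data
    except (AttributeError, OSError):
        pass
    # Fallback that only needs open() and read(), which are always forwarded
    with open("/dev/urandom", "rb") as urandom:
        return urandom.read(count)


print(random_bytes(16).hex())
```

With a fallback like this in place, the warning is still printed by QEMU, but the program carries on.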

Remember that we are still using our host kernel, so we can use networking, install packages with apt and all the rest. This is really useful for things like