Originally documented for Windows, the “Heaven’s Gate” technique allowed malicious software to evade endpoint security products by invoking 64-bit code in 32-bit processes, effectively bypassing user-mode hooks. This technique has since been mitigated in Windows 10+ through Control Flow Guard (CFG).

Red Canary has successfully reproduced a variation of this technique for Linux, and the result of our research has been incorporated into Chain Reactor, our open source framework for adversary simulation on Linux.

In the sections below, we will break down the Heaven’s Gate technique as a primer, detail what a variation of this technique looks like for Linux, provide proof-of-concept code, and demonstrate how strace and ptrace can be fooled.

Revisiting Heaven’s Gate for Windows

Endpoint security products commonly instrumented Windows applications through user-mode API hooks. For example, to monitor the file activity of a newly launched application, security products would hook the associated functions in the 32-bit ntdll.dll, such as NtCreateFile, NtOpenFile, and NtWriteFile.

However, WoW64 (Windows 32-bit on Windows 64-bit), which allows 32-bit applications to run on 64-bit systems for compatibility reasons, introduces a small wrinkle. Under WoW64, every 32-bit application loads both a 32-bit and a 64-bit ntdll.dll. The 32-bit DLL effectively acts as a wrapper, thunking into the 64-bit DLL to perform the actual system call.

As a result, malicious software could evade the user-mode components of many security products by calling the 64-bit API directly, never touching the hooked 32-bit API.

This technique was originally documented by George Nicolaou on his blog and has since been mitigated in Windows 10+ through CFG.

Heaven’s Gate on Linux

We were able to recreate a variation of this technique for Linux that confuses common tracing tools. Unlike on Windows, 32-bit Linux executables do not load any native 64-bit libraries. Instead, the 64-bit Linux kernel exposes 32-bit entry points that emulate the existence of a 32-bit kernel, whereas Windows only has 64-bit kernel entry points.

Instrumentation for security products will likely live in one of a few places: a ptrace-based tracer, the kernel itself, or a subsystem such as audit.

We will cover two use cases:

64-bit applications calling 32-bit syscall entries

32-bit applications calling 64-bit syscall entries

These are detailed at length below, with proofs of concept (POC) and raw code in our GitHub repository.

Evading 64-bit security instrumentation

It turns out 64-bit applications can directly invoke the standard 32-bit interrupt handler (int 0x80) to transition into the kernel. Some caveats exist, though: the handler only reads the lower 32 bits of each register, so every register-based argument must fit in its 32-bit counterpart, and every pointer must be 32-bit addressable, which in practice means pointing into the lower 2GB of virtual memory.

Fortunately, these constraints can be met by allocating memory with mmap and the MAP_32BIT flag, which places the mapping in the first 2GB of the process address space.
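A minimal sketch of such an allocation follows. It assumes an x86-64 Linux host with glibc, because MAP_32BIT is only defined for x86-64. The helper name alloc_low32 is our own and is not part of any library. As a defensive check, it also rejects any mapping that would extend past the 2GB boundary.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes of read/write memory in the first
   2GB of the address space (x86-64 only). Returns NULL on failure. */
static void *alloc_low32(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Every address inside the mapping must survive truncation to 32 bits. */
    if ((uintptr_t)p + size > 0x80000000u) {
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

Every buffer later handed to the 32-bit entry point (the argument block, the receive buffer, src_addr, and addrlen) would be carved out of a region like this one.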

Let’s walk through an example using the 32-bit socketcall interface:

Prior to Linux 4.3, 32-bit applications used a single entry point (socketcall) for all socket operations. The first parameter (call) specifies the action to perform (bind, connect, etc.), and the second parameter points to a contiguous memory block holding the action’s expected arguments.

In this example, we’ll have socketcall perform a recvfrom action.

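Function definition: ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);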

Argument details:

Three immediates (sockfd, len, flags)

One out (buf)

One in/out (addrlen)

One pointer to a structure (src_addr)

When exercising the Heaven’s Gate technique, immediate values are the least complex because they are passed by value. Output and input/output pointers are more complex because whatever they reference must reside in the lower 2GB of virtual memory. We also need to ensure that mmap allocates enough memory for the socket operation’s arguments (six in this case), the receive buffer (sized by len), the src_addr structure, and the addrlen value.
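One way to satisfy all of this is a single structure placed at the start of a low mapping, so the six-slot argument block and every buffer it points to share one 32-bit-addressable region. This is a sketch under our own assumptions: struct recvfrom_frame, build_recvfrom_frame, and the 256-byte buffer size are hypothetical. Each slot is 32 bits wide because the kernel’s compat socketcall handler reads the argument block as an array of 32-bit values. The call number for recvfrom is 12 (SYS_RECVFROM in <linux/net.h>).

```c
#include <stdint.h>
#include <sys/socket.h>

#define RECV_BUF_LEN 256 /* hypothetical receive buffer size */

/* Hypothetical layout for one recvfrom() issued through socketcall. The whole
   structure is meant to sit inside a MAP_32BIT mapping so that every address
   stored in args[] survives truncation to 32 bits. */
struct recvfrom_frame {
    uint32_t args[6];                /* socketcall argument block, 32-bit slots */
    uint32_t addrlen;                /* in/out: socklen_t is 32 bits on both ABIs */
    struct sockaddr_storage src;     /* out: sender address (src_addr) */
    unsigned char buf[RECV_BUF_LEN]; /* out: received payload (buf) */
};

/* Keep only the low 32 bits of a pointer, as the compat handler will. */
static uint32_t ptr32(const void *p)
{
    return (uint32_t)(uintptr_t)p;
}

/* Fill the argument block in recvfrom() parameter order. */
static void build_recvfrom_frame(struct recvfrom_frame *f, int sockfd, int flags)
{
    f->addrlen = sizeof f->src;
    f->args[0] = (uint32_t)sockfd;   /* immediate */
    f->args[1] = ptr32(f->buf);      /* out */
    f->args[2] = sizeof f->buf;      /* immediate: len */
    f->args[3] = (uint32_t)flags;    /* immediate */
    f->args[4] = ptr32(&f->src);     /* out: struct sockaddr * */
    f->args[5] = ptr32(&f->addrlen); /* in/out: socklen_t * */
}
```

Keeping the frame contiguous means a single low allocation covers all four memory operands.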

Once the arguments have been laid out correctly, we can invoke socketcall’s 32-bit syscall entry point through int 0x80, using its i386 system call number (102).
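The invocation itself can be sketched with GCC-style inline assembly. This assumes x86-64 Linux with 32-bit (IA32) emulation enabled, and the names int80_socketcall and probe_int80 are hypothetical. On kernels built or booted without IA32 emulation, int 0x80 faults outright, so the probe runs the call in a child process. It also uses a deliberately invalid call number (0). A kernel that supports the gate rejects that number with -EINVAL before touching the argument pointer.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define I386_NR_SOCKETCALL 102 /* socketcall in the i386 syscall table */

/* Enter the kernel through the 32-bit int 0x80 gate from 64-bit code.
   The compat handler reads only eax, ebx, and ecx here, so call and
   args_addr must already be 32-bit values. r8-r11 are listed as clobbered
   because some older kernels did not preserve them across int 0x80. */
static long int80_socketcall(uint32_t call, uint32_t args_addr)
{
    long ret;
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "0"((long)I386_NR_SOCKETCALL), "b"(call), "c"(args_addr)
                     : "r8", "r9", "r10", "r11", "cc", "memory");
    return (long)(int)ret; /* treat the result as a 32-bit value and sign-extend */
}

/* Returns 1 if the gate rejected call 0 with -EINVAL (gate usable),
   0 if it returned something else (e.g. a seccomp filter intervened),
   -1 if the child was killed or could not be created. */
static int probe_int80(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* 0 is below SYS_SOCKET, so the kernel fails it before reading args */
        long r = int80_socketcall(0, 0);
        _exit(r == -EINVAL ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A real recvfrom would pass 12 (SYS_RECVFROM) as call and the 32-bit address of the argument block in low memory as args_addr. The return value is then either the number of bytes received or a negative errno.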

The complete proof of concept is available in our GitHub repository.

Evading 32-bit security instrumentation

Invoking the 64-bit syscall interface is possible, but it requires the code segment register (CS) to hold a particular segment selector. In both 32-bit protected mode and 64-bit long mode, that selector is an index into either the Local Descriptor Table (LDT) or the Global Descriptor Table (GDT). The referenced descriptor defines the segment’s base and limit (ignored for code segments in 64-bit mode), its privilege level (user mode or kernel mode), and whether instructions are decoded as 32-bit or 64-bit code.

Selector values for the code segment (CS) are defined in arch/x86/include/asm/segment.h, inside the Linux source repository.

From those definitions:

32-bit applications are assigned a value of 35 (0x23) via __USER32_CS

64-bit applications are assigned a value of 51 (0x33) via __USER_CS
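The arithmetic behind these values can be checked with a short sketch. A selector packs the descriptor index, a table-indicator bit (0 for GDT, 1 for LDT), and the requested privilege level (3 for user mode). GDT indices 4 and 6 come from GDT_ENTRY_DEFAULT_USER32_CS and GDT_ENTRY_DEFAULT_USER_CS in the x86-64 segment.h. The helpers make_selector and current_cs are our own illustrations, and current_cs assumes GCC-style inline assembly on x86.

```c
#include <stdint.h>

/* Selector = (descriptor index << 3) | (table indicator << 2) | RPL */
static uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}

/* Read the code segment selector the CPU is currently executing with. */
static uint16_t current_cs(void)
{
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
}
```

When run from an ordinary 64-bit Linux process, current_cs() would typically report 0x33, which matches make_selector(6, 0, 3).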

As a result, we need to transition our application’s CS from 0x23 to 0x33 in order to execute the 64-bit syscall instruction without an illegal instruction exception. We can achieve this with a far jump, which allows us to specify both the segment selector and the virtual address of our target function.
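For reference, 32-bit code can use a direct far jump, written ljmp $0x33, $target in GNU assembler syntax. It is encoded as opcode 0xEA followed by a 4-byte offset and then the 2-byte selector, both little-endian. The sketch below uses a hypothetical helper, encode_ljmp32, that only builds those bytes, for example to patch a stub at runtime. Executing them requires a 32-bit process, because opcode 0xEA is invalid in 64-bit mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 7-byte encoding of "ljmp $selector, $offset" (JMP ptr16:32).
   Returns the number of bytes written. */
static size_t encode_ljmp32(uint8_t out[7], uint32_t offset, uint16_t selector)
{
    out[0] = 0xEA; /* JMP ptr16:32 */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(offset >> (8 * i)); /* offset, little-endian */
    out[5] = (uint8_t)selector;                    /* selector, little-endian */
    out[6] = (uint8_t)(selector >> 8);
    return 7;
}
```

Because the selector is part of the instruction, a single jump both changes CS to 0x33 and lands on the 64-bit target.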

The proof of concept for this direction is also available in our GitHub repository.

Impact

It’s easy to confuse user-mode instrumentation and tooling. For example, strace uses ptrace to introspect a process’s syscalls during execution, but ptrace doesn’t distinguish between the 32-bit and 64-bit syscall entry points when the process is running in 64-bit long mode. As a result, strace incorrectly decodes the syscalls when we use the Heaven’s Gate technique.

In our example, strace’s output reports that lgetxattr is being called when the process is in fact calling mmap.
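The mismatch comes down to syscall numbering, because the i386 and x86-64 tables assign different meanings to the same number. We assume here that the mmap in question was issued through the i386 mmap2 entry (number 192). Decoding that number with the x86-64 table yields lgetxattr. The excerpt below hard-codes a few entries from each table for illustration and is not a complete decoder.

```c
#include <stddef.h>
#include <string.h>

struct nr_name {
    int nr;
    const char *name;
};

/* Excerpt of arch/x86/entry/syscalls/syscall_64.tbl */
static const struct nr_name x86_64_table[] = {
    {9, "mmap"}, {20, "writev"}, {39, "getpid"}, {102, "getuid"}, {192, "lgetxattr"},
};

/* Excerpt of arch/x86/entry/syscalls/syscall_32.tbl */
static const struct nr_name i386_table[] = {
    {20, "getpid"}, {90, "mmap"}, {102, "socketcall"}, {192, "mmap2"},
};

static const char *lookup(const struct nr_name *t, size_t n, int nr)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].nr == nr)
            return t[i].name;
    return NULL; /* not in this excerpt */
}

static const char *decode_x86_64(int nr)
{
    return lookup(x86_64_table, sizeof x86_64_table / sizeof x86_64_table[0], nr);
}

static const char *decode_i386(int nr)
{
    return lookup(i386_table, sizeof i386_table / sizeof i386_table[0], nr);
}
```

By the same logic, the socketcall (102) issued in the first technique would be decoded as getuid by a tracer that assumes the x86-64 table.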

It is unknown which Linux security products are impacted by this technique. The only prior art we could find after our research was a GitHub Gist with some POC code by “rqou.”

Impacted security product categories could include:

Antivirus

Next-gen antivirus (NGAV)

Endpoint detection and response (EDR)

Endpoint protection platforms (EPP)

Cloud workload protection platforms (CWPP)

Sandbox and deception technologies

We encourage you to explore whether you’re at risk, as a customer or as a vendor. POC code is provided in our GitHub repository and incorporated in Chain Reactor.