Evading Static Analyzers by Solving the Equation (Editor)

Introduction

As part of our efforts to self-evaluate our backend systems, we closely monitor the behavioral reports produced by our dynamic analysis system. Every detection is, in fact, cross-checked and correlated with several other pieces of information, including the output from a number of static analyzers.

A few weeks ago a small anomaly started to creep in when analyzing malicious documents: executions spawning a rogue Equation Editor process (often linked to arbitrary code executions) were no longer triggering our internal static analyzers. It was as if the malicious documents were leveraging a new CVE, possibly just added to a well-maintained document exploit builder (for instance like the old Phantom exploit builder kit, or the Metasploit framework).

One of the malicious documents (sha1: cf63479cefc4984309e97ed71e34a078cbf21d6a) was obfuscated but the process snapshot was still clearly showing the exploitation of the same buffer overflow used by CVE-2017-11882. However, the header of the OLE object (as extracted by rtfobj) was clearly different.

This quickly explained why the static analyzer didn’t assert detection of the known CVE: any string that is often used to detect CVE-2017-11882 relies on either the class name or some other byte sequence that, as shown in Figure 1, is now clearly missing. At this point, we decided to analyze the document in more detail.

OLE Object Analysis

The OLE object (as extracted by RTFScan and viewed by SS viewer) clearly shows that even its stream type is somewhat generic (normally an Equation Editor OLE object contains an

EquationNative stream as further explained here). Instead, the OLE stream is parsed as a more obscure Ole10Native (see Figure 2).

There are two interesting things happening here: (i) Equation Editor is still invoked to process the OLE object regardless of the OLE format, and (ii) Equation Editor is able to parse this new and generic format. As we show in Figure 3, the first is achieved because the CLSID is also specified inside the OLE object itself (the reader can find a nice walk through on how this is done here).

As for the stream itself, its type is not something we see every day. Equation Editor, on the other hand, seems to know this format quite well, and in fact it parses the object without raising any issue: it selectively reads and tests specific bytes (the first and third byte of the MTEF header and the first two of the TYPESIZE header), and if some specific values are found (as shown in Figure 4), Equation Editor is finally convinced to parse the FONT record as well, triggering once again the same buffer overflow that is normally exploited in CVE-2017-11882.

Shellcode Analysis

The vulnerability exploited to execute the shellcode is indeed CVE-2017-11882; as soon as the FONT record is parsed, the control flow is transferred to 0x445203.

At this address, a RET instruction will be executed to transfer control to the shellcode stored in a buffer located in lieu of the FONT record (this exact method of executing a shellcode is also used by CVE-2017-0802 and further explained here):

The shellcode also is using an interesting way to find itself in memory. Unlike other malicious documents exploiting CVE-2017-11882, in our case, the sample does not rely on the

WinExecute API to divert execution. Rather, it searches the OLE stream itself to locate the entry point of the shellcode. To succeed, it needs the following three hardcoded values:

Address 0x0045BD3C : this address references an object that contains a pointer to another temporary structure (see Table 3 in Appendix for more details). This temporary structure points to the beginning Ole10Native stream as loaded in memory.

Address 0x004667B0 : this address points to the imported function GlobalLock .

: this address points to the imported function . 0x11F : the entry point in the shellcode from where it will start executing.

These three values are then used as follows:

First, the shellcode retrieves the handle of the memory object from 0x0045BD3C . Then the handle so retrieved is passed as parameter and used to invoke the GlobalLock API. The pointer returned references the first byte of the OLE stream in memory. The shellcode now knows where it is residing in the memory and starts executing from StartOfShellcode+0x11F .

The sample goes on by downloading a file from hxxp://b.reich[.]io/hnepyp.scr, saving it on disk as name.exe, and executing it. In this report, we omit the analysis of this specific binary, as it is yet another pony variant. Were the reader interested, VirusTotal has a full report here (sha1: 2bcd81a9f077ff3500de9a80b469d34a53d51f4a); all IOCs are also listed in the Appendix, Table 1.

Why Static Analysis is not Enough

While in some cases static analysis can detect if a specific vulnerability is exploited, obfuscated samples often present quite a challenge even for the most sophisticated analyzer. In our case, a simple pattern match is not even possible: the only bits of information we can use to write a detection rule is the CLSID and the 5 bytes that are constant in the MathType OLE object (the OLE object used by Equation Editor).

A hypothetical static checker would need to:

Extract the OLE object from the document Parse the OLE header and check if it is pointing to the Equation Editor CLSID Extract the Ole10Native stream Parse it and get the FONT record Check its actual length And finally, verify that the last four bytes of the buffer corresponds to an address

This is not a trivial task if done statically, and overall impossible if only pattern matching is available (as it is the case if we are using YARA rules, for example). On the other hand, in Figure 6 we can see the full behavioral analysis when analyzing the sample dynamically.

Conclusions

The sample subject of our analysis did not use any new CVEs, but relied on an unexpected new way to deliver the old and well-known CVE-2017-11882. This particular way of delivering the exploit effectively evaded all static analyzers relying on OLE’s static information. As the exploit author managed to remove (intentionally?) all non-binary strings from the exploit data, he considerably raised the bar for a static analyzer to detect this specific exploit.

Having said that, Microsoft has already issued advisory addressing this specific CVE, so previous mitigations are effective and still apply:

In conclusion, we verified whether MathType v7 (the successor of Equation Editor) was vulnerable to this specific parsing quirk when opening a Ole10Native stream, but we are glad to report that both mitigations DEP and ASLR are enabled, thereby protecting the binary from the aforementioned vulnerabilities.

Appendix

Indicator Of Compromise Description cf63479cefc4984309e97ed71e34a078cbf21d6a SHA1 malicious document 2bcd81a9f077ff3500de9a80b469d34a53d51f4a SHA1 loki payload hxxp://b.reich[.]io/hnepyp.scr URL loki payload

Table 1: IoCs discussed in the blogpost.

Offset Size (bytes) Description Value Comment 0 1 MTEF Version 0x2 Version 2 1 1 Generating Platform 0x8 Garbage 2 1 Generating Product 0x1 1 for Equation Editor 3 1 Product Version 0xB9 Garbage 4 1 Product Subversion 0xC9 Garbage

Table 2: Ole10Native MTEF header.

Offset Size (bytes) Description 0x0 4 Handle to the memory object storing the Ole10Native stream in memory 0x4 4 Size in memory 0x8 4 Size in memory 0x10 4 Index of the byte which will be read next from the stream 0x14 4 Unknown

Table 3: Temporary Structure Format.