Know Thy Enemy

This was a white-box competition, meaning I had full access to all model parameters and source code. Therefore, the first thing to do was to crack open the models and see what was going on under the hood.

MalConv

The first model is a neural network trained on the raw bytes of Windows executables. MalConv is implemented in PyTorch, and if you’re already familiar with neural networks, the code is relatively simple and straightforward.

Files are passed to MalConv as a sequence of integers representing the bytes of the file (0–255). An embedding layer inside MalConv maps each byte to a vector of numbers. The sequence of vectors can then be processed by additional neural network layers. The model outputs two numbers representing the probabilities of an input being benign and malicious.
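To make that flow concrete, here is a minimal PyTorch sketch in the same spirit. The layer sizes, the gated convolution, and the kernel/stride choices below are assumptions for illustration, not the competition's exact code:

```python
import torch
import torch.nn as nn

class MalConvSketch(nn.Module):
    """Rough sketch of a MalConv-style byte classifier (details assumed)."""

    def __init__(self, vocab_size=257, embed_dim=8, channels=128,
                 kernel_size=512, stride=512):
        super().__init__()
        # Map each byte (0-255, plus one padding token) to a small vector
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Gated convolution over the sequence of byte embeddings
        self.conv = nn.Conv1d(embed_dim, channels, kernel_size, stride=stride)
        self.gate = nn.Conv1d(embed_dim, channels, kernel_size, stride=stride)
        self.fc = nn.Linear(channels, channels)
        # Two outputs: one score for benign, one for malicious
        self.out = nn.Linear(channels, 2)

    def forward(self, x):                    # x: (batch, file_len) byte ids
        z = self.embed(x).transpose(1, 2)    # -> (batch, embed_dim, file_len)
        h = self.conv(z) * torch.sigmoid(self.gate(z))  # gated activation
        h = torch.max(h, dim=2).values       # global max pool over the file
        h = torch.relu(self.fc(h))
        return torch.softmax(self.out(h), dim=1)  # class probabilities

model = MalConvSketch()
file_bytes = torch.randint(0, 256, (1, 4096))  # a fake 4 KB "file"
probs = model(file_bytes)                      # shape (1, 2), sums to 1
```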

There’s already a decent amount of literature out there on evading MalConv. The easiest attack is to append a bunch of benign-looking content to the end of the executable. This is a particularly nice trick because the appended data (also called the overlay) doesn’t get loaded into memory when the malware is executed. Therefore, we can put whatever we want in the overlay without changing the functionality of the file. MalConv looks for both benign and malicious byte patterns in order to make a decision, and the purpose of the overlay attack is to overwhelm it with patterns associated with benign files.
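As a sketch, the attack is nothing more than file concatenation. The file names and the source of the benign bytes below are placeholders:

```python
def append_overlay(malware_path, out_path, benign_bytes):
    """Append benign-looking bytes past the end of the PE image.

    The appended region (the overlay) is never mapped into memory at
    run time, so the program's behavior is unchanged.
    """
    with open(malware_path, "rb") as f:
        data = f.read()
    with open(out_path, "wb") as f:
        f.write(data + benign_bytes)  # original file, then the overlay

# e.g. harvest content from a known-good binary to use as the overlay
# (paths are placeholders):
# benign = open("some_benign_program.exe", "rb").read()
# append_overlay("malware.exe", "evasive.exe", benign)
```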

Non-Negative MalConv

The second model is actually the same as the first but with different weights assigned to the layers. As the name suggests, Non-Negative MalConv was constrained during training to have non-negative weight matrices. The point of doing this is to prevent trivial attacks like those created against MalConv. When done properly, the non-negative weights make binary classifiers monotonic, meaning that the addition of new content can only increase the malicious score. This would make evading the model very difficult, because most evasion attacks require adding content to the file. Fortunately for me, this implementation of Non-Negative MalConv has a subtle but critical flaw.

The non-negative defense only works for binary classifiers with a single output score representing the maliciousness of the sample. This version breaks the output into two scores: one for malicious and one for benign. Then, a softmax function converts the scores to probabilities for each class. This construction makes training with non-negative weights meaningless. Additional content can still push the benign score arbitrarily high. As the benign score goes higher, the softmax function will push the malicious score lower, even though the same amount of malicious content is present. So all of the same attacks against MalConv will work here as well.
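A few numbers make the flaw concrete. With two output scores, the malicious probability is a softmax over the pair, so inflating the benign score alone drives the malicious probability toward zero even though the malicious score never moves:

```python
import math

def softmax2(benign, malicious):
    """Probability the two-output softmax assigns to the malicious class."""
    eb, em = math.exp(benign), math.exp(malicious)
    return em / (eb + em)

# Same malicious evidence throughout; only the benign score grows:
print(softmax2(0.0, 3.0))  # ~0.95  -> detected
print(softmax2(3.0, 3.0))  # 0.50   -> on the fence
print(softmax2(8.0, 3.0))  # ~0.007 -> evades
```

With a single monotonic output there is no benign score to inflate, which is exactly why the proper defense uses one.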

If you’re interested in a bit more mathematical detail, check out Appendix A of this paper. The authors prove that a softmax network with non-negative weights is essentially equivalent to an unconstrained network in terms of what it can learn. This is why a proper defense needs a single output: we don’t want the network to learn to look for benign content.

Excerpt from paper: http://ci.louisville.edu/zurada/publications/chor-zur-tnnls.pdf

Ember

Ember is actually a data set maintained by Endgame. Along with the data set comes a trained benchmark model (which I’ll also call Ember). Ember is a LightGBM model (an ensemble of gradient-boosted decision trees) trained on several features parsed from Windows PE files.

The source code for Ember’s feature parsing is located here. Seeing how all of the features are parsed was quite useful for crafting attacks. The extracted features include:

Byte Histogram

Byte Entropy

Section Information (names, sizes, entropy, properties)

Import Table Libraries and Entries

Exported Functions

General File Info (sizes and counts of various things)

Header Info (machine, architecture, linker, versions, etc.)

Strings (various statistics about strings in the file)

Data Directories (names, sizes, virtual addresses)

The numerical features are used as-is, while the hashing trick converts the rest of the features (like section names) to numerical vectors. At first glance, Ember looks more difficult to fool. Many of the features relate to the structure of the file or to characteristics we won’t be able to change. Remember, we have to evade detection while maintaining the original functionality!
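For intuition, here’s a toy version of the hashing trick. The hash function and the output dimension are illustrative choices, not Ember’s exact implementation:

```python
import hashlib

def hash_section_names(names, dim=50):
    """Map a variable-length list of strings (e.g. PE section names)
    to a fixed-length numerical vector via the hashing trick."""
    vec = [0.0] * dim
    for name in names:
        # Hash each string and use it to pick a bucket to increment
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

# Fixed-length output no matter how many sections the file has:
v = hash_section_names([".text", ".data", ".rsrc"])
```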

Ember’s vulnerability is that some of the features can be arbitrarily controlled by an attacker. The lightgbm model uses these features in multiple locations throughout the ensemble of trees. We can take advantage of the complexity of the model by manipulating the features in order to drive the decision down paths which result in evasion.
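A toy ensemble (not Ember’s actual trees; the feature names and leaf values are made up) shows the idea: when a feature the attacker fully controls appears in several trees, changing that one feature can flip the final verdict.

```python
# Each "tree" returns a leaf value; the ensemble sums them, and a
# positive total means "malicious" (roughly how boosted trees score).

def tree1(f):  # splits on an attacker-controlled feature
    return 1.2 if f["overlay_entropy"] > 6.0 else -0.8

def tree2(f):  # splits on a structural feature we cannot change
    return 0.9 if f["num_imports"] < 5 else -0.1

def tree3(f):  # the controlled feature shows up again here
    return 0.7 if f["overlay_entropy"] > 7.0 else -0.5

def ensemble_score(f):
    return tree1(f) + tree2(f) + tree3(f)

packed = {"overlay_entropy": 7.5, "num_imports": 3}
padded = dict(packed, overlay_entropy=1.0)  # e.g. pad with low-entropy bytes
print(ensemble_score(packed) > 0)  # True: detected
print(ensemble_score(padded) > 0)  # False: evades
```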