Evaluation of Static Features for Mach-O Samples in a Classification Task

Mach-O is the executable file format of Mac OS X. As Mac OS X has gained market share, malware targeting it has also grown at an unprecedented rate over the past few years. In this presentation, we study the classification of Mac OS X malware using a set of features extracted from Mach-O metadata, together with features derived from that metadata, on samples collected from VirusTotal between late 2014 and early 2016. Prior research has attempted to classify PE executables using metadata extracted from PE files; like PE, the Mach-O format offers a variety of features suitable for classification. We collected all Mach-O samples submitted to VirusTotal during this period, discarded files not compiled for i386 or x86_64, and extracted metadata from the remaining samples. This metadata, such as segment and section structures and linked dynamic libraries, serves as the feature set for classifying Mac OS X samples. To evaluate the effectiveness of these features, we split the collection by time: samples collected between September 2014 and October 2015 form the training set, and samples collected after October 2015 form the testing set.
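To make the metadata extraction concrete, here is a minimal Python sketch that reads a thin (non-fat), little-endian Mach-O file, keeps it only if its CPU type is i386 or x86_64, and collects segment names, section names, and the install names of linked dylibs from its load commands. The constants follow Apple's `<mach-o/loader.h>` and `<mach/machine.h>`; the names `extract_features` and `MachOFeatures` are hypothetical and do not represent the authors' tooling. The sketch does no hardening against malformed files, and universal (fat) binaries would first need their per-architecture slices extracted.

```python
import struct
from dataclasses import dataclass, field
from typing import Optional

# Constants from <mach-o/loader.h> and <mach/machine.h>
MH_MAGIC = 0xFEEDFACE          # 32-bit Mach-O, read in native (little-endian) order
MH_MAGIC_64 = 0xFEEDFACF       # 64-bit Mach-O
CPU_TYPE_X86 = 7               # i386
CPU_TYPE_X86_64 = 0x01000007   # CPU_TYPE_X86 | CPU_ARCH_ABI64
LC_SEGMENT = 0x1
LC_SEGMENT_64 = 0x19
LC_LOAD_DYLIB = 0xC
LC_LOAD_WEAK_DYLIB = 0x80000018


@dataclass
class MachOFeatures:
    """Hypothetical container for the static features of one sample."""
    cputype: int
    segments: list = field(default_factory=list)  # segment names, e.g. "__TEXT"
    sections: list = field(default_factory=list)  # "segment,section" pairs
    dylibs: list = field(default_factory=list)    # install names of linked dylibs


def _cstr(raw: bytes) -> str:
    """Decode a NUL-terminated (or NUL-padded) name field."""
    return raw.split(b"\0", 1)[0].decode("utf-8", errors="replace")


def extract_features(data: bytes) -> Optional[MachOFeatures]:
    """Sketch: features of a thin little-endian i386/x86_64 Mach-O, else None."""
    if len(data) < 28:
        return None
    # magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
    magic, cputype, _sub, _ftype, ncmds, _size, _flags = struct.unpack_from("<IiIIIII", data, 0)
    if magic == MH_MAGIC_64:
        offset = 32  # mach_header_64 carries an extra reserved field
    elif magic == MH_MAGIC:
        offset = 28
    else:
        return None  # fat, big-endian, or not a Mach-O file
    if cputype not in (CPU_TYPE_X86, CPU_TYPE_X86_64):
        return None  # keep only i386 and x86_64 samples
    feats = MachOFeatures(cputype=cputype)
    for _ in range(ncmds):
        if offset + 8 > len(data):
            break  # truncated file
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            break  # malformed command; stop rather than loop forever
        if cmd in (LC_SEGMENT, LC_SEGMENT_64):
            feats.segments.append(_cstr(data[offset + 8:offset + 24]))
            if cmd == LC_SEGMENT_64:
                # nsects sits at byte 64; section_64 entries (80 bytes) follow the 72-byte command
                nsects = struct.unpack_from("<I", data, offset + 64)[0]
                sect_off, sect_size = offset + 72, 80
            else:
                # nsects sits at byte 48; section entries (68 bytes) follow the 56-byte command
                nsects = struct.unpack_from("<I", data, offset + 48)[0]
                sect_off, sect_size = offset + 56, 68
            for i in range(nsects):
                base = sect_off + i * sect_size
                sectname = _cstr(data[base:base + 16])
                segname = _cstr(data[base + 16:base + 32])
                feats.sections.append(f"{segname},{sectname}")
        elif cmd in (LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB):
            # dylib.name is an offset from the start of this load command
            name_off = struct.unpack_from("<I", data, offset + 8)[0]
            feats.dylibs.append(_cstr(data[offset + name_off:offset + cmdsize]))
        offset += cmdsize
    return feats
```

Representing each sample as lists of names keeps the feature space open-ended, which matters when malicious samples use unusual segment, section, or library names.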

This study summarizes statistical changes across Mac OS X malware families, as well as structural trends in benign and malicious samples, between late 2014 and early 2016. Using our collection of more than 600,000 samples, including over 4,000 malicious samples, we evaluate features through a composition analysis of different malware families in terms of both metadata and derived features. We apply a variety of classification algorithms to build predictive models from the training set, then analyze their results on the testing set and how those results differ from AV vendors' detections on VirusTotal. We also discuss the effectiveness of the selected features and our observations on the sample collection.
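The time-based evaluation (training on samples up to October 2015, testing on later ones) can be sketched in Python as follows. This assumes each sample is a tuple of first-seen date, a set of string features, and a binary label derived from AV detections; the tuple layout, the 31 October 2015 cutoff, and all function names are hypothetical. The vocabulary is built from the training period only, so features that first appear after the cutoff cannot influence the model, and `agreement` measures how often predictions match the AV-derived labels. The classifier itself is omitted; any of the compared algorithms could consume the binary vectors.

```python
from datetime import date

# Assumed cutoff: samples first seen on or before this date train, later ones test.
TRAIN_END = date(2015, 10, 31)


def temporal_split(samples):
    """samples: iterable of (first_seen: date, features: set[str], label: int)."""
    samples = list(samples)
    train = [s for s in samples if s[0] <= TRAIN_END]
    test = [s for s in samples if s[0] > TRAIN_END]
    return train, test


def build_vocabulary(train):
    """Index only features seen in training, so nothing leaks from the test period."""
    vocab = sorted({f for _, feats, _ in train for f in feats})
    return {f: i for i, f in enumerate(vocab)}


def vectorize(features, vocab):
    """Binary presence vector; features unseen during training are dropped."""
    vec = [0] * len(vocab)
    for f in features:
        idx = vocab.get(f)
        if idx is not None:
            vec[idx] = 1
    return vec


def agreement(predictions, av_labels):
    """Fraction of test samples where the model agrees with the AV-derived label."""
    pairs = list(zip(predictions, av_labels))
    return sum(p == a for p, a in pairs) / len(pairs) if pairs else 0.0
```

Samples where the model and the AV-derived label disagree are the natural starting point for examining how model results differ from vendor detections.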