Last week, on August 10, a security researcher who goes by the handle "zerosum0x0" posted an interesting image to Twitter: the code behind a debug build of an executable.

The code was 'Hello World' – the training example used to teach new coders. When the executable was submitted to VirusTotal, several firms flagged it as a problem.

Salted Hash wanted to learn why training code was deemed malicious, so we asked the vendors to explain it. Here's what we learned.

The buzz around machine learning and artificial intelligence has grown over the last year or so in the security world. The vendors leveraging it are doing what they can to cash in and improve performance so that they can be the undisputed champions of the market.

The experiment by "zerosum0x0" gained attention because the vendors flagging the code sell advanced defense systems and promote their use of machine learning.

The code in question can be seen in images on Twitter. Again, this is training code, something all novice coders use. Why then is such basic code being flagged as suspicious, harmful, or outright malicious by notable vendors such as Cylance, Sophos, McAfee, SentinelOne, CrowdStrike, and Endgame?

The example by "zerosum0x0" was just a single test with seven detections. Others ran their own tests, including a user under the handle "_hugsy_", who removed the 'printf' function and was still flagged. Only this time, "_hugsy_" had eleven more vendors report that "Hello World" was either unsafe, malicious, or a Trojan. Others did the same test, with similar results.

Salted Hash asked "zerosum0x0" if they attempted to test with a non-debug sample, to which they gave an affirmative, but noted that it was hit or miss. "Theoretically machine learning can extrapolate more info from a debug build."

Again. Why? Why are these advanced offerings, from well-known and established security vendors, flagging such basic, harmless code as malicious?

As it turns out, for some that's exactly what's supposed to happen. For others, it's because VirusTotal doesn't use the whole product. The default, it seems, is to flag as suspicious.

Before we get to the vendor explanations, it's worth noting that VirusTotal has always maintained that it's not the right tool to perform comparative analysis on security products. That isn't the point of this article, but we were curious about the results posted.

Salted Hash reached out to the vendors that flagged "Hello World" in some way, including Cylance, Sophos, McAfee, SentinelOne, CrowdStrike, Cyren and their consumer product F-Prot, Endgame, F-Secure, and Bitdefender.

We asked for comments to explain why "Hello World" was flagged, and asked for details on what they're doing to keep false positive rates low for customers. All but three vendors responded by deadline.

McAfee, F-Secure, and CrowdStrike did not respond to the initial requests for comment. When this fact was mentioned in public, F-Secure and McAfee reached out to us directly, but didn't provide comment by the time this story went live. Update: After this article was published, CrowdStrike, F-Secure, and McAfee responded to questions.

Below are the comments from the vendors who responded. Some of their answers have been edited for space.

Ryan Permeh, Cylance:

"The Cylance engine is not an antivirus engine. Unlike AV, it doesn’t have a bias toward letting everything run. The technology doesn't assume a file is good until it’s evaluated. Our approach is to measure and decide on each and every file individually, and if it doesn't fit into our model of good, it leans towards bad.

"Without a bunch of data to base a decision on, and without any real patterns of goodness to identify it as such, the engine leaned heavily on the structural bits that are odd and drew a line towards bad in this case.

"When we train models, we train on hundreds of millions of good and hundreds of millions of bad files (samples). We look at several million potential data points (features) in each file...

"...In general, a piece of code can become "bad" by doing things that lean towards bad. But it can also lean towards bad by not doing things that lean towards good. So in the most basic example provided (hello world in debug build):

"The sample was small. It didn't show any bad, but it didn't show any good either; One function programs are almost always malware; Debug builds are statistically weird; Using mingw rather than visual studio is statistically weird. The output binary is 'odd.'"

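To make Permeh's point concrete, here is a minimal sketch of how a score can "lean towards bad" purely from structural oddities and a lack of good signals. This is not Cylance's model. The `Sample` fields, the weights, and the threshold are all invented for illustration; a real engine learns its weights from millions of samples and features.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical static features, loosely named after the traits in the quote.
    size_bytes: int
    function_count: int
    is_debug_build: bool
    compiler: str
    is_signed: bool


def lean_score(s: Sample) -> float:
    """Return a toy score: positive leans bad, negative leans good."""
    score = 0.0
    # Structural oddities push the score towards bad (made-up weights).
    if s.size_bytes < 100_000:
        score += 1.0
    if s.function_count <= 1:
        score += 2.0
    if s.is_debug_build:
        score += 0.5
    if s.compiler != "msvc":
        score += 0.5
    # Signs of "goodness" pull the score back (made-up weights).
    if s.is_signed:
        score -= 2.0
    if s.function_count > 50:
        score -= 1.5
    return score


def verdict(s: Sample, threshold: float = 2.0) -> str:
    """Flag anything at or above the (arbitrary) threshold."""
    return "suspicious" if lean_score(s) >= threshold else "clean"


# A tiny, unsigned, one-function debug build made with mingw.
hello_world = Sample(50_000, 1, True, "mingw", False)
# A large, signed application with many functions.
big_app = Sample(5_000_000, 800, False, "msvc", True)

print(verdict(hello_world), verdict(big_app))
```

In this toy setup the "Hello World" sample does nothing harmful, yet it collects every oddity penalty and earns no credit for good traits, so it crosses the threshold.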


Hyrum Anderson, Endgame:

"Before Twitter caught ablaze with these “hello world” samples, our own internal research indicated that our and other models were susceptible to these toy samples. Let’s explain why.

"Endgame’s machine learning malware detection uses static features to determine before a customer executes a file whether it is likely malicious or benign. The machine learning model is an imperfect summarization of tens of millions of malicious and benign software on which the model was trained.

"As an imperfect model, it can obviously be wrong, but still extremely useful in detecting never before seen malware, far more useful than approaches which rely on signatures for already known malware families.

"For the case of our model and other machine learning models based on static features, the model can be wrong in this case because, in the training dataset, the model has seen:

"Lots of real malware samples that are small unsigned binaries; lots of real malware samples where the entry point (.text) section is small, like droppers unpacking stubs; lots of real malware samples that attempt to hide their imports from static analysis by some method, so that their import table looks very small.

"On the contrary, there are very few “useful” benign files that are small, certainly too few to contradict the above experience.

"It’s important to note that machine learning is actually quite good for prevention and detection malware, both novel samples and the more well known. Endgame was one of the only few to get NotPetya in VirusTotal, for example. That said, all machine learning models have blind spots (false negatives) and they can mistakenly call things bad (false positives). In fact, we’ve shown in our published research that for some machine learning models, these vulnerabilities can be quite convenient to exploit...

"...At Endgame, we employ a strategy of layered protections that align with a large number of commonly seen attacker actions. Our MalwareScore engine (released standalone in VirusTotal) represents only a single slice of that layered protection paradigm. The layers work in concert to alert our customers of potential threats (reducing FNs), and working together to build a complete story of a potential threat (reducing FPs).

"Fortunately, the samples highlighted on Twitter are interesting corner cases, but are extremely esoteric for our customer base. Nevertheless, we continually are doing more research to improve our detection ratio and reduce our false positive rate. This involves data gathering to increase our model’s understanding of the universe of benign and malicious software as well as a huge amount of experimentation effort to maximize our model’s performance. We put a great amount of attention on addressing known false positives seen by our customers. As a result of these efforts, we regularly release models to our customers and to VirusTotal. And, we continue to work with 3rd parties to validate our model’s performance on real files."

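For readers unfamiliar with the terms Anderson uses, the sketch below shows how false positive and false negative rates are commonly computed from a set of verdicts. The labels and predictions are made-up toy values, not results from any vendor or from VirusTotal, and the function name `error_rates` is our own.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    labels[i] is True if file i is actually malicious;
    predictions[i] is True if the detector flagged file i.
    """
    pairs = list(zip(labels, predictions))
    false_pos = sum(1 for actual, flagged in pairs if not actual and flagged)
    false_neg = sum(1 for actual, flagged in pairs if actual and not flagged)
    benign = sum(1 for actual in labels if not actual)
    malicious = sum(1 for actual in labels if actual)
    # Guard against division by zero when a class is absent.
    fpr = false_pos / benign if benign else 0.0
    fnr = false_neg / malicious if malicious else 0.0
    return fpr, fnr


# Toy data: three malicious files and five benign ones (invented values).
labels = [True, True, True, False, False, False, False, False]
predictions = [True, True, False, True, False, False, False, False]

fpr, fnr = error_rates(labels, predictions)
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```

A flagged "Hello World" counts as a false positive: a benign file that the detector called bad. A missed piece of real malware counts as a false negative.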


Dr. Sven Krasser, CrowdStrike: