The Science and Art of File Reputation

Editor’s Note: The following blog post is a summary of an RFUN 2017 customer presentation featuring Igor Lasic, vice president of technology at ReversingLabs.

Key Takeaways

Two malware attacks in 2017, CCleaner and NotPetya, show just how vulnerable even the biggest companies can be to data theft and corruption.

No defense is impregnable: the best cybersecurity often involves the quickest response to mitigate damage as fast as possible.

Quick responses mean intelligent, automated data analysis, which takes two things: large data sets and smart algorithms.

The sprawling, patchwork landscape of the internet today in many ways resembles the Old West: it’s a treacherous territory full of villains and thieves as much as it is a frontier rich with opportunity and treasure. Attacks can come from aggressive competitors, cynical robbers looking to make a quick fortune, political activists, hacktivists, or enemy states; sometimes, they come from agents of pure chaos.

The best defense in these times is to know your enemy. And in an industry where zero-day vulnerabilities are always a possibility, quick, accurate analysis and response are essential for surviving a malware attack.

In a presentation given at Recorded Future’s sixth annual threat intelligence conference in Washington, D.C., Igor Lasic, vice president of technology, who leads technology initiatives at ReversingLabs, covered two major malware attacks that occurred in 2017 and showed how ReversingLabs responded to them. The company produces malware analysis software that, among other things, uses a proprietary hash function to identify and group similar files. ReversingLabs has partnered with Recorded Future to expand Recorded Future’s technical sourcing.

Background of the Attacks

During his presentation, Lasic looked at the CCleaner and NotPetya malware attacks as case studies demonstrating how ReversingLabs’ software identifies malware.

CCleaner is a popular free program meant to optimize PCs in various ways, such as deleting unnecessary files and erasing browser history and cookies. In September 2017, it was discovered that hackers had written malicious code into the official download of CCleaner offered on the servers run by Avast, the company that owns Piriform, the developer of CCleaner. The infected version of the program collected a broad range of information from the computers it was installed on, and it eventually became apparent that the attack specifically targeted a group of high-profile technology companies, including Intel, Google, Microsoft, Samsung, and Sony. The malware was active for about a month before CCleaner was updated, infecting nearly 2.3 million computers.

Petya was another malware attack that caused a lot of havoc. First seen in early 2016, it made infected computers reboot and appear to run CHKDSK, a system tool that checks the integrity of a disk’s file system, while in reality the malware encrypted the hard drive and overwrote the master boot record. When the computer was restarted, the malware code would demand that the user pay a ransom in Bitcoin to unlock their files.

Things got stranger in June 2017, when a new program masquerading as Petya spread worldwide. It quickly became apparent that this new program, dubbed NotPetya for its superficial resemblance to Petya, did not have mechanisms in place to collect ransom from users in exchange for decryption keys, and the decryption keys offered mostly did not work. It was not ransomware, but malware meant to move through a network as quickly as possible and destroy data. Further complicating the issue, some antivirus software misidentified it as various other forms of malware or exploits.

ReversingLabs on the Hunt

Attacks like these two are insidious. According to the experts at ReversingLabs and elsewhere, the sophistication of the CCleaner and NotPetya attacks indicates that they were most likely state sponsored. Hackers rely on code taken from other malware; they use digital certificates and exploits that give the appearance of legitimacy. This leaves behind a messy and inconsistent data set, making future detection and prevention difficult.

Lasic showed that ReversingLabs goes on the hunt in a few different ways. First, they process millions of files a day, giving them an enormous knowledge base to draw on. Next, every file gets both static and dynamic analysis. According to Lasic, combining these two methods allows them to rapidly reduce the data set to something easier to work with. Finally, ReversingLabs manually looks at file similarity between the samples, verifying that its algorithms are actually finding the right samples.

What are static and dynamic analysis, and how is ReversingLabs doing them? In short, static analysis means the code is examined without actually running the program. An automated tool will look over the code to find any mistakes, backdoors, or malicious functions, in part by comparing it to known code. Because static analysis is essentially an analysis of text, the process is extremely fast, and because it occurs without the program being executed, it is safe. Dynamic analysis, on the other hand, occurs when the program is tested while it is being run, evaluating how it actually behaves and interacts with other software, which makes it more comprehensive than static analysis in some ways.
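
As a rough illustration of the static side, the sketch below pulls printable strings out of a file's raw bytes and compares them against a list of known indicators, without ever executing the file. The indicator strings and the sample bytes are invented for this example; they are not ReversingLabs rules, and this is not how TitaniumCore is implemented.

```python
import re

# Hypothetical indicators, invented for this example only.
KNOWN_BAD_STRINGS = {b"delete_shadow_copies", b"overwrite_mbr"}


def extract_strings(data: bytes, min_len: int = 6) -> list[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def static_scan(data: bytes) -> set[bytes]:
    """Report which known indicators appear in the extracted strings.

    The bytes are only read and searched; nothing is executed.
    """
    found = set()
    for text in extract_strings(data):
        for indicator in KNOWN_BAD_STRINGS:
            if indicator in text:
                found.add(indicator)
    return found


# Fabricated sample: non-printable padding around one indicator string.
sample = b"\x00\x01MZ\x90\x00" + b"call overwrite_mbr now" + b"\x00\xff"
hits = static_scan(sample)
```

A dynamic analysis of the same file would instead run it in an isolated environment and record what it does, which is slower but can reveal behavior that never shows up as readable strings.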

The software developed by ReversingLabs is particularly effective at static analysis. Its TitaniumCore software recursively unpacks internal objects for a wide variety of operating systems and supports over 3,500 file types, and its scalability allows it to analyze and reconstruct millions of files daily through a single server while still detecting and analyzing threats in milliseconds.

TitaniumCore produces detailed reports in various formats, including indicators on format, format validation strings, sections, certificate chains, code similarities, and malware family tags, that allow for further analysis.

TitaniumCore also uses a proprietary hash function called RHA: ReversingLabs Hashing Algorithm. Lasic explains that ReversingLabs has developed its own function analysis algorithm, which allows them to decompose a file and identify its traits and then make hashes out of it, creating groupings of similar files and allowing them to more easily identify polymorphic threats like Petya and its relatives.
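
The internals of RHA are not described in the presentation, so the following is only a generic sketch of the underlying idea: hash a normalized set of file traits rather than the raw bytes, so that files whose bytes differ but whose traits match land in the same group. The trait names, file names, and grouping scheme are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def exact_hash(data: bytes) -> str:
    """Conventional hash of the raw bytes: any byte change alters it."""
    return hashlib.sha256(data).hexdigest()


def trait_hash(traits: set[str]) -> str:
    """Hash a sorted, lowercased trait set.

    Byte-level changes that leave the traits intact do not change the result.
    """
    canonical = "\n".join(sorted(t.lower() for t in traits))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical samples: (raw bytes, traits extracted by some earlier analysis).
samples = {
    "a.bin": (b"payload-v1", {"imports:CryptEncrypt", "section:.text", "writes:MBR"}),
    "b.bin": (b"payload-v1-repacked", {"writes:MBR", "imports:CryptEncrypt", "section:.text"}),
    "c.bin": (b"calculator", {"imports:CreateWindow", "section:.text"}),
}

# Group file names by trait hash.
groups = defaultdict(list)
for name, (data, traits) in samples.items():
    groups[trait_hash(traits)].append(name)

clusters = sorted(sorted(names) for names in groups.values())
```

Here `a.bin` and `b.bin` have different exact hashes but fall into the same cluster. This toy version only groups files with identical trait sets; a production similarity hash would also need to tolerate partial differences.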

It’s worth quickly addressing what a hash function is to fully understand why ReversingLabs chose to develop its own format. Hash functions take an input, which can be any set of data, including whole files, and output a value of fixed size, which is referred to as a hash and is generally displayed in hexadecimal. A cryptographic hash function is meant to be irreversible: the same input always produces the same hash, and two different inputs are extremely unlikely to share one, so the hash can be used to identify the data, but the input cannot practically be derived from the hash. These features (being much smaller than the data they are associated with, being effectively unique to each input, and being difficult to reverse) make hashes useful for digital signatures, authentication, indexing, and detecting unique or duplicated files.
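
The properties of a hash function can be seen with Python's standard `hashlib` module. This small sketch uses SHA-256 purely as a familiar example, not as the algorithm ReversingLabs uses; the inputs are arbitrary.

```python
import hashlib

small = b"hello"
large = b"hello" * 100_000  # roughly 500 KB of input

# The digest length is fixed regardless of input size.
digest_small = hashlib.sha256(small).hexdigest()
digest_large = hashlib.sha256(large).hexdigest()

# The same input always yields the same digest.
digest_repeat = hashlib.sha256(b"hello").hexdigest()

# Changing a single character yields a different digest.
digest_changed = hashlib.sha256(b"hellp").hexdigest()

print(len(digest_small), len(digest_large))
print(digest_small == digest_repeat, digest_small == digest_changed)
```

Both digests are 64 hexadecimal characters long, which is what makes a hash a compact identifier for a file of any size.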

Hash algorithms do not stay secure forever, however. For example, SHA-1 (Secure Hash Algorithm 1), which was originally developed by the National Security Agency and was once widely used to sign authentication certificates, is no longer considered secure enough for ongoing use because researchers have shown that two different inputs can be crafted to produce the same hash. Browser makers such as Google, Microsoft, Apple, and Mozilla stopped accepting SHA-1 certificates in their browsers in 2017.

ReversingLabs’ hash algorithm allows them to index and compare data more securely and efficiently than some other standards.

In addition, ReversingLabs’ TitaniumCloud Threat Intelligence service currently contains over 4 billion goodware samples and 1 billion malware samples, and is constantly being updated with another 6 million samples daily, says Lasic.

By combining the TitaniumCore static analysis, RHA, and TitaniumCloud file reputation service, ReversingLabs users are able to more effectively hunt for threats.

The above methods allow ReversingLabs to process larger amounts of data securely while iteratively improving its analysis, leading to higher-quality end data, says Lasic. This data is then added to Recorded Future Intelligence Cards™ in real time for faster analysis.

Case Studies: CCleaner and NotPetya

Next, Lasic discussed how ReversingLabs responded to the threats of CCleaner and NotPetya.

First, to search out the infected versions of CCleaner, Lasic explained that they focused on the authentication certificate that the program was signed with.

In short, a certificate is a digital credential used to sign a file, vouching for who published it and showing that it has not been altered since it was signed. Generally, when presented with a signed file, an operating system such as Microsoft Windows or a program like Word can check the following criteria: Has the certificate been issued by a trusted certificate authority, following the proper formatting? Is the certificate expired?

The certificate is like a driver’s license proving you are who you say you are. The first check the program makes — ensuring the certificate has been issued by a trusted authority — is like a police officer checking whether your license is fake. The next check is simple — whether the license has expired.
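
The two checks in the driver's license analogy can be sketched as follows. This is a simplified model under stated assumptions: the `Certificate` class, the trust store, and all names and dates are hypothetical, and real validation also verifies the cryptographic signature chain and revocation status, which are omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical trust store for the example.
TRUSTED_ISSUERS = {"Example Trusted CA"}


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def check_certificate(cert: Certificate, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Check 1: was the certificate issued by an authority we trust?
    if cert.issuer not in TRUSTED_ISSUERS:
        problems.append("untrusted issuer")
    # Check 2: is the current time inside the validity period?
    if not (cert.not_before <= now <= cert.not_after):
        problems.append("outside validity period")
    return problems


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


now = utc(2017, 9, 1)
valid = Certificate("Example Software Ltd", "Example Trusted CA", utc(2016, 1, 1), utc(2018, 1, 1))
expired = Certificate("Example Software Ltd", "Example Trusted CA", utc(2015, 1, 1), utc(2017, 1, 1))
unknown = Certificate("Example Software Ltd", "Unknown CA", utc(2016, 1, 1), utc(2018, 1, 1))
```

Note that a file signed with a genuine, unexpired certificate passes both checks even if the signed contents are malicious; the checks establish who signed the file, not whether it is safe.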

The problem with the CCleaner attack is that the malware came as part of the legitimate version of the program being offered through official servers, meaning the authentication certificate it presented was genuine. The makers of CCleaner eventually revoked that certificate, but the damage was already done.

ReversingLabs focused on this certificate to track the infected version of CCleaner (and similar samples) and narrow down its data set. According to Lasic, the company then refined its results by comparing file sizes and matching the timeline of its data set with the attack. As a result, ReversingLabs was one of the first sources of accurate CCleaner first-stage file hashes.
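
The narrowing process described here (certificate first, then file size and timeline) amounts to successive filters over sample metadata. The sketch below assumes a hypothetical `Sample` record; the thumbprints, sizes, hashes, and dates are placeholders made up for illustration, not values from the actual investigation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    sha256: str           # placeholder identifiers, not real hashes
    cert_thumbprint: str  # certificate the file was signed with
    size_bytes: int
    first_seen: date


def narrow(samples, thumbprint, size_range, window):
    """Keep samples signed with the given certificate, within the size
    range, and first seen inside the date window (all bounds inclusive)."""
    lo, hi = size_range
    start, end = window
    return [
        s for s in samples
        if s.cert_thumbprint == thumbprint
        and lo <= s.size_bytes <= hi
        and start <= s.first_seen <= end
    ]


# Made-up data set for the example.
samples = [
    Sample("hash-1", "CERT-A", 7_500_000, date(2017, 8, 20)),
    Sample("hash-2", "CERT-A", 7_600_000, date(2017, 3, 2)),   # too early
    Sample("hash-3", "CERT-B", 7_500_000, date(2017, 8, 21)),  # other cert
    Sample("hash-4", "CERT-A", 1_200_000, date(2017, 9, 1)),   # wrong size
]

candidates = narrow(
    samples,
    thumbprint="CERT-A",
    size_range=(7_000_000, 8_000_000),
    window=(date(2017, 8, 1), date(2017, 9, 30)),
)
candidate_hashes = [s.sha256 for s in candidates]
```

Each filter is cheap on its own; combining them is what shrinks a very large collection to a set small enough to review by hand.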

Through its analysis, ReversingLabs was subsequently able to identify the second stage CCleaner malware and find examples of those files in its sample collection. Moreover, these files did not appear in public malware repositories such as VirusTotal.

To track NotPetya, ReversingLabs started with hundreds of thousands of samples and was able to cull that set down to just 18 examples, with no false positives. This is significant because many samples were initially misclassified due to code that NotPetya shares with Petya and other commodity malware.

With this “pure” set of NotPetya samples, ReversingLabs was able to create solutions to consistently detect it. The specific NotPetya markers discovered led to YARA rules on crypto routines, main file encryption and encryption loop, and most notably, the NotPetya shutdown call. ReversingLabs also observed that the malware used expired Microsoft certificates to mimic legitimate application behavior. Finally, using ReversingLabs’ proprietary RHA algorithm, they were able to identify other functionally similar samples, which further analysis confirmed to be functional NotPetya variants.
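
A YARA rule is, at its core, a named set of patterns plus a condition on how many must be present. The Python sketch below mimics that idea only; it is not YARA syntax, and the byte patterns are placeholders rather than the actual NotPetya markers, which are not given in the presentation summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRule:
    name: str
    patterns: tuple[bytes, ...]
    min_matches: int  # how many distinct patterns must be present

    def matches(self, data: bytes) -> bool:
        """True if at least min_matches of the patterns occur in data."""
        hits = sum(1 for p in self.patterns if p in data)
        return hits >= self.min_matches


# Placeholder patterns standing in for markers such as an encryption
# loop or a shutdown call; none of these are real signatures.
rule = ByteRule(
    name="example_wiper_markers",
    patterns=(b"\xde\xad\xbe\xef", b"ENCRYPT_LOOP", b"SHUTDOWN_CALL"),
    min_matches=2,
)

suspicious = b"\x00\x00ENCRYPT_LOOP\x00\x00SHUTDOWN_CALL\x00"
benign = b"\x00\x00SHUTDOWN_CALL\x00\x00"

flag_suspicious = rule.matches(suspicious)
flag_benign = rule.matches(benign)
```

Requiring several markers at once, rather than any single one, is one way to keep code shared with other malware families from triggering a false positive.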

A Swift Response Requires Balance

Lasic noted that it can often take days, if not weeks, for antivirus vendors to accurately identify malware and prevent its spread. No network is totally secure, so the most effective defense against malware often comes down to being able to respond as quickly as possible. According to Lasic, the speed and effectiveness of a response depend on having a large enough data set, but the larger the data set, the more difficult it becomes to analyze efficiently. Thus, a fast response also depends on good algorithms that strike a balance between security, swiftness, and comprehensiveness.

To get started with malware detection and response, it helps to have the most recent data. The Recorded Future Cyber Daily email provides daily updates on the top results for trending technical indicators such as malware, active threat actors, the most targeted industries, and more.