A researcher at OpenDNS Security Labs has developed a new way to automatically detect and block sites used to distribute malware, almost instantaneously and without having to scan them. The approach, initially developed by researcher Jeremiah O'Connor, uses natural language processing and other analytics to detect malicious domains before they can be used in an attack, by spotting host names that are designed as camouflage. Called NLPRank, it spots DNS requests for sites that have names similar to those of legitimate sites but that resolve to IP addresses outside the expected address blocks, or that come with other related data hinting at sketchiness.

The practice of using look-alike domain names to fool victims into visiting websites or approving downloads is a well-worn approach in computer crime. But recent crafted attacks via "phishing" links in e-mails and social media have gone past the familiar "typo-squatting" approach. They use domain names that appear close to those of trusted sites and are registered just before an attack, so that they fly under the radar of reputation-scoring security tools and are harder to blacklist. Fake domain names such as update-java.net and adobe-update.net, for example, were used in the recently discovered "Carbanak" attacks on banks, which allowed criminals to gain access to financial institutions' networks starting in January 2013 and steal over $1 billion over the next two years.

Many security services can screen out malicious sites based on techniques such as reputation analysis, which checks a centralized database to see if a site name has been associated with any malware attacks. But because attackers can use scripted systems to rapidly register new domains that look relatively legitimate to the average computer user, they can often bypass reputation checks, especially when using their specially crafted domain names in highly targeted attacks.

O'Connor's approach, which is currently being tested by OpenDNS using live DNS query traffic, gets around the reputation problem by simply analyzing the domain name itself for sketchiness. It works in a way similar to natural language processing of any stream of text content. Using patterns spotted in malicious DNS traffic, OpenDNS security researchers are training the NLPRank system to identify domain names that look similar to legitimate sites but have attributes that flag them as being suspicious.

"Essentially what we are defining is a 'malicious language' within the lexical nature of DNS traffic," O'Connor wrote in a blog post being published this morning. The "language" consists of domain names that are combinations of technology company-related text (such as "java," "gmail," "facebook," or "adobe," for example with a collection of "certain dictionary words," O'Connor explained ("install," "update," "security," or "payment," for instance).

The system then performs "sentiment analysis" on frequently queried domain names in the tens of billions of DNS requests that flow through OpenDNS daily, looking for patterns like these and applying a set of ranking scores to domain names that match the pattern. "If it's a Facebook-related domain and not associated with Facebook's IP address space, that would be a negative tick," said Andrew Hay, director of security research at OpenDNS, in an interview with Ars. "Or if it was registered a day ago and administered by someone with a Russian disposable e-mail address, those would be negatives." The system can also do HTML analysis of the websites associated with the domain names to check how closely they match the legitimate pages they imitate. "We can look at fraud websites and compare them to actual legitimate pages, see how much they differ," Hay explained.
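
The "negative ticks" Hay describes can be sketched as a simple additive score. The Python below is an illustration under stated assumptions, not NLPRank itself: the address block is a reserved documentation range standing in for a brand's real IP space, the disposable mail domain is made up, the seven-day threshold for "recently registered" is arbitrary, and every name (`DomainObservation`, `suspicion_score`, `page_similarity`) is hypothetical. The page comparison uses a generic text-similarity ratio from Python's standard library as a stand-in for whatever HTML analysis OpenDNS performs.

```python
import ipaddress
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

# Placeholder data for illustration only: a reserved documentation range
# (not Facebook's real address space) and a made-up disposable mail domain.
BRAND_NETWORKS = {"facebook": [ipaddress.ip_network("203.0.113.0/24")]}
DISPOSABLE_MAIL_DOMAINS = {"disposable-mail.example"}
RECENT_DAYS = 7  # assumed threshold for a "just registered" domain


@dataclass
class DomainObservation:
    name: str              # queried domain name
    brand: str             # brand term the name appears to imitate
    resolved_ip: str       # address the name resolves to
    registered_on: date    # registration date from WHOIS-style data
    registrant_email: str  # contact address from WHOIS-style data


def suspicion_score(obs: DomainObservation, today: date) -> int:
    """Count one 'negative tick' per suspicious attribute."""
    score = 0
    ip = ipaddress.ip_address(obs.resolved_ip)
    networks = BRAND_NETWORKS.get(obs.brand, [])
    # Tick 1: the address is outside the brand's expected blocks.
    if not any(ip in net for net in networks):
        score += 1
    # Tick 2: the domain was registered very recently.
    if (today - obs.registered_on).days <= RECENT_DAYS:
        score += 1
    # Tick 3: the registrant uses a disposable e-mail provider.
    mail_domain = obs.registrant_email.rsplit("@", 1)[-1].lower()
    if mail_domain in DISPOSABLE_MAIL_DOMAINS:
        score += 1
    return score


def page_similarity(suspect_html: str, legit_html: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two pages' markup."""
    return SequenceMatcher(None, suspect_html, legit_html).ratio()
```

In this sketch, a Facebook-themed name that resolves to 198.51.100.7, was registered the previous day, and lists a contact at `disposable-mail.example` collects all three ticks, while a long-registered name inside the expected block collects none. How NLPRank actually weights and combines its signals is not described in the article.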

Hay said that OpenDNS is currently fine-tuning the system to prevent false positives, but that so far NLPRank has held up well in testing. "We have used it to detect malicious phishing campaigns," he said. "And we've been able to use it to validate data in other security firms' reports, giving us additional reinforcement that it's working."