Hey everyone, today we're doing something different. This is going to be a joint blog post from Ethan Dodge (@__eth0) and I in which we talk about phishing defense coverage by the Alexa Top 100 domains, which will also expose the best attack vectors for phishing against these domains.

We're going to be using a combination of the new DNS reconnaissance tool DNStwist as well as some custom Python scripts to gather and analyze all the information we find, which we'll include in this post if you want to follow along or do your own research.

Overview

Here's a rundown of what we'll be doing to get all the information we need. We'll start by pulling down the Alexa Top 100 Domains, then we'll create a script to run them through a modified version of DNStwist to give us the permutated domain as well as the type of permutation (bitsquatting, Insertion, Omission, Replacement, etc.). We'll take this list and then do a host lookup of the domain to get the IP addresses hosting this domain, lastly we'll do a Whois lookup and Reverse DNS lookup on the IP we get and compare the Registrar/Pointer Record information against the domain to see if they match up.

After we have the comparison data, we'll be able to calculate what types of permutations are the most covered against attacks (meaning the original domain registered the permutated domain to possibly prevent phishing attempts), as well as the types of permutations that are least covered.

Grabbing the Data

First thing we need to do is get the Alexa Top One Million sites and just narrow that down to our scope of 100.

wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Then we'll just cut that down to 100 domains.

cat top-1m.csv | awk -F ',' {'print $2'} | head -n 100 > alexatop100.txt

This will give us something that looks like this:

Getting Permutations of Domains

Starting out, we'll need to use DNStwist to get a list of permutations. We went ahead and modified the original script to not print out extra information we don't need, we only wanted the type of permutation and the resulting domain of that permutation.

Here's a screenshot of the resulting script being run using google.com as an example.

If you want the modified version of dnstwist we used, you can grab it here.

Now that we have our list of domains, we'll use this bash one-liner to loop through each of the domains, and then run that through our modified dnstwist and output the results into it's own file in a new directory:

while read domain; do python dnstwist.py $domain > ~/Desktop/alexatop100/$domain; done < ~/Desktop/alexatop100.txt

Running the above takes about 5 seconds and gives us a directory looking like this:

Host Lookup

Next thing on our plate is to do a host lookup on the resulting permutated domains. We want to end up with a text file containing the permutated domain, permutation type, and IP Address if valid and a string of "NXDOMAIN" if it's not a valid IP Address.

Here's the bash one-liner we used to look through all the permutations for each domain and run the host lookup on each, and then add it to a new file for future analysis.

for file in *; do python hostlookup.py $file; done

After letting the above command run for about 30 minutes or so, we're left with a directory that looks like this:

You'll notice the directory now has another 100 files appended with \_hostlookup . Inside each file, we see each permutated domain with the IP Address it resolved to.

Reverse DNS Lookup

Initially we were going to run a Reverse DNS lookup against the IP's to see what Pointer records they had, however, we thought doing a Whois lookup would be higher integrity. In any event, here's the steps we did to run a Reverse DNS lookup on all the domains and permutations.

Next we wrote a Python script to grab the pointer record that was returned from each permutation. It included the domain, IP, permutation type, the pointer record and a True or False statement depending on if the original hostname was seen in the hostname we grabbed.

Running the above script in another bash one-liner like this, for file in *; do python rdnslookup.py $file; done , we get another 100 files in our directory that are prepended with _rdns .

Inside each file we can see the resulting Pointer record and the True or False string.

Moving on to the Whois lookup...

Whois Lookup

The last part we need to code before we can analyze the data is the Whois lookup on all the IP Addresses we grabbed from the host lookup step.

With this part, we just want to grab the description field of the Whois info, which should tell us the company that owns that IP Address. Between the Whois and Reverse DNS lookups, we should be able to determine if the owner of the IP Address matches the permutated domain.

Now that we have our final data, which was two different sources (Whois and Reverse DNS), we can now run some statistics on this data to answer some of the questions we asked earlier in the post. First things first though, we'll need to get Splunk set up to ingest the data.

Splunk Setup

In order for Splunk to recognize the fields, we'll configure the props.conf file in /opt/splunk/etc/system/local/ with the following settings:

[phishing] REPORT-phishing = REPORT-phishing [whois] REPORT-whois = REPORT-whois

Next, we edit the transforms.conf file in /opt/splunk/etc/system/local/ like so:

[REPORT-phishing] DELIMS = " " FIELDS = "domain","ip","perm_type","hostname","is_match" [REPORT-whois] DELIMS = " " FIELDS = "domain","ip","perm_type","owner","is_match"

That's all that needs to be done in order to parse the events. Which will look something like this now:

Before We Begin

Before we get into the specific results between Whois and Reverse DNS, it'll help if we identify the different types of permutations, provided by Lenny Zeltser on his blog:

Bitsquatting, which anticipates a small portion of systems encountering hardware errors, resulting in the mutation of the resolved domain name by 1 bit. (e.g., xeltser.com).

Homoglyph, which replaces a letter in the domain name with letters that look similar (e.g., ze1tser.com).

Repetition, which repeats one of the letters in the domain name (e.g., zeltsser.com).

Transposition, which swaps two letters within the domain name (e.g., zelster.com).

Replacement, which replaces one of the letters in the domain name, perhaps with a letter in proximity of the original letter on the keyboard (e.g, zektser.com).

Ommission, which removes one of the letters from the domain name (e.g., zelser.com).

Insertion, which inserts a letter into the domain name (e.g., zerltser.com).

Whois Analysis

Ok, let's jump into the data! We'll start with the analysis of the whois data.

Below is a list of the top permutation types registered

sourcetype=whois | top perm_type

Permutation Type Count Replacement 2002 Insertion 1849 Bitsquatting 1347 Omission 454 Repetition 400 Transposition 347 Homoglyph 335 Concourse 313 Subdomain 193 Hyphenation 146

Alright, out of all the registered domains, how many permutated domains are potentially registered by the original domain owner?

sourcetype=whois is_match=true | stats count

Out of all the domains registered, according to our unvetted data, there are only 460 domains registered by the original domain owner.

Now, let's see the permutation type protected against the most.

Permutation Type Count Insertion 146 Replacement 130 Bitsquatting 53 Repetition 41 Omission 35 Transposition 23 Homoglyph 19 Hyphenation 11 Subdomain 2

Out of all the Insertion permutated domains, let's identify the domains that are protected the most:

sourcetype=whois is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)"| top original_domain

Domain Count amazon.com 29 microsoft.com 28 booking.com 26 amazon.co.uk 25 yahoo.com 15 amazon.in 9 netflix.com 7 wikipedia.org 2 yandex.ru 1 msn.com 1

Now, let's just do the most protected domains regardless of permutation:

sourcetype=whois is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)"| top original_domain

Domain Count amazon.com 82 amazon.co.uk 63 microsoft.com 62 booking.com 55 amazon.in 54 yahoo.com 44 netflix.com 29 bing.com 17 apple.com 10 wikipedia.org 6

Looks like Amazon is the most concerned about someone ripping off their domain :)

Reverse DNS Analysis

Alright, let's move onto Reverse DNS Analysis.

Let's get the most common permutation types registered.

sourcetype=phishing | top perm_type

Permutation Type Count Replacement 1163 Insertion 1025 Bitsquatting 796 Omission 236 Repetition 227 Homoglyph 203 Transposition 194 Subdomain 98 Hyphenation 95

Alright, out of all the registered domains, how many permutated domains are registered by the original domain owner?

sourcetype=phishing is_match=true | stats count

Out of all the domains registered, according to our unvetted data, there are only 381 domains registered by the original domain owner.

Now, let's see the permutation type protected against the most.

Permutation Type Count Insertion 114 Replacement 108 Bitsquatting 47 Repetition 32 Omission 27 Transposition 25 Homoglyph 17 Hyphenation 10 Subdomain 1

Out of all the Insertion permutated domains, let's identify the domains that are protected the most:

sourcetype=phishing is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count amazon.com 29 booking.com 26 amazon.co.uk 25 yahoo.com 17 amazon.in 9 netflix.com 4 yandex.ru 1 wikipedia.org 1 msn.com 1 blogspot.com 1

Now, let's just do the most protected domains regardless of permutation:

sourcetype=phishing is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count amazon.com 82 amazon.co.uk 63 yahoo.com 56 booking.com 55 amazon.in 54 netflix.com 19 bing.com 12 google.de 8 google.com 7 wikipedia.org 5

Just like the Whois data, Amazon takes the cake for most permutated domains registered. It makes sense though, since that would be a great site to phish for credentials.

DDoS Protection Sites

We knew that we would have some incorrect data, since not every domain registered would point exactly to the owner, i.e. wikipedia.com is owned by Wikimedia, thus we wouldn't count that as true.

One of the big things we noticed is the amount of domains resolving to prolexic.com, which is a DDoS protection site, which is what companies would use in order to prevent DDoS attempts on their sites (...obviously). We doubted that phishing domains and/or malicious actors would enlist the help of a DDoS protection service, since it probably costs a lot of money based on traffic. Based on this fact, we are going to count prolexic.com hits as true and see what kind of results we get then.

Let's rerun some of the original searches now...

First, how many permutated domains are protected?

sourcetype=phishing | eval ddos=if(searchmatch("hostname=*prolexic*"),"True","False") | search ddos="True" OR is_match="True" | stats count

We see that there are now 808 domains protected, instead of the original 381, big change!

Now we'll see what permutation types are protected against the most:

Permutation Type Count Replacement 243 Insertion 211 Bitsquatting 139 Repetition 48 Transposition 45 Omission 44 Homoglyph 40 Hyphenation 19 Subdomain 19

Lastly, what domains are protected the most:

Domain Count amazon.com 91 amazon.co.uk 66 booking.com 59 yahoo.com 58 amazon.in 56 pinterest.com 41 netflix.com 26 google.es 21 paypal.com 19 imdb.com 15

Final Thoughts

So, some interesting things came out of this research. First, the most common types of permutated domains that companies seem to register are replacement or insertion permutation techniques (netflox.com or netfliix.com). We also discovered that a majority of companies are using DDoS protection sites to register permutated domains (this isn't really a surprise, just interesting to note.). Lastly, we see that amazon.com, booking.com and yahoo.com are the most protected against potential phishing attempts.

The last note isn't that surprising again, but, it's interesting to see what companies took the time and steps to register these other domains. Amazon and Yahoo! are definitely sites I would expect to see, more so Amazon than Yahoo, however, since Yahoo has been around for a while, it makes sense.

If you're interested in checking out the data we used, you can find it here. If you have any questions or comments about this info, please feel free to reach out to us on Twitter, @brian_warehime or @__eth0.