Discovering the Secret Internet of Chinese People

I recently noticed that numbers are used a lot in China for email addresses and usernames. I also found out that a number of popular websites, such as Alibaba and Baidu, have official domain names that consist entirely of numbers. It seemed that people preferred numbers to Latin letters, and that even big websites wanted to accommodate this.

My girlfriend later confirmed that there are indeed lots of websites that use just numbers as their domains. She told me this is sometimes used as a way to hide websites, mostly gambling and porn, and in some cases even to sell access to them: you pay, and you get the secret domain name in exchange.

Wait a second! This sounds like security through obscurity, hiding things in plain sight. It’s a very creative way to restrict and sell access to websites, and it clearly works well enough for their purpose. But that’s not enough to stop us from finding them with a simple script. Numbers are very easy to generate, and restricting the search to all-numeric domains keeps the search space small, which increases our chances of coming across one.

Scanning random domains

My strategy for finding these websites is to check random domains until we find one. And the first step in anything involving randomness is to import random. Now we can start our script by writing a generator for random domains.

    def domains():
        while True:
            yield "{}.com".format(random.randint(1000, 1000000))

This will give us an endless stream of random domains. After this, we want to check whether each of these domains actually has a DNS record, which is basically checking whether the domain exists. To do that, we can use the socket library, mainly the socket.gethostbyname function.

    def ips(domain):
        try:
            yield socket.gethostbyname(domain)
        except socket.error:
            pass

All this code does is try to get the IP address for the domain and yield it if we succeed. If the way we’re writing these functions looks weird, don’t worry. They actually fit together quite nicely.

These two functions should be enough to do random scans to see if anything turns up. We can use them together like this.

    for domain in domains():
        for ip in ips(domain):
            print(domain, ip)

This will start scanning random domains and probably print lots of domains and their IP addresses. Here’s the full code of the scanner.

    import socket
    import random

    def domains():
        while True:
            yield "{}.com".format(random.randint(1000, 1000000))

    def ips(domain):
        try:
            yield socket.gethostbyname(domain)
        except socket.error:
            pass

    for domain in domains():
        for ip in ips(domain):
            print(domain, ip)
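If the nested for loops still look odd, here is a small sketch of why they work. A generator that yields nothing makes the loop over it run zero times, so ips acts as a filter on the stream of domains. To keep the sketch runnable offline, it swaps the real DNS lookup for a made-up table. FAKE_DNS, fake_ips and resolved are names invented here for illustration and are not part of the scanner.

```python
# Hypothetical lookup table standing in for real DNS, so the sketch runs offline.
# The addresses come from the 192.0.2.0/24 range reserved for documentation.
FAKE_DNS = {"1234.com": "192.0.2.10", "5678.com": "192.0.2.20"}


def fake_ips(domain):
    """Yield the address for a known domain, or nothing at all."""
    try:
        yield FAKE_DNS[domain]
    except KeyError:
        # Unknown domain: the generator finishes without yielding,
        # so a for loop over it runs zero times.
        pass


def resolved(candidates):
    """Collect (domain, ip) pairs for the candidates that resolve."""
    found = []
    for domain in candidates:
        for ip in fake_ips(domain):
            found.append((domain, ip))
    return found


print(resolved(["1234.com", "9999.com", "5678.com"]))
# → [('1234.com', '192.0.2.10'), ('5678.com', '192.0.2.20')]
```

The domain 9999.com is silently dropped because fake_ips yields nothing for it, which is the same thing that happens in the real scanner when socket.gethostbyname fails.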

Future work and improvements

We’re getting domains, but you will notice that some of them are just "Domains for sale!" pages. To help us find interesting domains faster, we can write another function that grabs the title of each website.

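    import re
    import requests

    def titles(domain):
        try:
            html = requests.get("http://{}".format(domain), timeout=3).text
            title = re.search("<title>(.*?)</title>", html)
            if title:
                yield title.group(1)
        except Exception:
            # Skip sites that fail to load. Catching Exception instead of
            # using a bare except means Ctrl-C can still stop the scan.
            pass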

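We can combine this with the two other functions to print valid domains with their title, and print just the domain and its IP address if there isn’t any title. The else branch of the inner for loop only runs when the loop finishes without hitting break, that is, when no title was found.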

    for domain in domains():
        for ip in ips(domain):
            for title in titles(domain):
                print(domain, ip, title)
                break
            else:
                print(domain, ip)
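One thing to watch out for is that the regular expression in titles only matches a plain lowercase <title> tag whose content sits on a single line. Here is a slightly more forgiving sketch that works on a string of HTML, so it can be tried without any network access. The extract_title helper and the TITLE_RE pattern are hypothetical names, and a regex is still only an approximation of real HTML parsing.

```python
import html
import re

# Tolerates attributes on the tag, mixed case, and titles spanning several lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page):
    """Return the cleaned page title, or None if there is no title tag."""
    match = TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; and collapse runs of whitespace.
    text = html.unescape(match.group(1))
    return " ".join(text.split())


print(extract_title("<TITLE>\n  Domains for sale!\n</TITLE>"))  # → Domains for sale!
print(extract_title("<html><body>no title here</body></html>"))  # → None
```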

There are some easy ways this script can be improved in the future. The scanner spends most of its time waiting for one DNS lookup at a time, so adding multithreading or an asynchronous DNS implementation might improve performance. Also, highlighting certain keywords and characters in the titles should help us spot interesting websites more efficiently.
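As a rough idea of what the multithreading improvement could look like, here is a sketch that uses ThreadPoolExecutor from the standard library to run many lookups at once. The resolve and scan functions, the resolver parameter and the worker count of 20 are assumptions made for this example, and none of it has been tuned or benchmarked.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(domain):
    """Return the IP address for a domain, or None if the lookup fails."""
    try:
        return socket.gethostbyname(domain)
    except OSError:
        return None


def scan(candidates, resolver=resolve, workers=20):
    """Resolve candidates in parallel and return the (domain, ip) hits."""
    candidates = list(candidates)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with candidates.
        addresses = list(pool.map(resolver, candidates))
    return [(d, ip) for d, ip in zip(candidates, addresses) if ip is not None]
```

Because domains() never ends, scan needs a bounded batch, for example scan(itertools.islice(domains(), 200)). The resolver parameter is only there so that a fake lookup can be passed in when trying the function without a network connection.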