Introduction

Over the last few months I have been a member of the board of Artemis, a research project that I started with my co-author Richard Amores to better understand the Deep Web and profile the actors that populate it.



We have worked mainly in two directions: the first was a large-scale analysis of a meaningful number of Tor URLs and hidden services and of their evolution over time; in the second we explored the possibility of tracking users within the Tor network. Unfortunately we haven’t found a sponsor to support our research, so we decided to suspend the study, mainly for financial reasons.



Although there is still a lot of work to do, much has already been done, and I would like to share some interesting findings with you.



Regarding the possibility of detecting Tor users, I suggest reading an interesting study that is quite similar to what we have tried to do this month. It is an IEEE paper titled “Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization”, recently presented by Alex Biryukov, Ivan Pustogarov and Ralf-Philipp Weinmann, researchers at the University of Luxembourg.



I don’t want to go deep into the technical findings of the study, but its authors verified, as my co-author and I have, that it is possible to track people in the Tor network by manipulating Onion Routers (ORs) to trick a target into using our OR as an introduction/rendezvous point.



In this post I want to share some interesting data about the use and the content of the Tor network, following the steps we took during phase one of our research.



URLs Crawling

We created an architecture to keep us hidden during our investigation, and we developed our own crawlers and search engine to analyze various subsets of Tor sites over time. For each site we collected a large quantity of information related to its content and to the connection parameters observed during the analysis.



The principal attributes collected for each URL are:



URL



HTML content



Header info



Title



Description



Author



Generator



Keywords



HTTP Response Header



For each site visited, the crawler also analyzed the sub-URLs, FTP links and email addresses it contained. Artemis was also able to detect I2P addresses and transfer information related to the specific connection.
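The extraction step described above can be sketched in a few lines of Python. This is only an illustration and not the Artemis code: the function name extract_record, the field names and the email pattern are assumptions, and only the standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical pattern and field list; the original crawler's rules are not published.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
META_NAMES = {"description", "author", "generator", "keywords"}


class PageParser(HTMLParser):
    """Collects the title, selected meta tags and all href targets of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in META_NAMES:
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_record(url, html):
    """Build one record per crawled page from its URL and raw HTML."""
    parser = PageParser()
    parser.feed(html)
    # Resolve relative links against the page URL.
    links = [urljoin(url, href) for href in parser.links]
    return {
        "url": url,
        "title": parser.title.strip(),
        "meta": parser.meta,
        "sub_urls": [l for l in links if urlparse(l).scheme in ("http", "https")],
        "ftp_links": [l for l in links if urlparse(l).scheme == "ftp"],
        "i2p_links": [l for l in links if (urlparse(l).hostname or "").endswith(".i2p")],
        "emails": sorted(set(EMAIL_RE.findall(html))),
    }
```

A record built this way holds most of the per-URL attributes listed above; the HTTP response headers and transfer information would have to come from the component that actually fetches the page through Tor, which is not shown here.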



Acquiring all this information made it possible to create a first embryonic dashboard that can be used for various types of analysis. The principal functions implemented for the dashboard are the following:



Crawled URLs – List of crawled URLs



Crawled URLs Meta Tags – Collection of the principal meta tags belonging to the list of crawled URLs



Sub URLs – Collection of sub-URLs retrieved within each crawled URL



Email crawled from URLs – Collection of email addresses contained in each crawled URL



Onion URL Errors – List of connection errors encountered during crawling activities



Crawl HTML – HTML retrieved by the crawler when visiting each URL



Transfer Info – Transfer Info related to each crawler session



Key Words in Tor Web site – Collection of the principal keywords belonging to the list of crawled URLs



Search for FTP links – Collection of FTP links for each crawled URL



Search for i2p – Collection of i2p links for each crawled URL



Emails finder – This function searches for all web sites that include a “search word” provided by the user, together with the email addresses they contain.



Figure 1 – Artemis Project Dashboard



The analysis focused on English-language sites. In further analysis we planned to explore other languages as well, such as Chinese, Arabic and Russian.



Statistics

We started from a sample of around 25,000 Tor addresses randomly generated by our crawlers, which then started their inspections. Around 10% of these sites were not reachable during the crawling pass. The information related to these addresses was verified three times during Q1, on a monthly basis.



Of the overall URLs analyzed, 10 percent were inactive in January; the percentage rose to 13 percent in February and was nearly 8 percent in March. February was the month in which the largest number of Tor addresses became inactive: 680 websites were unreachable by the crawler during the analysis. Meanwhile, we noted that around 130 URLs that had been inactive became active once again. The majority of these web sites contain general-purpose content; they are probably managed by private users who are exploring the Deep Web and the ways they can publish their content on it. None of the sites that became active again were related to e-commerce.
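The monthly comparison above boils down to set operations on the addresses that answered in each crawl. The following Python sketch shows the idea on made-up data; the function names and the toy addresses are hypothetical and are not taken from the Artemis archive.

```python
def inactive_share(sample, active):
    """Percentage of the sample that was unreachable in one crawl."""
    return 100.0 * len(sample - active) / len(sample)


def monthly_churn(previous_active, current_active):
    """Compare the reachable addresses of two consecutive crawls.

    Returns (went_inactive, reactivated): addresses that answered last
    month but not this month, and addresses that answer again this month.
    """
    went_inactive = previous_active - current_active
    reactivated = current_active - previous_active
    return went_inactive, reactivated


# Toy sample of ten placeholder addresses (not real hidden services).
sample = {f"site{i}.example.onion" for i in range(10)}
january = sample - {"site0.example.onion"}
february = (january - {"site1.example.onion", "site2.example.onion"}) | {"site0.example.onion"}

went_inactive, reactivated = monthly_churn(january, february)
print(inactive_share(sample, january), inactive_share(sample, february))
print(sorted(went_inactive), sorted(reactivated))
```

Keeping one set of reachable addresses per crawl is enough to answer both questions raised in the text: how many sites disappeared in a given month, and how many came back.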



The e-commerce sites and hacking forums can be considered stable: none of the sites inspected disappeared during the monitoring. The most unstable web sites were those related to political issues; in many cases individual bloggers published a page on the Tor network but did not continue their activity.



Figure 2 – Q1 Tor Address crawling activities



As an exercise, we tried to assign the crawled URLs to the following categories:



Figure 3 – Tor URL categories



Figure 4 – Tor URL categories



The majority of URLs analyzed belong to e-commerce web sites and cyber-criminal activities. In the Deep Web it is relatively easy to find sites that sell every kind of product. The principal items sold in the underground are drugs and weapons; a multitude of websites offer any kind of drug and accept various payment methods, from bitcoins to PayPal, given that Liberty Reserve is unusable after being shut down by US authorities.



The following are some random websites extracted from our archive:



URL                                       Title
http://tw2d3sglwcxryv67.onion/            High shop
http://5lywtauniqa5at6c.onion/help.html   WELCOME TO POTCO
http://yewmggktmoycxzsg.onion             BuyRC (Research purpose)

The Tor network is the natural habitat for hacking forums and cyber-criminal activities. Thanks to the anonymity offered by the network, an increasing number of criminal gangs are starting to offer malicious code, hacking services and compromised accounts for popular web services.



Many websites offer information related to compromised PayPal accounts or credit card numbers.



The top four activities offered on hacking websites are:



Malicious code sales



Hacking services



DDoS services



Exploit writing services and sales



Another thing we noted is that many pages published on the Tor network were very dated, and various servers were not properly configured, which made it possible to analyze their principal configuration files.



Why keep a hidden service alive without using it? The principal reason, in my opinion, is that many of these services are installed in research environments such as universities, associations or schools; the fact that they aren’t updated suggests that their creators, for some reason, no longer have access to the resource or simply decided not to shut down the service. As we said, the Tor URLs analyzed use Latin script; the most popular language is English, which accounts for virtually all of the web sites, while a very small portion of sites are written in Spanish and German.



The results of our analysis are quite different from those of a recent study based on the ahmia.fi database, which produced the following findings:



4857 .onion hidden services have been found.



1028 are online just now.



38 servers have shared possible child porn, inadvertently or on purpose.



There are at least two known marketplaces really selling drugs.



The principal differences concern the number of online URLs we found: the websites we analyzed appear more stable, and the e-commerce websites we found offer mainly drugs and weapons, none of which appear legal. Our research also detected a larger number of websites offering pedophile content, although the proportion is in line with the one highlighted by ahmia.fi.



Although the share of URLs related to cybercrime is remarkable, the conclusion is that the Tor network also contains mostly legal content; in particular, the volume of documents related to political issues is continuously increasing.



It must also be considered that the number of hidden services increased during our investigation. This shows that an increasing number of users are moving some of their activities to the Deep Web, probably for security reasons or to preserve their anonymity. I am thinking, for example, of a shared document repository or of the deployment of control applications for critical systems within the Tor network.



Another interesting piece of data we analyzed is the occurrence of certain words of interest that we chose; the calculation was repeated each month. All the words examined showed a constant value over time, except for the word “Bitcoin”. Already in the first quarter of 2013 the number of websites containing the word “Bitcoin” had increased in a meaningful way, probably due to the growing popularity of the virtual currency. We repeated the measurement this month, after the shutdown of Liberty Reserve, and we observed an increase of 20 percent, caused by the growing number of Tor websites that accept bitcoins as a method of payment.



The following table shows the results of our latest count:



Word              N° Occurrences
mail              4865
blog              3180
wiki              2707
bitcoin           2633
anonymous         2462
sex               1550
username          1541
market            1524
gun               1258
software          1090
pedo              1082
hacking           1063
i2p               986
drugs             900
child             895
IRC               854
Surveillance      610
weapon            581
politic           566
books             413
fraud             386
exploit           310
PayPal            302
Censorship        276
anarchism         254
CP                223
porno             218
baby              218
piracy            199
credit card       170
profiles          149
banking           120
Liberty Reserve   118
DDoS              115
malware           89
phishing          82
Mule              72
credential        67
Jihad             36
keylogger         12
Prism             7
spear phishing    5
Muhammad          2
Activation Key    0
al-Qaida          0
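A count like the one in the table, and the month-over-month comparison mentioned for “Bitcoin”, can be sketched as follows. This is a minimal illustration under stated assumptions: it counts the number of pages that contain each word at least once, using a case-insensitive substring match (so “politic” also matches “politics”); the exact counting rule used for the table is not documented, and the function names and sample pages are hypothetical.

```python
import re


def count_sites_containing(pages, words):
    """pages maps a URL to its page text; returns word -> number of pages containing it."""
    counts = {}
    for word in words:
        pattern = re.compile(re.escape(word), re.IGNORECASE)
        counts[word] = sum(1 for text in pages.values() if pattern.search(text))
    return counts


def percent_change(old, new):
    """Relative change between two counts, in percent (None if there is no baseline)."""
    if old == 0:
        return None
    return 100.0 * (new - old) / old


# Placeholder pages, not taken from the crawl.
pages = {
    "http://a.example.onion/": "We accept Bitcoin and PayPal",
    "http://b.example.onion/": "Political blog about censorship",
    "http://c.example.onion/": "bitcoin market",
}
print(count_sites_containing(pages, ["bitcoin", "politic", "gun"]))
```

Running the same count on each monthly snapshot and feeding two consecutive values to percent_change gives the kind of trend described for “Bitcoin”.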

Notes on the research carried out

The crawling of Tor URLs is a precious source for a series of evaluations: first of all the composition of the sample analyzed, as we have already discussed, and the level of “stability” of each Tor site; then the possibility of using a tool to discover FTP links, I2P addresses and email addresses. This type of information could be used as a starting point for further investigation, and I’ll try to explain how to use it.

The i2p links

According to the official definition, “I2P is a scalable, self-organizing, resilient packet switched anonymous network layer, upon which any number of different anonymity or security conscious applications can operate”.



In a few cases, organizations and criminal gangs may diversify the channels they use for communication, and they can also anonymize the execution of specific applications that offer various types of services.



Inside an I2P network, the “hidden” components are in fact the application running on the node and, of course, the path followed by the information to reach its destination; an application could, for example, provide users with information extracted from a database.



Figure 5 – Artemis Project – Search for i2p



FTP links and Email Finder

Another interesting aspect of the investigation is trying to figure out the links between the World Wide Web and the Deep Web. This is quite easy when a Tor user wants to share content anonymously: the user anonymously posts the FTP address of his or her own repository. This practice is very common among cybercriminals, but it is also used for sharing legal content; for example, a group of activists may spread links to a collection of documents in this way to plead their cause.



Figure 6 – Artemis project – FTP links search



Another element of interest is the analysis of the email addresses present in various Tor web sites. An investigator could, for example, search for an email address present on a website specialized in the sale of drugs or weapons and find it on the ordinary web.



Most of the email addresses belong to anonymous mail services such as Tormail.org, a Tor hidden service that allows users to send and receive email anonymously, to addresses inside and outside Tor. But as we say in Italy, “he who seeks finds!”, and in various cases searching for those email addresses makes it possible to locate the owner of the account with a simple search on Google … a very easy way to find Tor users, what do you think?



To facilitate the consultation of the email archives, we implemented a function that, given a word as input, retrieves all the email addresses contained in the Tor websites that include it. Let’s imagine we are interested in catching pedophiles: we search for the word “pedo” in the system, and the search provides a list of websites containing the word “pedo” together with their email addresses. In this case, too, it could prove useful to look up those email addresses with a common Google search and, with a bit of luck, we might unmask a criminal.
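The behaviour of this function can be approximated with a short Python sketch. It is not the Artemis implementation: the function name find_emails, the email pattern and the sample pages are assumptions made for illustration, and the stored page text is assumed to be available as a simple URL-to-text mapping.

```python
import re

# Hypothetical email pattern; real-world addresses can be more varied than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(pages, search_word):
    """Return {url: [emails]} for every page whose text contains search_word.

    The match is a case-insensitive substring match; pages that contain the
    word but no email address are still listed, with an empty list.
    """
    needle = search_word.lower()
    results = {}
    for url, text in pages.items():
        if needle in text.lower():
            results[url] = sorted({m.lower() for m in EMAIL_RE.findall(text)})
    return results


# Placeholder pages, not taken from the crawl.
pages = {
    "http://shop.example.onion/": "Guns for sale, write to seller@example.org",
    "http://wiki.example.onion/": "Books and wiki, contact admin@example.net",
    "http://forum.example.onion/": "GUN forum, no contact listed",
}
print(find_emails(pages, "gun"))
```

The addresses returned for a given word are the ones an investigator would then look up on the ordinary web, as described above.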



Figure 7 – Email Finder – searching for word “gun”



IRC

Internet Relay Chat (IRC) is another common channel used for instant communication (chat) on the Internet; it allows both direct communication between two individuals and communication within groups in discussion rooms called channels. IRC is often used as a kind of customer care service by cyber criminals, who usually use it to discuss the details of a new offer or product.



Monitoring IRC channels is a good starting point for analyzing a cybercrime offer or following hacking groups. We discovered that these two categories are the ones that most often use Tor websites to advertise their channels, avoiding publishing them on the WWW.



The influence of current events on the Deep Web

It is interesting to evaluate the impact that everyday events have on the Tor network. Let’s think, for example, about the shutdown of the Liberty Reserve private currency exchange: it caused a noticeable increase in the number of cybercriminal sites and hacking portals that also started to accept other methods of payment. This information is easy to track with the instruments we created, and a similar analysis could be conducted to evaluate the sentiment of a country in a specific period.



Let’s start by observing how the number of directly connecting users evolved in both the US and the UK over the last year. Although neither country applies any kind of control to the internet, a growing number of individuals fear surveillance and are turning to the Tor network, as shown in the following graphs:



Figure 8 – Tor Metrics – Number of directly connecting users US



Figure 9 – Tor Metrics – Number of directly connecting users UK



Both graphs show a concerning phenomenon: the number of directly connecting users from the UK increased by nearly 50% in the last year, while for the US the figure is a little lower, around 41%.



Fear of being tracked by the authorities, the desire to remain anonymous in “specific contexts”, and the explosion of phenomena such as whistleblowing are the main reasons for this marked increase.



The situation is totally different for the Syrian Arab Republic, where we observe an intense use of the Tor network by Government opponents to avoid the censorship applied by the regime, as reported in the following graph. The Government in Damascus has imposed harsh control over the internet, tracking groups of activists and torturing them.



Figure 10 – Tor Metrics – Number of directly connecting users Syria



During our analysis we also noted a couple of “anomalous” patterns. The first is related to Turkey, which has recently been shaken by a series of popular protests against Prime Minister Erdoğan. In the last year the number of Tor users there increased by 32%, reaching a peak in June 2013 during the protests.



But the strangest data relate to Pakistan, where the use of the popular anonymizing network is literally exploding, as shown in the chart in Figure 13 below:



Figure 11 – Tor Metrics – Number of directly connecting users Turkey



Figure 12 – Turkey popular disorders



Figure 13 – Tor Metrics – Number of directly connecting users Pakistan



Conclusions

The Deep Web, and in particular the Tor network, is an immense source of data that, if properly managed, could support investigative activities and intelligence analysis. This information could be used to profile cyber criminals, prevent crimes and follow the steady evolution of the cybercriminal ecosystem.



This post has the sole aim of providing a useful indication of the potential of a source of analysis that is accessible to everyone. Our investigative activities went very well and will be an integral part of a new book on the Deep Web that describes many aspects not covered in any text so far.



References

http://torrorists.wordpress.com/2013/06/19/tor-network-provides-almost-only-legal-content/

https://metrics.torproject.org

http://resources.infosecinstitute.com/anonymizing-networks-tor-vs-i2p/

https://ahmia.fi/search

http://www.amazon.com/The-Deep-Dark-Web-hidden/dp/1480177598