How Webhose Uses Image Analysis and Recognition to Identify Illicit Content on the Dark Web

Collecting data from the Dark Web is far more complex than collecting it from the open web. Because users want to remain anonymous and criminals often use their own jargon, it is much harder to navigate these networks and track the plans of cyber criminals. For instance, finding stolen credit card information on the darknets involves searching for the term “fullz.” In addition, content migrates among different networks and messaging platforms to avoid detection, which makes it harder to track.

Webhose now makes the Dark Web easier to navigate with its image recognition capabilities. Instead of relying solely on text, customers can also extract data from images using AI and machine learning. Image recognition can also fill in information that is missing from text on the Dark Web, for example by identifying new types of drugs or the latest trending illicit products shown in an image. In some instances, it can even help identify the location of a criminal.

Extracting More Dark Web Data with Image Recognition

With image recognition, Webhose’s customers can now search the darknets extensively for images of narcotics, breached data (such as passports or credit card details), sensitive documents, and other evidence of cyber criminals’ plans, saving time and resources that are better spent on analysis.

With its image recognition capability, Webhose can now identify thousands of objects, including pets, furniture, people, events, food, plants and flowers, electronics, transportation and vehicles, and more. Working together with Amazon Rekognition, it also detects scenes within an image, such as a sunset or a beach, as well as explicit and suggestive content, so you can filter images according to specific requirements.
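As a rough illustration of label-based filtering, here is a minimal Python sketch. The article does not describe how Webhose calls Amazon Rekognition, so this is an assumption: it uses the public boto3 `detect_labels` call and then keeps only labels above a confidence threshold. The helper names, the threshold of 80, and the sample response are hypothetical and written by hand for the example.

```python
from typing import Dict, List


def detect_labels(image_bytes: bytes, min_confidence: float = 80.0) -> Dict:
    """Ask Amazon Rekognition for labels (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filtering helper works without AWS

    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=20,
        MinConfidence=min_confidence,
    )


def label_names(response: Dict, min_confidence: float = 80.0) -> List[str]:
    """Return the names of labels whose confidence meets the threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]


# Hand-written sample in the shape of a DetectLabels response (not real output).
sample_response = {
    "Labels": [
        {"Name": "Beach", "Confidence": 97.2},
        {"Name": "Sunset", "Confidence": 91.5},
        {"Name": "Dog", "Confidence": 42.0},
    ]
}

names = label_names(sample_response)
print(names)
```

Filtering on confidence after the call keeps the threshold adjustable per use case without requesting the labels again.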

Webhose recognizes these images through a 3-step approach:

Validation – the service does its best to skip recognition of logos and icons

Tagging – images are hashed into IDs, so from that point on Webhose can identify similar images across different sources

Analysis – this includes metadata extraction, object identification and Optical Character Recognition (OCR)
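The tagging step can be pictured with a simple perceptual hash. The article does not say which hashing method Webhose uses, so the sketch below swaps in a basic "average hash" purely as an illustration: each pixel brighter than the image mean becomes a 1 bit, and two images are compared by counting the bits that differ. It assumes the image has already been reduced to a small grayscale grid (normally done with an imaging library); the function names and pixel values are hypothetical.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Build an integer ID with one bit per pixel: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Tiny 2x2 grayscale grids used only for illustration.
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same picture, slightly brighter
inverted = [[245, 55], [35, 225]]   # light and dark areas swapped

same = hamming_distance(average_hash(original), average_hash(brighter))
different = hamming_distance(average_hash(original), average_hash(inverted))
print(same, different)
```

A small distance suggests the same image reposted with minor changes, which is what allows an ID to follow an image from one source to another.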

Let’s look at several different examples of how it works:

Identifying Images of Weapons

The Dark Web is infamous for the trading of illegal weapons around the world.

According to a study conducted by RAND Corporation, most firearm vendors on the darknet (59%) originated from the United States, but the combined revenue of European sales is 5 times higher than that of the US.

In this example, we wanted to search the darknets for an image of a weapon. After entering a query for the image label “gun” combined with a reference to bitcoin (BTC), we received the following results in JSON format, which include text extractions of the title, file type, and domain, along with the identified language of the image’s domain name.

Query: image_label:gun AND BTC
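A query like this has to be URL-encoded before it is sent over HTTP. The sketch below shows one way to do that in Python. The endpoint, the parameter names (`token`, `format`, `q`), and the token value are placeholders assumed for illustration, not Webhose's documented API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show how the query is encoded.
BASE_URL = "https://api.example.com/darkweb/search"


def build_search_url(query: str, token: str) -> str:
    """Return a request URL with the query string safely encoded."""
    params = {"token": token, "format": "json", "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("image_label:gun AND BTC", "YOUR_TOKEN")
print(url)
```

Encoding matters here because the query contains a colon and spaces, which would otherwise be misread as part of the URL.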

Further along in the JSON, we find the image labels, which show that Webhose identified the image as a weapon, with several text descriptions:

Here is the image that the labels were able to identify:

Identifying Images of Counterfeit Drugs

One of the most famous examples of a Dark Web marketplace that specialized in the sale of illegal drugs was Silk Road, which was shut down by the FBI in 2013. This is just one example of the type of data that can be gathered from such sites.

By adding image recognition to Webhose’s ability to extract and enrich text on the Dark Web, more of this data can be processed and provided to law enforcement and national security officials and analysts.

In this example, we wanted to search the darknets for an image related to the sale of counterfeit drugs. We entered a query that included references to “pill” and “DrugMoney PotHeads” and received the following results in JSON format, which include text extractions of the site name and domain, the identification of the domain name’s language as English, and the text associated with the image.

Query: image_label:medic* AND image_label:pill AND "DrugMoney PotHeads"

Further along in the JSON, we find the image labels, which identify the image as a counterfeit drug, with several text descriptions:

Here is the image described by the labels:

Identifying the Location of the Image

A final example demonstrates how image recognition can be used to locate a criminal. If law enforcement can act fast enough, they may be able to stop them in their tracks.

Here is the JSON with the metadata that tracked the location:

Here is the image the drug dealer published on the Valhalla market, discussing a new type of drug:

Through image recognition and metadata extraction, Webhose was able to identify the GPS location of the image as originating from a particular address:

Location: (52.4643759, 13.3237932)

Markelstraße 13, 12163 Berlin, Germany
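Image metadata (EXIF) typically stores GPS coordinates as degrees, minutes, and seconds, each written as a fraction, together with a hemisphere letter. The sketch below converts that form into decimal degrees like the pair shown above. The article does not describe Webhose's extraction code, so the function name is hypothetical, and the sample values were chosen by hand to land near the coordinates above; they are not the actual metadata of the image.

```python
from fractions import Fraction
from typing import Sequence, Tuple

Rational = Tuple[int, int]  # (numerator, denominator), as stored in EXIF


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):  # southern and western hemispheres are negative
        decimal = -decimal
    return float(decimal)


# Illustrative values only: 52° 27' 51.75" N, 13° 19' 25.65" E
latitude = dms_to_decimal([(52, 1), (27, 1), (5175, 100)], "N")
longitude = dms_to_decimal([(13, 1), (19, 1), (2565, 100)], "E")
print(latitude, longitude)
```

Once the coordinates are in decimal form, they can be passed to a reverse geocoding service to obtain a street address.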

The Added Value of Image Recognition

The Dark Web is a known venue for criminals who trade in weapons, drugs, and a host of other illegal goods and services. As a leading provider of on-demand access to structured web data, Webhose extracts and enriches text from the many marketplaces and forums of the darknet and provides it to cybersecurity companies and law enforcement analysts. Unfortunately, text queries sometimes don’t deliver any results. Now Webhose labels images so that analysts and security organizations can search by image as well as by text, letting them benefit from the full potential of images on the Dark Web. Sometimes an image really is worth a thousand words.