Online accounts enable us to store and access documents, make purchases, and connect to new friends, among many other capabilities. Even though online accounts are convenient to use, they also expose users to risks such as inadvertent disclosure of private information and fraud. In recent times, data breaches and subsequent exposure of users to attacks have become commonplace. For instance, over the last four years, account credentials of millions of users from Dropbox, Yahoo, and LinkedIn have been stolen in massive attacks conducted by cybercriminals.

After online accounts are compromised by cybercriminals, what happens to the accounts? In our paper, presented today at the 2016 ACM Internet Measurement Conference, we answer this question. To do so, we needed to monitor the compromised accounts. This is hard to do, since only large online service providers, for instance Google or Yahoo, have access to data from such compromised accounts. As a result, the research literature on the use of compromised online accounts is sparse. To address this problem, we developed an infrastructure to monitor the activity of attackers on Gmail accounts. We did this to enable researchers to understand what happens to compromised webmail accounts in the wild, despite the lack of access to proprietary data on compromised accounts.

Cybercriminals usually sell the stolen credentials on the underground black market or use them privately, depending on the value of the compromised accounts. Such accounts can be used to send spam messages to other online accounts, or to retrieve sensitive personal or corporate information from the accounts, among a myriad of malicious uses. In the case of compromised webmail accounts, it is not uncommon to find password reset links, financial information, and authentication credentials of other online accounts inside such webmail accounts. This makes webmail accounts particularly attractive to cybercriminals, since they often contain a lot of sensitive information that could potentially be used to compromise other accounts. For this reason, we focus on webmail accounts.

Our infrastructure works as follows. We embed scripts based on Google Apps Script in Gmail accounts, so that the accounts send notifications of activity to us. Such activity includes the opening of email messages, creation of email drafts, sending of email messages, and “starring” of email messages. We also record details of accesses including IP addresses, browser information, and access times of visitors to the accounts. Since we designed the Gmail accounts to lure cybercriminals to interact with them (in the sense of a honeypot system), we refer to the accounts as honey accounts.
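
To make the kinds of activity listed above concrete, here is a minimal sketch in Python of one way such notifications could be derived: compare two snapshots of a mailbox and report what changed. It is not our Google Apps Script code. The `MessageState` fields, the event names, and the idea of periodic snapshots are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageState:
    """Observable state of one message at scan time (hypothetical fields)."""
    is_read: bool = False
    is_starred: bool = False
    is_draft: bool = False
    is_sent: bool = False


def diff_snapshots(previous, current):
    """Compare two {message_id: MessageState} snapshots and list new activity.

    Returns (event_name, message_id) tuples in message-id order.
    """
    events = []
    for msg_id, now in sorted(current.items()):
        before = previous.get(msg_id)
        if before is None:
            # The message did not exist at the previous scan.
            if now.is_draft:
                events.append(("draft_created", msg_id))
            elif now.is_sent:
                events.append(("sent", msg_id))
            continue
        if now.is_read and not before.is_read:
            events.append(("opened", msg_id))
        if now.is_starred and not before.is_starred:
            events.append(("starred", msg_id))
    return events
```

Each reported event would then be sent to the monitoring side together with the access details (IP address, browser information, and access time).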

To study webmail accounts stolen via malware, we also developed a malware sandbox infrastructure that executes information-stealing malware samples inside virtual machines (VMs). We supply honey credentials to the VMs, which drive web browsers and log in to the honey accounts automatically. The login action triggers the malware in the VMs to steal and exfiltrate the honey credentials to Command-and-Control servers under the control of botmasters.

To test our infrastructure, we set up an experiment using 100 Gmail honey accounts. To make the honey accounts appear to be real, we populated them with emails from the Enron email dataset, the only large publicly available email dataset. Prior to populating the accounts, we made some changes to the Enron email dataset. For instance, we changed all Enron employee names to fictitious names and changed all old dates to recent dates. We also changed all instances of the company name “Enron” to a fictitious company name that we made up. As a result, the honey accounts appeared to be email accounts belonging to company employees. After populating the honey accounts, we instrumented them using the Google Apps Script method described earlier.
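
The three changes to the dataset (names, dates, company name) can be sketched as follows. This is an illustrative Python sketch rather than the script we used. The name mapping, the replacement company name, and the fixed date offset are all hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from real names to made-up ones.
NAME_MAP = {"Alice Smith": "Carol Jones"}
# Hypothetical replacement for the company name.
FAKE_COMPANY = "Acme Widgets"
COMPANY_PATTERN = re.compile(r"\bEnron\b", re.IGNORECASE)


def sanitise_body(text, name_map=NAME_MAP, fake_company=FAKE_COMPANY):
    """Replace known employee names, then every whole-word 'Enron'."""
    for real_name, fake_name in name_map.items():
        text = text.replace(real_name, fake_name)
    return COMPANY_PATTERN.sub(fake_company, text)


def shift_date(old_date, offset):
    """Move an old timestamp forward by a fixed offset.

    A single offset for all messages (an assumption of this sketch)
    keeps the relative order of emails within a mailbox intact.
    """
    return old_date + offset
```

For example, `sanitise_body("Alice Smith joined Enron in Houston.")` yields a sentence that mentions only the fictitious employee and company.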

To prevent the honey accounts from sending spam messages while under the control of cybercriminals, we configured them to send all outgoing email messages to a mail server under our control. We also configured the mail server to simply dump the emails to disk, without forwarding them to the intended destination (as a normal email relay server would do). In addition, we imposed bandwidth and traffic restrictions on the malware sandbox infrastructure (described earlier) following best practices, to ensure that the infected VMs could not be used for harmful purposes, for instance, denial-of-service attacks. Finally, we obtained ethics approval for the experiments from our institution (UCL).
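
The "dump to disk, never forward" behaviour of the mail server can be illustrated with a small Python sketch. It shows only the storage step. The function name, the spool directory, and the hash-based file naming are assumptions for illustration, not a description of our server.

```python
import hashlib
import tempfile
from pathlib import Path


def sink_message(raw_message: bytes, spool_dir: Path) -> Path:
    """Write one outgoing message to disk and return its path.

    Nothing is relayed: the function contains no network code at all.
    """
    spool_dir.mkdir(parents=True, exist_ok=True)
    # Use a content hash as the file name, so identical messages map to one file.
    digest = hashlib.sha256(raw_message).hexdigest()
    path = spool_dir / f"{digest}.eml"
    path.write_bytes(raw_message)
    return path


if __name__ == "__main__":
    # Small demonstration using a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        saved = sink_message(b"Subject: test\r\n\r\nhello\r\n", Path(tmp))
        print(saved.name)
```

Keeping the raw bytes unchanged means the stored messages can later be analysed exactly as the attacker wrote them.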

In a manner similar to the behaviour of cybercriminals, we leaked credentials to the honey accounts through paste sites, underground forums, and malware, to lure cybercriminals into interacting with the honey accounts. On the paste sites and underground forums, we posted textual dumps of the honey credentials. In the case of leaks via malware, we simulated a scenario in which a human user logs into a webmail account on a computer infected with information-stealing malware. To that end, we used the malware sandbox infrastructure described earlier to simulate infection and login activity.

We wanted to observe if providing location information in leaks would affect the way that cybercriminals connect to the honey accounts. Hence, in some leaks, we claimed that some of the honey accounts belonged to people living in the UK, in locations close to London. We also claimed that some other honey accounts belonged to people living in the US, in locations close to Pontiac, MI. In the remaining leaks, we simply listed usernames and passwords of the honey accounts, without specifying any location information.

We monitored and collected data from the honey accounts for seven months. Analysis of the data revealed some interesting behavioural patterns. We found that providing location information along with usernames and passwords indeed affects the way that cybercriminals connect to the honey accounts. For instance, when we claim (in the text of the leaked credentials) that the account owners live in locations close to London, we observe that accesses to the honey accounts originate from locations closer to London than if we do not specify the locations of the owners of the accounts. This observation is even more pronounced in the US case. This is because cybercriminals connect to the accounts via proxies (to make their locations appear closer to the victim’s location) in a bid to evade login anomaly detection systems usually employed by webmail service providers to protect webmail accounts. When the cybercriminals do not know the locations of the victims (that is, when we do not provide decoy location details), connections appear from farther away.
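
One simple way to quantify "closer to London" is to compute the great-circle distance between each access location and the decoy location, and then compare a summary statistic across leak groups. The Python sketch below uses the haversine formula and the median. The coordinates are approximate, and the choice of the median is an assumption of this sketch, not necessarily the statistic used in the paper.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0
# Approximate (latitude, longitude) of the two decoy locations.
LONDON = (51.5074, -0.1278)
PONTIAC_MI = (42.6389, -83.2910)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = (math.radians(v) for v in a)
    lat2, lon2 = (math.radians(v) for v in b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_distance_km(access_points, decoy):
    """Median distance from a non-empty list of access points to the decoy."""
    return median(haversine_km(point, decoy) for point in access_points)
```

Comparing `median_distance_km` for accesses to accounts leaked with a London decoy against accesses to accounts leaked without location information would show whether the first group connects from nearer the decoy.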

We observed four types of accesses to the honey accounts during our experiments. We refer to them as Curious, Gold Digger, Spammer, and Hijacker accesses. Curious accesses are made simply to see if the honey credentials are real. Gold Digger accesses are made to search for sensitive information in the accounts, for instance, financial information that could possibly be monetised. Spammer accesses are made to send spam messages, while Hijacker accesses are made to change the passwords of the accounts to lock the original owner out. It is important to note that these access types are not exclusive. For instance, a Gold Digger can simultaneously be a Spammer if the visitor to the honey account opens and reads some emails, and then sends some spam messages.
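
Because the access types are not exclusive, a natural representation is a set of labels per access. The Python sketch below is a simplified illustration. The action names (`opened`, `sent`, `password_changed`) and the exact mapping from actions to labels are assumptions, not the rules used in the paper.

```python
def classify_access(actions):
    """Return the set of access-type labels for one visit to a honey account.

    `actions` is an iterable of hypothetical action names observed
    during the visit. A visit with no further activity is Curious.
    """
    observed = set(actions)
    labels = set()
    if "opened" in observed:
        # Reading emails suggests a search for valuable information.
        labels.add("Gold Digger")
    if "sent" in observed:
        labels.add("Spammer")
    if "password_changed" in observed:
        labels.add("Hijacker")
    # Logging in and doing nothing else only tests the credentials.
    return labels or {"Curious"}
```

A visit that opens some emails and then sends messages is labelled both Gold Digger and Spammer, matching the example above.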

We observed some distinctions in accesses across leaks. For instance, accesses to the accounts that we leaked through malware were only of the Curious and Gold Digger types, while we observed Spammer and Hijacker accesses to the accounts leaked via paste sites and underground forums. This shows that the cybercriminals who obtained the credentials we leaked through malware were stealthier than the rest.

We also recorded details of the browsers that were used to connect to the honey accounts across the outlets. Here we found further evidence of stealth among the cybercriminals who obtained credentials through the malware outlet: they took steps to hide information about their browsers, usually by presenting empty user agent strings or by connecting to the honey accounts via the Tor network.
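
The two stealth signals mentioned here can be checked per access with a few lines of Python. This is an illustrative sketch: the function name and flag names are hypothetical, and the set of Tor exit addresses is assumed to be supplied by the caller (the example addresses below come from a range reserved for documentation).

```python
def stealth_indicators(user_agent, ip_address, tor_exit_ips):
    """List the stealth signals present in one recorded access.

    `tor_exit_ips` is a caller-supplied set of known Tor exit addresses.
    """
    flags = []
    # A missing or whitespace-only user agent hides browser details.
    if not (user_agent or "").strip():
        flags.append("empty_user_agent")
    if ip_address in tor_exit_ips:
        flags.append("tor_exit")
    return flags


# Example with placeholder documentation addresses.
EXAMPLE_TOR_EXITS = {"203.0.113.7"}
```

Counting these flags per outlet would give a rough view of how often visitors from each outlet try to hide their browser or origin.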

Also, in the accounts leaked through paste sites, we observed that cybercriminals tend to connect from locations close to the decoy locations whenever we provide such decoy location information. As for the accounts that we leaked through forums, the cybercriminals that connected to them did not make significant attempts to manipulate or conceal their accesses.

We used our analysis to evaluate the differing levels of sophistication of cybercriminals across the different outlets. Sophistication in this context refers to their level of stealth, that is, the stealthier ones are more sophisticated than the rest. Cybercriminals that steal webmail account credentials via malware are the most sophisticated, since they actively take steps to cover their tracks after compromising the honey accounts, followed by the ones that obtain or trade credentials via paste sites. The least sophisticated ones are the cybercriminals that obtain account credentials through underground forums.

In conclusion, cybercriminals actively seek ways to connect to stolen webmail accounts without triggering login security systems. If they are able to obtain additional information about the victims, they can mount more effective attacks against such victims. For example, it is possible to obtain additional information about victims through online social networks. With the help of such information, they can then connect to the compromised accounts via proxies in a bid to appear close to the locations from which the victims usually connect to their accounts (to fool login anomaly detection systems).

In our work, we have identified some behaviours of cybercriminals that could be incorporated into systems designed to detect malicious activity in webmail accounts. Finally, we presented the first publicly available infrastructure to help researchers understand what happens to compromised webmail accounts. Further details can be found in the full paper, What Happens After You Are Pwnd: Understanding the Use of Leaked Webmail Credentials in the Wild, by Jeremiah Onaolapo, Enrico Mariconti, and Gianluca Stringhini. Our honeypot infrastructure code is also available.