Password and credit-card details leak online every day, so often that no one really knows just how much personally identifiable information is available by clicking on the right link to Pastebin, Pastie, or similar sites. Using a bot that runs on the hobbyist Raspberry Pi platform to drink from this fire hose, a security researcher has cataloged more than 3,000 such posts in less than three months while adding scores more each week.

Dumpmon, as the project is called, is a bot that monitors Twitter messages for Web links containing account credentials, sensitive account information, and other "interesting" content. Since its debut on April 3, it has captured more than 3,300 records containing 1.1 million e-mail addresses, most of which are accompanied by the plaintext or cryptographic hash of an associated password. The project has also unearthed social security and driver license numbers, credit card data, and other information that could be used to hijack user accounts or commit identity theft. On average, Dumpmon collects 51 such posts each day.

"It was mainly trying to determine how much information is being hidden from plain view and finding out how much information can be found just by looking in the right place," said Jordan Wright, a security engineer for CoNetrix. (Wright created the Dumpmon as an independent side project.) "It's pretty incredible. I wasn't expecting as much information as I found. I was expecting a lot less for sure."

The "dumps," as the online data postings are called, are frequently published to embarrass the victims or as a means for hacking crews to demonstrate their prowess to rivals. Often, dumps are advertised on Twitter or another social networking site with a line or two of vague or cryptic text and a link. In the time it takes to comb through one such posting, a half-dozen or more additional dumps may be posted. The frequency makes it hard for outsiders to keep tabs on them.

Of the 620 records Wright has analyzed in depth, the researcher recovered 174,423 e-mail addresses accompanied by a hashed or plaintext password. With so many sites using e-mail addresses as account user IDs, the data often gives attackers all they need to access multiple accounts maintained by a victim. In the event the owner has used the same address and password to secure other accounts, or even the e-mail account itself, attackers can reuse the credentials to hijack those as well. Of those 174,423 e-mail addresses, more than 120,000 were accompanied by a plaintext password. The remaining passwords were expressed as cryptographic hashes, which are frequently trivial to crack.
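
For a rough sense of how such a tally could be produced, the Python sketch below counts e-mail and password pairs in a block of text and guesses whether each secret is plaintext or a hash. It is not Wright's code: the pattern, the function names, and the rule that a 32-, 40-, or 64-character hex string is probably an MD5, SHA-1, or SHA-256 digest are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: an e-mail address, one separator character
# (colon, semicolon, comma, pipe, or whitespace), then a secret.
CREDENTIAL_RE = re.compile(
    r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"\s*[:;,|\s]\s*"
    r"(?P<secret>\S+)"
)

# Hex strings of these lengths match the digest sizes of MD5, SHA-1, and SHA-256.
HASH_LENGTHS = {32, 40, 64}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def looks_like_hash(secret: str) -> bool:
    """Guess whether a secret is an unsalted hex digest (a heuristic only)."""
    return len(secret) in HASH_LENGTHS and HEX_RE.fullmatch(secret) is not None


def summarize_dump(text: str) -> dict:
    """Count credential pairs per line, split into plaintext and hash-like."""
    counts = {"plaintext": 0, "hashed": 0}
    for line in text.splitlines():
        match = CREDENTIAL_RE.search(line)
        if match is None:
            continue  # no e-mail/secret pair on this line
        kind = "hashed" if looks_like_hash(match.group("secret")) else "plaintext"
        counts[kind] += 1
    return counts
```

A heuristic like this will miscount salted or non-hex hash formats, so real totals would need more careful handling.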

Account credentials are by no means the only valuable data included in these postings. The 620 records Wright analyzed for this article also contained what appeared to be valid data for 1,496 payment cards. In many cases, data collected by Dumpmon included bank account numbers and home addresses. Other files observed by Ars included social security and driver license numbers, first and last names, addresses, and medical diagnoses contained on health records. Dumps also contained passwords stored on computers that had been infected by malware.
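
One common first check for whether a string of digits could be a payment card number is the Luhn checksum, which card numbers are designed to satisfy. The article does not say how the card data was judged to be valid, so the sketch below is only an illustration of that general check; the function name and the 13-to-19-digit length limit are assumptions.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces, dashes, and any other non-digit characters.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # assumed plausible card-number lengths
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Passing the checksum shows only that a number is well formed, not that the card exists or is active.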

"These full identity dumps are probably more of the higher commodity item," Wright said of the records containing social security numbers, names, and addresses. "As far as why these were dumped for free, that's the answer I'm looking for: Why people are giving this information out?"

Some of the data—for instance, a recent dump posted to Pastebin that Ars will not link to—appears to be derived from browsers that were configured to store frequently used account IDs and passwords. When the computers are infected with malware, the credentials are dumped to a file that later gets posted online. The discoveries led Wright to publish a post documenting how Google Chrome, Internet Explorer, and other browsers store passwords. Incidentally, Wright concluded users shouldn't trust their passwords to these storage systems, but I'm not so sure. Any computer that is infected with malware that provides a backdoor onto the system is already vulnerable to wholesale password theft. In fairness to Wright, the sensitive details may be easier or quicker to gather en masse when they're stored in a browser.

Other dumps cataloged by Dumpmon included private SSH encryption keys used to administer websites, configuration files for Cisco routers, and logs from successful malware infections.

To keep things interesting, Dumpmon has been designed to run on the Raspberry Pi platform.

"The goal was to find a happy balance between both obtaining new pastes from the different sites, as well as processing the existing pastes in the queue to determine if they are interesting," Wright said. "This created challenges, since the Raspberry Pi has limited hardware capability and I was monitoring for quite a few things."

Because posts on Pastebin and other sites are often taken down by the original poster or site administrators, Dumpmon also copies and stores the contents of each one. While Wright has published the underlying code for anyone to use, he said he makes the cached data available only to white hat researchers.
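
The balance Wright describes between fetching new pastes and working through the queue, along with the caching step, can be sketched roughly as follows. This is not taken from Dumpmon's published code: the function name, the (ID, text) tuple shape, the per-cycle limit, and the choice to cache only flagged pastes are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Paste = Tuple[str, str]  # (paste ID, paste text); an assumed shape


def run_cycle(
    sources: Iterable[Callable[[], List[Paste]]],
    queue: Deque[Paste],
    is_interesting: Callable[[str], bool],
    cache: Dict[str, str],
    max_per_cycle: int = 5,
) -> List[str]:
    """Run one fetch-then-process cycle and return the IDs of flagged pastes."""
    # Fetch step: ask every source for new pastes and enqueue them.
    for fetch_new in sources:
        queue.extend(fetch_new())

    # Process step: handle only a bounded number of queued pastes so a large
    # backlog cannot delay the next fetch on slow hardware.
    hits = []
    for _ in range(min(max_per_cycle, len(queue))):
        paste_id, text = queue.popleft()
        if is_interesting(text):
            cache[paste_id] = text  # keep a copy in case the original is removed
            hits.append(paste_id)
    return hits
```

Capping the work done per cycle is one simple way to keep a small machine from falling behind on new posts; anything left in the queue is picked up on the next pass.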

"I don't want to make it easier for the wrong people," he explained. "My goal was as best as I could only give it to people who will use it responsibly."