New research from Boston's Northeastern University shows that with the shutdown of Megaupload, the U.S. Government took down at least 10.75 million legitimate files. The researchers examined the percentage of legal and copyright-infringing content on six file-hosting services. While infringing content outnumbers legitimate files on all sites, the volume of non-infringing content casts doubt on the drastic seizure that took place early last year.

When the U.S. Government took down Megaupload it branded the company as a pirate site with seemingly few legitimate uses. Up until now, however, there has been no research to back up this claim.

In an attempt to fill this gap, researchers from Boston’s Northeastern University, together with colleagues from France and Australia, examined millions of files that were uploaded to five cyberlockers (FileFactory, Easy-share, Filesonic, Wupload and Megaupload) and the reupload service Undeadlink.

To find out whether the files were legitimate or not, the researchers extracted metadata from the sites’ uploads using a link checker. They controlled for several factors, including split archives, and then manually determined the legitimacy of the files based on random samples of 1,000 uploads per site.

The results, just published in the article titled “Holiday Pictures or Blockbuster Movies? Insights into Copyright Infringement in User Uploads to One-Click File Hosters,” provide a unique insight into the proportion of infringing content on these services.

The main results, displayed in the figure below, reveal that the percentage of infringing files varies heavily between the six services. They also show that for the majority of the files the researchers couldn’t conclusively determine whether a file was infringing or not.

For Megaupload (MU) the researchers found that 31% of all uploads were infringing, while 4.3% of uploads were clearly legitimate. This means that with an estimated 250 million uploads, 10.75 million uploads were non-infringing. For the remaining 65% or so, the copyright status was either unknown, or the raters couldn’t reach consensus.

Using the most conservative estimate, the findings show that the Megaupload raid took down at least 10.75 million legitimate files. In addition, the researchers found that FileFactory had the highest percentage of non-infringing uploads (14%).

At 0.1%, Wupload and Undeadlink had the fewest uploads that were clearly legitimate, while 79% of all files added to these services were without a doubt infringing.

The research confirms that “one click” file-hosting services appear to be predominantly used to upload pirated content. However, there are clearly also plenty of non-infringing uses, something the U.S. Government may have overlooked when it took Megaupload offline.

TorrentFreak spoke with Tobias Lauinger, one of the authors of the paper, who told us that the high volume of legitimate files is one of the most interesting aspects of the study.

“What I find most interesting about our results is that they support what many people were already suspecting before: That Megaupload was partially being used for ‘illegal’ file sharing, but that there were also millions of perfectly legitimate files stored on Megaupload.”

One of the main drawbacks of the findings is that the researchers couldn’t determine the infringing status of the majority of the files. For roughly two-thirds of all uploads to Megaupload this remains uncertain.

While unlikely, this means that in the most optimistic scenario, where every file not rated as infringing turns out to be legal, roughly 69% of the files uploaded to Megaupload could be perfectly legal. In theory, the Megaupload raid could therefore have destroyed 172.5 million non-infringing files.
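The conservative and optimistic figures both follow from the same simple arithmetic. The short Python sketch below reproduces them from the percentages and the 250 million file estimate reported above; the function and variable names are ours, chosen purely for illustration, and are not taken from the paper.

```python
# Figures reported for Megaupload in the study, as cited in this article.
TOTAL_UPLOADS = 250_000_000   # researchers' estimate of available files
INFRINGING_SHARE = 0.31       # uploads rated as infringing
LEGITIMATE_SHARE = 0.043      # uploads rated as clearly legitimate


def legitimate_file_bounds(total, infringing_share, legitimate_share):
    """Return (lower, upper) bounds on the number of legitimate files.

    lower: only the files rated clearly legitimate are counted.
    upper: every file that was not rated infringing is counted,
           i.e. all unknown or disputed files are assumed to be legal.
    """
    lower = round(total * legitimate_share)
    upper = round(total * (1 - infringing_share))
    return lower, upper


lower, upper = legitimate_file_bounds(
    TOTAL_UPLOADS, INFRINGING_SHARE, LEGITIMATE_SHARE
)
print(f"Between {lower:,} and {upper:,} legitimate files")
```

The true number lies somewhere between these two bounds, and where exactly depends on the files whose status the raters could not determine.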

TorrentFreak talked to Megaupload’s Kim Dotcom, who says that both the number of files and the share of non-infringing use were much higher in reality.

The researchers, however, found that based on the number of possible file IDs and the hit rate they got by randomly guessing these IDs, there were an estimated 250 million files available at the time of the experiment.
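The idea behind this kind of estimate can be sketched in a few lines of Python: probe randomly chosen IDs, note the fraction that point to an existing file, and scale that fraction up to the full ID space. This is only an illustration of the general approach described above, not the researchers' actual code, and the numbers in the example are hypothetical because the article does not report the size of the ID space or the observed hit rate.

```python
def estimate_total_files(id_space_size, probes, hits):
    """Estimate how many files exist from random ID probing.

    id_space_size: number of possible file IDs (hypothetical input)
    probes:        number of randomly guessed IDs that were checked
    hits:          number of guesses that pointed to an existing file
    """
    if probes <= 0:
        raise ValueError("probes must be positive")
    if not 0 <= hits <= probes:
        raise ValueError("hits must be between 0 and probes")
    # Scale the observed hit rate (hits / probes) to the whole ID space.
    return id_space_size * hits / probes


# Hypothetical example values, NOT figures from the study:
# 25 hits in 10,000 random guesses over a space of 1,000,000 IDs.
estimate = estimate_total_files(1_000_000, 10_000, 25)
print(f"Estimated files: {estimate:,.0f}")
```

The estimate is only as good as the randomness of the guesses and the number of probes, which is one reason such a figure is described as an estimate rather than a count.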

Of course, the many users who lost access to their personal files are not helped by these statistics. But perhaps they may serve as a reminder for the District Court to finally make a decision on whether or not to allow former users to retrieve their files. It’s been almost two years, after all.