A new plagiarism detection service says that it can help track down copied text on the Internet—but is it any better than a search engine? We put it to the test with some of our own content.

Plagium is simple to use: enter text into a box and hit the "track plagiarisms" button. Site operator Septet Systems says that Plagium uses "Septet’s proprietary TX Miner engine, which employs advanced search technology for deep mining of documents on the public World Wide Web or within private repositories," but the actual search results are generated with the Yahoo Search API.

This raises the obvious question: why not just use a search engine in the first place? For one thing, Plagium can handle longer strings of text than most search engines (Google tops out at 32 words per query). Plagium can also generate alerts for a given text string, notifying users by e-mail whenever a new instance of the text appears on the Web. Finally, it generates a cool graph.
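That 32-word cap isn't hard to work around by hand, though. A minimal sketch of the idea, assuming you split a long passage into overlapping quoted-phrase queries that each fit under the limit (the `chunk_queries` helper and its parameters are hypothetical, not part of any search engine's API):

```python
def chunk_queries(text, max_words=32, overlap=8):
    """Split text into quoted phrase queries of at most max_words words,
    overlapping by `overlap` words so a copied passage that straddles a
    chunk boundary still matches at least one query."""
    words = text.split()
    step = max_words - overlap
    queries = []
    for start in range(0, len(words), step):
        chunk = words[start:start + max_words]
        queries.append('"' + " ".join(chunk) + '"')
        if start + max_words >= len(words):
            break
    return queries

# An 80-word paragraph becomes three queries, each within the 32-word cap.
paragraph = " ".join(f"word{i}" for i in range(80))
for q in chunk_queries(paragraph):
    print(len(q.split()))
```

Each chunk is wrapped in quotes so the engine searches for the exact phrase rather than the individual words; the overlap is cheap insurance against a match falling exactly on a chunk boundary.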

The graph above is a "text usage timeline" showing how often a particular bit of text has appeared on the Web; it shows up just above a more traditional list of search results.

We ran two Ars articles through Plagium to see how they fared. The first was our "Dealing with PlayStation 2 disc read errors," an old piece from 2003 that (amazingly) continues to generate pageviews and e-mail.

Sticking one paragraph from it into Plagium brought up 13 documents. Most were direct rip jobs of the entire article, usually by small-time bloggers. It goes without saying that such rips featured neither links nor attribution, and the really classy ones just inlined the images from the post right off our server.

But Plagium cautions that it can't actually "detect" plagiarism—it merely points out possible problems. Case in point: two years ago, someone posted a question on Yahoo Answers. It read, "Why dose my old ps2 say disc read error and what can i do to correct it?" One of the responses linked the Ars article but also copied eight full paragraphs of text. That's a substantial borrowing, but it still might qualify as fair use were it ever to come up in court.

A Google search turned up essentially the same list of sites, despite truncating the input at 32 words.

Testing a second (and more recent) piece on Spore's DRM brought up 10 hits, including the original article. Google, though, only generated three hits, including the original article.

Given Plagium's cost (free), decent results, and alert system, it's certainly worth a try for specialized plagiarism detection. For the occasional one-off search, however, a traditional search engine will also do a fine job.

The site competes against Copyscape, which offers similar services (but charges for them). Plagiarism Today ran some tests of its own on both tools and concluded that "in all five tests, Google outperformed both Plagium and Copyscape. However, it contained a very high amount of duplicate results and the benefit was likely minimal."

Other plagiarism detection tools like TurnItIn focus on academic plagiarism rather than website copying, and serve a different market.

My own experience with grading collegiate papers quickly taught me that the students likely to do even the minimal work of calling up an academic paper through a database or finding it in the library were unlikely to then plagiarize its content in any brazen way (improper citation practices were another story). Those who did plagiarize tended to do so in an unsophisticated way, ripping off content from the Web that could easily be found with a simple search.

All that's to say that sites like Plagium, though they target the Web, will probably catch plenty of less-sophisticated academic plagiarists as well.

Listing image by Randall Schwanke