For several years, it seemed as though the book industry was getting a reprieve. As the music industry was ravaged by file sharing, and the film and TV industries were increasingly targeted by downloaders, book piracy was but a quaint cul-de-sac in the vast file sharing ecology. The tide, however, may be changing. Ereaders have become mainstream, making reading ebooks palatable to many more readers. Meanwhile, technology for scanning physical books and breaking the DRM on ebooks has continued to advance.

A recent study by Attributor, a firm that specializes in monitoring content online, came to some spectacular conclusions, including the headline claim that book piracy costs the industry nearly $3 billion, or over 10% of total revenue. Of all the conclusions in the Attributor study, this one seemed the most outlandish, and the study itself might be met with some skepticism since Attributor is in the business of charging companies to protect their content from the threat of piracy.

Nonetheless, the study, which monitored 913 titles on several popular file hosting sites, did point to a level of activity that suggested illegal downloading of books was becoming more than just a niche pastime. Even if the various extrapolations that led to the $3-billion figure are easy to poke holes in, Attributor still directly counted 3.2 million downloaded books.

For some, however, the study may inspire more questions than answers. Who are the people downloading these books? How are they doing it and where is it happening? And, perhaps most critical for the publishing industry, why are people deciding to download books and why now? I decided to find out, and after a few hours of searching – stalled by a number of dead links and password-protected sites – I found, on an online forum focused on sharing books via BitTorrent, someone willing to talk.

He lives in the Midwest, he’s in his mid-30s and is a computer programmer by trade. By some measures, he’s the publishing industry’s ideal customer, an avid reader who buys dozens of books a year and enthusiastically recommends his favorites to friends. But he’s also uploaded hundreds of books to file sharing sites and he’s downloaded thousands. We discussed his file sharing activity over the course of a weekend, via email, and in his answers lie a critical challenge facing the publishing industry: how to quash the emerging piracy threat without alienating its most enthusiastic customers. As is typical of anonymous online communities, he has a peculiar handle: “The Real Caterpillar.” This is what he told me:

The Millions: How active are you? How many books have you uploaded or downloaded?

The Real Caterpillar: In the past month, I have uploaded approximately 50 books to the torrent site where you contacted me. I am much less active than I once was. I used to scan many books, but in the past two years I have only done a few. Between 2002 and 2005 I created around 200 ebooks by scanning the physical copy, OCRing and proofing the output, and uploading them to USENET. I generally only upload content that I have scanned, with some exceptions. I have been out of the book scene for a while, concentrating on rare and out-of-print movies instead of books because it is much easier to rip a movie from VHS or DVD than to scan and proof a book.

I have downloaded a couple thousand ebooks via USENET and private torrent sites.

TM: Do you typically see scanned physical books or ebooks where the DRM has been broken?

TRC: Most of what I have seen is scanned physical books. Stephen King’s Under the Dome was the first DRM-broken book I downloaded knowingly.

TM: Why have you gone this route as opposed to using a library or buying books? Do you consider this “stealing” or is it a gray area?

TRC: I own around 1,600 physical books, maybe a third of which were bought new, the rest used. I buy many hardcovers in a given year and generally purchase more books than I end up reading, so I have not chosen to collect electronic books as opposed to paper books but in addition to them. My electronic library has about a 50% crossover with my physical library, so that I can read the book on my electronic reader, “loan” the book without endangering my physical copy, or eventually rid myself of the paper copy if it is a book I do not have strong feelings about.

I do not buy DRM’d ebooks that are priced at more than a few dollars, but would pay up to $10 for a clean file if it was a new release.

I do not pretend that uploading or downloading unpurchased electronic books is morally correct, but I do think it is more of a grey area than some of your readers may. Perhaps this will change as the Kindle and other e-ink readers make electronic books more convenient, but the Baen Free Library is an interesting experiment that proves that at least in that case, their business was actually enhanced by giving away their product free. That is probably not a business model that will work for everyone, but what it shows is that as a company they have their ear to the ground and are willing to think in new directions and take chances instead of putting their fingers in their ears, closing their eyes, and railing against their customers, as the music industry is doing. The world is changing and business models have to change with it.

Three additional points:

1) With digital copies, what is “stolen” is not as clear as with physical copies. With physical copies, you can assign a cost to the physical product, and each unit costs x dollars to create. Therefore, if the product is stolen, it is easy to say that an object was stolen that was worth x dollars. With digital copies, it is more difficult to assign cost. The initial file costs x dollars to create, but you can make a million copies of that file for no cost. Therefore, it is hard to assign a specific value to a digital copy of a work except as it relates to lost sales.

2) Just because someone downloads a file, it does not mean they would have bought the product. I think this is the key fact that many people in the music industry ignore – a download does not translate to a lost sale. I own hundreds of paper copies of books I have e-copies of, many of which were bought after downloading the e-copy. In other cases I have downloaded books I would never have purchased, simply because they were recommended or sounded interesting.

3) Just because someone downloads a file, it doesn’t mean they will read it. I realize that buying a book doesn’t mean someone is going to read it either, but clicking a link is very different from paying $10-$30 – many more people will download a book and not read it than buy a book and not read it.

In truth, I think it is clear that morally, the act of pirating a product is, in fact, the moral equivalent of stealing… although that nagging question of what the person who has been stolen from is missing still lingers. Realistically and financially, however, I feel the impact of e-piracy is overrated, at least in terms of ebooks.

TM: How easy is it to go online and find a book you’re looking for? How long does it take to download and how much technical expertise is required?

TRC: I have specific tastes, so it is usually not very easy to find specifically what I am looking for. The dearth of material I was interested in is what prompted me to scan in the past, in order to share some of my favorite, less popular authors with as many people as possible. It does not take much time to download once something you want has been found, however, and little technical experience is required.

Since books are generally very small files, they can be downloaded in minutes. You can then convert the file using one of many applications, for instance Mobipocket Creator, to PRC or another format that works with your reader. You can then plug your Kindle into your computer and copy the file over. The entire process typically takes 5-10 minutes.

BitTorrent technology is easy to install and use, and just about anyone can install the basic software needed and begin downloading their first torrent in less than an hour. However, discovering and gaining access to private torrent sites (invite only) can take a lot of time – and of course, that is where the good stuff is. Public sites (no account needed) and semi-private sites (sites that require an account, but usually have open enrollment) have a limited selection, but are easily accessible and anyone with basic computer skills can find and download very popular novels.

Usenet is an older technology, and is considered a safer place to pirate files. For older users like me who were around at the beginning of the internet it seems very simple, but to newer computer users it may seem unnecessarily complex, and more expensive because you need an account separate from your regular internet connection to access it.

TM: Once you’ve downloaded a book, what format is it in and how do you read it? On your computer? Printed out?

TRC: My preferred format for distribution is RTF because it holds metadata such as italics, boldfaces, and special characters that TXT does not, is easily converted to other formats using Word, cannot contain a virus, and is an open format that will be readable forever. Other popular formats are DOC, HTML, PDF, LIT (Microsoft Reader), PRC (Palm), MOBI (Palm), CBR (rar’d image files) – and there is a new format with each new reader that is released. Most formats can be converted to your preferred format with enough ingenuity or the correct software.

To read, I convert to PRC and load the books onto my Kindle. Before I got that, I read on my Palm or laptop.

TM: How long does it take you to scan a physical book?

TRC: The scanning process takes about 1 hour per 100 scans. Mass market paperbacks can be scanned two pages at a time flat on the scanner bed, while large trades and hardcovers usually need to be scanned one page at a time. I’m sure that some of the more hardcore scanners disassemble the book and run it through an automatic feeder or something, but I prefer the manual approach because I’d like to save the book, and don’t want to invest in the tools. Usually I can scan a book while watching a movie or two.

Once scanned, the output needs to be OCR’d – this is a fairly quick process using a tool like ABBYY FineReader.

The final step is the longest and most grueling. I’ve spent anywhere from 5 to 40 hours proofing the OCR output, depending on the size of the book and the quality of type in the original. This can be done in your OCR tool side-by-side with the scan of the original image or separately in your final output type (RTF, DOC, HTML, etc.). If there are few errors on the first few pages of text, my preference is to proof in RTF; otherwise I do the proof within FineReader itself.

TM: What types of books do you look for? What is generally available? Is any fiction or popular non-fiction available?

TRC: I restrict my downloads to books I will likely read – this includes some popular novels, literary novels, and general non-fiction such as humor, biography, science, sociology, etc. Unlike DVD rips, the newest releases are not typically available two weeks before the product is released, if at all. I’m assuming that this is due to the smaller devoted audience books have, as well as the increased difficulty of sharing a book.

TM: Do you have a sense of where these books are coming from and who is putting them online?

TRC: I assume they are primarily produced by individuals like me – bibliophiles who want to share their favorite books with others. They likely own hundreds of books, and when asked what their favorite book is, look at you like you are crazy before rattling off 10-15 authors, and then emailing you later with several more. The next time you see them, they have a bag of 5-10 books for you to borrow.

I’m sure that there are others – the compulsive collectors who download and re-share without ever reading one, the habitual pirates who want to be the first to upload a new release, and people with some other weird agenda that only they understand.

TM: Is it your sense that a lot of people are out there looking to get books this way? Or is it just a tiny group?

TRC: I would say that there is a small unaffiliated “group” of people responsible for sourcing the material.

Also, keep in mind that everything I’m saying applies mostly to fiction and general-interest non-fiction.

Textbooks, programming and technical manuals are all over the place and it’s very easy to obtain almost anything you want. I assume there are more sources for that material, and that their high price is a larger factor in people deciding to pirate them. Similarly, there are many communities creating comic, graphic novel and magazine content of which I am only vaguely aware.

TM: Do you worry at all about getting in trouble for scanning and uploading ebooks?

TRC: A little, but the books I do are typically not bestsellers and are rarely new. I figure I have a bit of a buffer if trouble comes down because the Stephen King or Nora Roberts or “whoever the latest bestseller is” scanners would be the ones to get hit first. I’ve done a lot of out-of-print stuff, and when it is not out of print it’s books by authors like John Barth – someone who no longer sells very well, I imagine.

I’ve debated doing some newer authors and books, but if I were to do so I would need to protect myself better and resolve the moral dilemma of actually causing noticeable financial harm to an author whose work I love enough to spend so much time producing a nice e-copy of it.

TM: What changes in the ebook industry would inspire you to stop participating in ebook file sharing?

TRC: This is a tough question. I guess if every book was available in electronic format with no DRM for reasonable prices ($10 max for new/bestseller/omnibus, scaling downwards for popularity and value) it just wouldn’t be worth the time, effort, and risk to find, download, convert and load the book when the same thing could be accomplished with a single click on your Kindle. Even in this situation, I would probably still grab a book if I stumbled across the file and thought it might interest me – or if I wanted to check it out before buying a paper copy.

I was impressed by the indie filmmakers of the movie “Ink” – when their movie leaked before the DVD was released, they put a donation button on their site doubleedgefilms.com. I donated even though I haven’t watched the movie yet, just because of their thoughtfulness and sincerity. This didn’t seem to work for King’s “The Plant”, but I think that had a lot to do with the lack of reading technology at the time. I would like to see the experiment tried again by someone like Eggers or Murakami – someone with a very devoted fanbase.

Perhaps if readers were more confident that the majority of the money went to the author, people would feel more guilty about depriving the author of payment. I think most of the filesharing community feels that the record industry is a vestigial organ that will slowly fall off and die – I don’t know to what extent that feeling would extend to publishing houses since they are to some extent a different animal. In the end, I think that regular people will never feel very guilty “stealing” from a faceless corporation, or to a lesser extent, a multi-millionaire like King.

One thing that will definitely not change anyone’s mind or inspire them to stop is polemics from people like Mark Helprin and Harlan Ellison – attitudes like that ensure that all of their works are available online all of the time.

[Image credit: Patrick Feller]