One of the suggested objectives in Gabor Szabo's Measurable objectives for the Perl ecosystem is "Increase the number of Perl book sales". I like most of the objectives in Gabor's post, but I must caution against taking the numbers presented too seriously. At best, they're incomplete. At worst, they're completely misleading.

Gabor refers to State of the Computer Book Market - Mid-Year 2009, written by Mike Hendrickson. Mike's company publishes similar analyses a few times a year, based on sales data from Nielsen Bookscan. (For more on the culture of Bookscan rankings in the publishing world, see Why writers never reveal how many books their buddies have sold.)

This data sounds wonderful, and the pretty graphs and charts give you the impression that you're getting useful information. Yet this is only a partial picture of the market. As Mike writes later in the piece:

"Many publishers report that more than 50% of their revenue is achieved as direct sales, and those numbers do not get reported into Bookscan. Sales at traditional college bookstores are typically not reported into Bookscan as well. Again this is US Retail Sales data recorded at the point of sale to a consumer."

These numbers reflect less than half of revenue. Throw out half of sales by dollar and hope that the results are stochastic.

Yet there's a deeper flaw behind these numbers.

How Book Sales Work

If you plot the sales curve of multiple books, you'll notice that they tend to follow the ubiquitous power law. A book sells as many copies in its first three months as it will the rest of the first year. A book sells half as many copies in the second year as it did in the first. This model is so accurate that the publishing industry calls titles "frontlist" titles if they're in their first year of publishing and "backlist" titles if they're not.
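To make that rule of thumb concrete, here is a rough Python sketch. The function name and the 3000-unit starting figure are illustrative, and the idea that sales keep halving after the second year is my assumption for the sake of the example, not something the publishing data guarantees.

```python
def projected_yearly_sales(first_quarter_units, years=4):
    """Project unit sales per year from first-three-month sales.

    Rule of thumb from the text: the rest of year one matches the first
    three months, and year two sells half as many copies as year one.
    Continuing to halve in later years is an assumption for illustration.
    """
    first_year = 2 * first_quarter_units
    return [first_year / (2 ** year) for year in range(years)]


# Hypothetical title that sells 3000 copies in its first three months.
print(projected_yearly_sales(3000))
```

Under these assumptions the title sells more in its first year than in all of the following years combined, which is why the frontlist/backlist distinction matters so much.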

While a few titles have strong backlist sales, they're rare. They're the Bibles and Harry Potters and How to Win Friends and Influence Peoples. The publishing industry's san greal is to find a new strong backlist bestseller.

They tend to exhibit strong frontlist behavior as well.

The retailer's point of view is different. Limited shelf space means that new books still in their three-six-twelve month short snout sales levels often get priority over older books in the long tail sales levels. If you're going to sell 3000 copies of a book in the first three months and 1000 copies in the next three years, stock up early.

This is especially true in technical book publishing, where I have trouble giving away Python 2.3 and Oracle 7 and Red Hat Linux 6 books. Publishing dates are expiration dates: best by a year after the copyright date.

Why does this matter? It's a simple matter of economics: people won't buy books you don't publish.

The Freshness Factor

2005 was a good year for Perl book sales. Why? Four strong Perl books came out in 2005. The Perl book sales numbers for that year reflected the short snout of Perl book sales.

Four years later, is PBP selling as many copies? Is the Perl Testing book? Is HOP? Is APP 2e?

Those are rhetorical questions. You already know the answer. You can even answer that question for the Camel 3e. A book published in 2000 may still be useful nine years later, but Camel 3e predates almost every part of the Perl Renaissance. Besides that, the 250k or 300k units already sold have reached a fair amount of the Perl 5 programming market.

Compare that with the Ruby book market in 2006, where you couldn't leave an editorial board meeting without an assignment to publish a new Ruby or Rails book. Initial sales numbers looked great; the growth in that market segment was huge!

Did any Ruby book sell 250k copies, though? That number's missing from the year-by-year analysis.

Look at this year's numbers. Objective-C is huge! It's 1999 all over again! Except that, yet again, the comparison is to an emerging market segment without analysis of historical trends.

The Missing Data

The biggest piece of data obviously missing from these State of the Computer Book Market entries is historical context. Six months or a year of appositional data comparing different market segment maturities is misleading, at best. Should you go learn Objective-C just because Bookscan reported more Objective-C titles sold than SQL?

No -- but to be fair, Mike doesn't suggest this directly.

Other missing data is more subtle, and perhaps more meaningful. Where's the breakdown of frontlist/backlist for these sales figures? More than nine out of ten books follow the power law I described earlier. If the Objective-C books have all come out in the past year, they're in their short snout period. Of course they're selling more units now than books in the long tail period.

How many total units does the market represent? If the number of books sold in 2009 is half the number sold in 2008, it's difficult to compare the performance of books against each other year-over-year. There are too many other factors to consider. (You can still get interesting information, but you can't compare technologies against each other in meaningful appositive ways.)

How many books are in each category? Title efficiency (average number of unit sales per title and standard deviation) can tell other interesting stories. Is one language category hit driven (iPhone Programming, Ruby on Rails)? Are there niche subjects intended as modest sales targets and not bestsellers? Is every book a moderate success, with no breakout quintessential must-have tome? Is there a gold rush of publishing with 40 new titles produced in a year and each of them selling a dismal 1000 copies apiece?
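As a small sketch of what title efficiency could look like, the Python below computes the mean and standard deviation of unit sales per title for two categories. Every number here is invented for illustration, and `title_efficiency` is a hypothetical helper, not anything Bookscan or Mike's reports provide.

```python
from statistics import mean, stdev


def title_efficiency(units_per_title):
    """Return (average units per title, sample standard deviation)."""
    return mean(units_per_title), stdev(units_per_title)


# Hypothetical data: one breakout hit surrounded by modest sellers.
hit_driven = [40000, 1500, 1200, 900, 800]
# Hypothetical data: a gold rush of 40 titles selling 1000 copies each.
gold_rush = [1000] * 40

for name, units in (("hit driven", hit_driven), ("gold rush", gold_rush)):
    average, spread = title_efficiency(units)
    print(f"{name}: {sum(units)} total units, "
          f"{average:.0f} per title, std dev {spread:.0f}")
```

With these made-up figures the two categories sell a similar number of total units, but a spread larger than the average marks the first as hit driven, while a spread of zero marks the second as uniformly mediocre. A single total per category hides that difference.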

How many new books are in a market segment this year compared to last year? This is the biggest question that matters to Perl books, especially with regard to Gabor's suggestion. Again, this should be obvious: no one can buy Camel 4e right now.
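A toy Python model shows how much the release schedule alone can shape a segment's yearly numbers. It assumes every title sells 6000 copies in its first year and half as many each year after that; both figures and the function names are hypothetical, chosen only to illustrate the effect.

```python
def title_sales_by_age(first_year_units, age):
    """Units a title sells in year `age` (0 = release year), halving yearly."""
    return first_year_units / (2 ** age)


def segment_sales(releases_per_year, first_year_units=6000):
    """Total segment units per calendar year, given new titles per year."""
    totals = []
    for year in range(len(releases_per_year)):
        total = 0.0
        # Add up every title released in this year or any earlier year.
        for release_year, count in enumerate(releases_per_year[: year + 1]):
            age = year - release_year
            total += count * title_sales_by_age(first_year_units, age)
        totals.append(total)
    return totals


# Hypothetical schedules: four titles at once and then nothing,
# versus one new title every year.
print(segment_sales([4, 0, 0, 0, 0]))
print(segment_sales([1, 1, 1, 1, 1]))
```

In this sketch the segment that stops publishing looks like it is collapsing, even though each individual book behaves exactly like the books in the steadily published segment.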

A Completely Hypothetical Fictional Example I Made Up Completely From Whole Cloth

If that didn't convince you, consider a short fable about oyster farming.

Suppose you own a publishing company. Suppose you discover a new topic area: oyster farming. No one's published on this topic before, but hundreds of thousands of people are doing it. There's a lot of institutional knowledge, but there's a ripe opportunity for documenting best practices and nuanced techniques -- especially given that you have found the person who invented modern oyster farming and convinced him to write a book about it.

You publish the book. It takes off. Its short snout is wide. (My metaphor is awkward.) You've discovered a new market segment; you've invented a new market segment. Life is grand.

You branch out. You publish More Oyster Farming and Learn to Farm Oysters and Pteriidae, Reefs, Bivalves, and Mollusks. You even write a cookbook for Oysters.

Then a catastrophic triploid spawning accident removes the long-beloved MSX resistance in most commercial oyster farms, ruining the market for a year -- maybe longer -- and in a panic you cancel all of your upcoming frontlist titles.

A few other publishers publish one- or two-off titles in the market segment. They sell a few copies. You had a corner on the market though. You were the publishing world's China of oyster farming. Over the next four years, you look at your sales numbers and congratulate yourself for getting out of the oyster farming publishing market segment when you did, because no one's buying oyster farming books anymore.

After all, publishing one frontlist title per year is obviously a sign you take the oyster farming market seriously and want to see it continue.