Napster is an example of a manually-filled database that has found a way to use volunteer labor such that normal use increases its value.

There has been a lot of discussion lately about the success of Napster in becoming a popular application. I'd like to put in my two cents about what we can learn from it and other successful applications. The answer is not Peer-to-Peer communications.

I think one of the main reasons Napster is successful is that you can find what you want (a particular song) and get it easily. This stems mainly from the fact that so many songs are available through Napster. If Napster only let me get a few popular songs, I'd lose interest fast once I'd downloaded those.

It isn't that Napster uses Peer-to-Peer (P2P). That's plumbing, and most people don't care about plumbing. While the "look into other people's computers and copy directly" aspect has some psychological benefit to some people who understand what's going on (see Tom Matrullo on Doc Searls' weblog, also quoted in DaveNet), I think Napster would operate much better if, when you logged in by running Napster, it uploaded all new songs that weren't in Napster's database to Napster's servers, not just the names and a record of which logged-in users currently have them. If they were copied to a master server, the same songs, provided by the same people, would be available for download at all times (not just when the "owner" happened to be connected to the Internet), and through (hopefully) more reliable and higher-speed connections to the Internet (Akamai, etc.?). Even the list of who had the songs could be maintained. Napster doesn't work this way partially because P2P may be more legal (they argue) and harder to litigate against. Other applications may not have this legal problem and would therefore be able to benefit from more centralized servers. While I'm a strong proponent of P2P for some things, I don't think that is the main issue here.

The issue is whether you can get what you want from the application -- "Is the data I want in the database?" So, I'd like to examine how a shared database gets filled with lots of what people want.

How shared databases are filled with data

There are three common ways to fill a shared database: "Organized Manual", "Organized Mechanical", and "Volunteer Manual". The original Yahoo! is the classic case of a database filled by organizing an army of people to put in data manually. Another example is the old legal databases where armies of typists were paid to retype printed material into computers. The original AltaVista is an example of an organized mechanically-filled database -- a program running on powerful computers followed links and domain names and spidered the web, saving the information as it went. Newsgroups and Slashdot are examples of volunteer databases, where interested individuals provide the data because they feel passionate enough about doing so.

Many databases on the web today are mechanically created by getting access to somebody else's data, sometimes for a fee. Examples are the street map and airline flight status databases. Some of those databases are by-products of automated processes.

Manually created databases

The more interesting databases (to us here) are the ones that involve manual creation. Some examples: Amazon.com's reviews (both house reviews and reader reviews) are a major asset. Yahoo!'s organized manual listings have helped get them to the lead for searching.

A more interesting one to me is the CDDB database. The CDDB database has information that allows your computer to identify a particular music CD in the CD drive and list its album title and track titles. Their service is used by RealJukebox, MusicMatch, WinAmp, and others. The title information is not stored on most CDs. The only information in the CD data is the number of tracks (songs) and the length of each. This is the information your CD player displays. What CDDB does is let the software on your PC take that track information, send a CD signature to CDDB through Internet protocols (if you're connected) and get back the titles. It works because songs are of relatively random length, so the chances are good that almost all albums are unique. (Figure there are about 10 songs on an album, and they each run from a minute and a half or so to three and a half minutes long, so the times vary by roughly 100 seconds. There are 100x100x...x100 = 100**10 = 10**20 = a hundred billion billion = an awful lot of possible combinations.) An album is identified by a signature that is a special arithmetic combination of the times of all the tracks.
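To make the signature idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not CDDB's actual formula: the function names `digit_sum` and `disc_signature`, the digit-sum checksum, the bit layout, and the sample track lengths are all hypothetical choices made for this example.

```python
def digit_sum(n):
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def disc_signature(track_seconds):
    """Combine track lengths (whole seconds) into one 8-digit hex signature.

    Hypothetical layout: 8-bit checksum of the track start times,
    16-bit total playing time, 8-bit track count.
    """
    start = 0
    checksum = 0
    for length in track_seconds:
        checksum += digit_sum(start)  # start time of this track, in seconds
        start += length
    total = start  # total playing time in seconds
    value = (
        ((checksum % 255) << 24)
        | ((total & 0xFFFF) << 8)
        | (len(track_seconds) & 0xFF)
    )
    return f"{value:08x}"


# Two made-up ten-track albums that differ by one second on the last track.
album_a = [215, 187, 242, 199, 176, 231, 204, 193, 258, 221]
album_b = [215, 187, 242, 199, 176, 231, 204, 193, 258, 222]

print(disc_signature(album_a))
print(disc_signature(album_b))
```

Even a one-second difference in a single track changes the total playing time, so the two signatures differ. A compact signature like this can still collide occasionally, which is why the argument only needs "almost all" albums to be unique.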

You'd figure that CDDB just bought a standard database with all the times and titles. Well, there wasn't one. What they did was accept Internet-relayed postings with the track timing information and the titles typed in by a volunteer. Music-CD-playing software for personal computers was written that let people type in that information if CDDB didn't have it. Enough people using that software cared enough, when they saw that one of their albums didn't come up with titles when they played it on their PC, to type in the information. Those people got the information for themselves, so they could more easily make their own playlists, and in the process also updated the shared database. Only one person with each (even obscure) album needed to do this to build the database. If you loved your CD collection, you'd want all the albums represented, or at least some people did. Not everybody needed to be the type who likes to be organized and label everything, just enough people to fill the database. Also, they only needed to rely on "volunteer" (user) labor until the database got big enough that it was valuable enough for other companies to pay for access.
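The look-up-or-contribute flow described above can be sketched in a few lines of Python. This is only a sketch under assumptions: `SharedTitleDatabase`, `play_cd`, and the `ask_user` callback are hypothetical names, and the in-memory dictionary stands in for a real networked service.

```python
class SharedTitleDatabase:
    """Hypothetical in-memory stand-in for a CDDB-like shared service."""

    def __init__(self):
        self._titles = {}  # signature -> (album title, list of track titles)

    def lookup(self, signature):
        """Return (album, tracks) for a signature, or None if unknown."""
        return self._titles.get(signature)

    def submit(self, signature, album, tracks):
        """Store a volunteer's entry; the first submission wins in this sketch."""
        self._titles.setdefault(signature, (album, list(tracks)))


def play_cd(db, signature, ask_user):
    """Return titles for a CD, asking the user to type them only on a miss."""
    entry = db.lookup(signature)
    if entry is None:
        album, tracks = ask_user()           # typed in for the user's own benefit
        db.submit(signature, album, tracks)  # and the shared database grows
        entry = db.lookup(signature)
    return entry


db = SharedTitleDatabase()
prompts = []  # records each time a user was asked to type


def type_titles():
    prompts.append("asked")
    return "Some Obscure Album", ["Track One", "Track Two"]


first = play_cd(db, "a1b2c3d4", type_titles)   # first owner has to type
second = play_cd(db, "a1b2c3d4", type_titles)  # the next owner gets it for free
print(first == second, len(prompts))
```

The point of the design is that the submit step is not a separate act of generosity: it happens inside the same call that gives the user their own titles.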

CDDB's database is on dedicated servers, controlled by them. Their web site says: "CDDB is now a totally secure and reliable service which is provided to users worldwide via a network of high availability, mirrored servers which each have multiple, high bandwidth connections to the Internet...boasting a database of nearly 620,000 album titles and over 7.5 million tracks."

Napster

Napster is a manually created database created by volunteers. Somebody needs to actually buy (or borrow) a copy of a CD, convert it to MP3, and store it in their shared music directory. Or, somebody needs to create an MP3 of their own performance that they want to share. In both cases, creating the copy in the shared music directory can be a natural by-product of their normal working with the songs, for example as part of downloading them to a portable music player or burning a personal-mix CD. Whenever they are connected to the Internet and to the Napster server, those songs are then available to the world. Of course, that person may not be connected to the Napster server all the time, so the song is not fully available to all who want it (a problem with P2P). However, whenever someone downloads a song using Napster and leaves the copy in their shared music directory, that person increases the number of Napster users who have that song and raises the chances you will find someone with it logged in to Napster when you want your copy. So, again, the value of the database increases through normal use.
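A rough way to see how each extra copy raises the chances is a small calculation. This is a back-of-the-envelope sketch, assuming every holder of a song is logged in independently with the same probability. The function name `availability` and the 5% figure are made up for illustration.

```python
def availability(holders, p_online):
    """Chance that at least one of `holders` users is logged in right now.

    Assumes each holder is online independently with probability p_online.
    """
    if holders < 0 or not 0.0 <= p_online <= 1.0:
        raise ValueError("holders must be >= 0 and p_online within [0, 1]")
    return 1.0 - (1.0 - p_online) ** holders


# Hypothetical: each holder is logged in 5% of the time.
for holders in (1, 5, 20, 100):
    print(holders, round(availability(holders, 0.05), 3))
```

Under these assumptions, every downloader who leaves a copy in the shared directory pushes the number up a little, with no deliberate act of sharing.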

What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present, especially since sharing is the default. It isn't even like the old song about "leaving a cup with water by the pump to let the next person have something to prime it with" (I'll have to use Napster to find that song...) where it just takes a little bit of effort, so why not be nice to the next person like the last one was to you.

As Kevin Werbach wrote:

What made Napster a threat to the record labels was its remarkable growth. That growth resulted from two things: Napster's user experience and its focus on music...What makes Napster different is that it's drop-dead simple to use. Its interface isn't pretty, but it achieves that magic resonance with user expectations that marks the most revolutionary software developments.

I would add that in using that simple, desirable UI, you also are adding to the value of the database without any extra work.

I believe that you can help predict the success of a particular UI used to build a shared database based on how much normal, selfish use adds to the database.

The Commons

There is the concept of "The Tragedy of the Commons" popularized by Garrett Hardin in 1968:

Therein is the tragedy. Each man is locked into a system that compels him to increase his herd without limit -- in a world that is limited. Ruin is the destination toward which all men rush, each pursuing his own best interest in a society that believes in the freedom of the commons. Freedom in a commons brings ruin to all.

In our case, we find the Cornucopia of the Commons: Use brings overflowing abundance.

-Dan Bricklin, August 7, 2000

This essay was delivered as a speech at O'Reilly's P2P conference 14 Feb 2001. There are slides that include material about doing analysis of costs for such systems. It also appears in O'Reilly's Peer-to-Peer book published at that time.

(See also: Thoughts on Peer-to-Peer, Friend-to-Friend Networks and A Taxonomy of Computer Systems and Different Topologies: Standalone to P2P. Other Napster related essays include The Software Police vs. the CD Lawyers and How the Napster injunction and other legal decisions affect directories on the Internet.)

Additional Thoughts

Evan Williams wrote some comments that are relevant here. He points out that a good volunteer-created database should be designed with an incentive for the entry of accurate information. One way is to use data that you rely on yourself, such as with CDDB. You can read Evan's comments in his February 16 entry.

Talking to experienced Napster users, I've discovered another benefit from increasing the number of users: More users increases the likelihood that a song will be indexed in a way that helps you find it.

While songs have an "official" title, not everybody knows the song by that name. A normal simple database would have just that text. With Napster, since people name the files in ways they feel will help them identify the songs themselves, many use more discoverable names than the "official" title, such as the chorus. Some people provide a mixture with one name in parentheses. For example, Harvey Danger's song "Flagpole Sitta" is known by many people as "Paranoia", and a large percentage of the copies available through Napster are named that way, some with both. You'll find music files with both "Ode to Joy" and "9th Symphony" in the names, etc. Note that you don't have to be the original provider of the song to add value this way -- you could rename it after you got a copy to help yourself find it on your system later.
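The effect of user-chosen names on findability can be sketched with a toy search. This is a simplified illustration, not how Napster's search actually worked: the `search` function, its all-words-must-match rule, and the filenames are assumptions for the example.

```python
def search(shared_files, query):
    """Return filenames that contain every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [
        name for name in shared_files
        if all(word in name.lower() for word in words)
    ]


# A database holding only the "official" title.
official_only = ["Harvey Danger - Flagpole Sitta.mp3"]

# The same song as different (hypothetical) users might have named their copies.
with_variants = official_only + [
    "Harvey Danger - Paranoia.mp3",
    "Harvey Danger - Flagpole Sitta (Paranoia).mp3",
]

print(search(official_only, "paranoia"))
print(search(with_variants, "paranoia"))
```

Someone who only knows the song as "Paranoia" gets nothing from the official-title list, but finds two copies once other users' naming habits are part of the index.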

So, here again, more users increases the value, this time by adding human created variations.

This is another part of the bar that recording-industry-provided systems will have to get over if they want to serve music lovers as well as Napster does. It isn't just price.

-Dan Bricklin, March 2, 2001

It was pointed out to me that Prof. Hardin later said he should have named his essay "The Tragedy of the Unmanaged Commons". In 1994 he published a paper with that title.

-Dan Bricklin, April 23, 2001

I wrote something somewhat related in January 2005 in a blog post titled "Systems without guilt where every contribution is appreciated" that bloggers sometimes refer to:

In reaction to, and support of, AKMA's post about tagging, Dave Winer writes that he stopped tagging the categories of blog posts. As soon as he missed one he felt guilty and then as the guilt grew he tagged less. He started just assigning things to a couple of categories and then not tagging at all.

I think Dave has pointed out a key problem with tagging. It seems like a nice idea but it requires us to always do it. The system wants 100% participation. If you don't do it even once, or don't do it well enough (by not choosing the "right" categories), then you are at fault for messing it up for others -- the searches won't be complete or will return wrong results. Guilt. But because it's manual and requires judgment you can't help but mess up sometimes so guilt is guaranteed. Doing it makes you feel bad because you can't ever really do it right. So, you might as well not play at all and just not tag.

This is the opposite of what I was getting at in my old Cornucopia of the Commons essay about volunteer labor. In that case, in a good system, just doing what you normally would do to help yourself helps everybody. Even helping a bit once in a while (like typing in the track names of a CD nobody else had ever entered) benefited you and the system. Instead of making you feel bad for "only" doing 99%, a well designed system makes you feel good for doing 1%. People complain about systems that have lots of "freeloaders". Systems that do well with lots of "freeloading" and make the best of periodic participation are good. Open Source software fits this criterion well and its success speaks for itself.

So, here we have another design criterion for a type of successful system: Guiltlessness. Not only should people just need to do what's best for them when they help others, they should also not need to do it every time.

See also "More on who does tagging".

-Dan Bricklin, October 12, 2006

