Computing Thoughts

We Need A New Information-Sharing Model for the Internet

by Bruce Eckel

June 5, 2005



Summary

We need a new model for sharing information, one that leverages the Internet in a way that scales.


I think that there are two ways that we locate information in a resource: search and structure. In print books, search has always been difficult, and the Web and electronic documents have dramatically improved it. But structure is essential, especially when you don't really know what you're trying to find, or you are learning material for the first time. You could argue that you can use search to discover information in a newsgroup, but as the data increases, it becomes much harder to track down the information because it tends to become very scattered. If finding information in a newsgroup were easy, it would cut way down on repeated questions and make the newsgroup a place that people would stay for much longer, rather than the usual response of fleeing when the noise level gets too high.

Structure is essential because it clumps information together and presents it in a linear form so that a newcomer can absorb the information in the right order. It is also the lowest-entropy form of information, and thus requires the most effort to produce. Most information on the Internet lacks structure, and thus is difficult to use. For a number of years I have been pondering whether it would be possible to create a self-structuring way of sharing information -- one that would naturally tend towards decreasing entropy as people added more to it.

I am hunting for the middle ground. On one extreme is a newsgroup, which has all kinds of very good information but in an essentially linear form. It has very little structure (the best we've seen is "threads," but these are only slightly helpful), and so, as we've seen again and again, it doesn't scale. Many people have talked about the phenomenon of a newsgroup that is good at the beginning but loses its integrity and manageability once too many people join. Many newsgroups are full of good stuff, but I can't handle the volume and the low signal-to-noise ratio, so I don't subscribe. If a newsgroup had more structure, however, it could be extremely useful. Information would aggregate in a structured fashion, so when people had questions or new ideas they would go where that information already was. Newbie questions could be answered by navigating a tree, and when a topic came up it would be placed alongside the existing material in the tree, rather than repeated over and over in a linear fashion, as happens with newsgroups. Some newsgroups work better than others because they have a small group of mature and dedicated people who help keep things on track (comp.lang.python is an example of this). But in general, newsgroups don't scale because they are supposed to be a conversation among a relatively small group of people.

Weblogs scale because they simulate the "eyes forward" style of traditional lecturing. You go to a lecture to hear a particular speaker. With a weblog, lots of people can listen without the system breaking down, and there can be Q&A. The structure of the discussion is around each article, so it's less likely that people get off topic, and you can always just read the article and not the comments (in a newsgroup you never know when you'll find the real essence of the discussion, so you have to read endlessly).

The opposite extreme from the newsgroup is a zine. A zine requires a lot of effort by experts. Most experts already have too much to do, and can't simply add this task to their list. So in this case, the effort is what doesn't scale.

In pre-internet days, print magazines were the only outlet for less-than-book-sized ideas. Writing for magazines involved a lot of hassle and didn't usually pay well, but it paid something, and you did it because that was the way to publicize ideas. With the net, however, you have many ways to publicize ideas. The weblog requires a lot less effort, and "less effort" is a form of payment. If I want my ideas published, I could do it through a zine, but that's going to require a lot more time and effort than if I just publish them through a weblog. Maybe I don't know whether the idea is really worth publishing, and perhaps I just want to try it out. And if I do have an idea worth publishing and am willing to do the work of putting it into article form, why not start thinking of a book? (Or note what Joel Spolsky is doing by collecting the best weblog entries of the year from all over the web and publishing them as a book. That's a good idea because he acts as the filter, so you don't have to do all the work of finding those pieces yourself.)

The internet is busting through the model of "he who owns the presses decides the news." With the Internet, we have alternative incentives. However, the magazine model relies on three different forms of incentive:

1. You get your ideas out there. The internet keeps coming up with more alternative ways to do this, more easily. So this factor no longer works in a zine's favor.
2. You get paid. If you want high-caliber people to put in a lot of work, money is an important incentive.
3. You get an article reviewed and vetted by said high-caliber people. This is the value/notoriety of association. But without #2, those people aren't there.

That's where the model falls apart -- most high-caliber people are already too busy with their own stuff to take on another task, especially one that involves donating their time, and especially when they have #1 as an outlet. That's why I question the zine approach.

As an alternative, I keep imagining some kind of emergent way to produce structured information, where a lot of people can put in little bits of time, and everyone benefits from the result. I don't know the mechanism for such a thing, but the Internet is what would make it possible. We need to explore the possibilities that the Internet provides, rather than trying to resurrect something from the pre-Internet world.

The wiki clearly has some elements of this. It allows the participants to decide what the organization will be. A wiki has a different feel than a newsgroup because the information is clearly persistent and there is supposed to be structure. However, the wikis I've seen end up suffering because not everyone has the same vision for what that structure should be. Without something that keeps bringing the structure into focus, entropy results and it becomes harder and harder to find what you want. It would be interesting to compare the scaling factor for a wiki with that of a newsgroup: how many active participants can each one support before it starts losing focus?

One approach that shows promise is Wikipedia. That project has a number of advantages:

- A clear, well-defined goal (create an encyclopedia) with a predefined structure
- Many volunteers, and a number of people (if I understand correctly) dedicated full-time to the project
- A vast number of consumers -- arguably everyone on the planet -- to justify the effort

Ironically, I think this shows that the wiki is not the ideal medium for a many-person discussion. It requires the tremendous effort seen in Wikipedia in order to keep the information organized.

Many of the basic ideas of the wiki are valid, however. It allows a person to contribute a very small amount (such as a spelling or grammar correction) or a large amount (such as maintaining the structure of an entire wiki). It has relatively easy entry points. It's possible to create a structured document with an automatic table of contents. Maintaining the clarity of a wiki, however, requires one or more editors dedicated to the job, who have the vision of what the document should look like.

For a quality document, I think that editing is essential, but my hope is that we can develop a system that would allow one or more of the following:

- The computer participates in the editing process (at least the structuring)
- The amount of effort by the editor is reduced
- The editing process is distributed among some of the volunteers
- The system allows the structure to be an emergent property of the document

Although I have made attempts at some of these features (for example, I was largely responsible for the "Backtalk" feature that you see in the online version of the Zope book), I don't really know how all this could be accomplished. But I'm reasonably certain that the model for the "self-organizing book" will be one of the Next Big Things on the Internet.


About the Blogger

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998, 2nd Edition, 2000, 3rd Edition, 2003, 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd edition 2000, Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.

This weblog entry is Copyright © 2005 Bruce Eckel. All rights reserved.