Conventional wisdom suggests that the primary reason why so many people do not accept Darwin's theory of evolution is that they find it threatening to their religious beliefs. There is no question that religion is a big part of the reason behind the large number of people who reject evolution. But I am convinced that just as often, the cause and effect is reversed: people hold onto their fundamentalist religious beliefs because evolution by natural selection -- the strongest argument against an Old Testament-type creator -- is so counter-intuitive to so many.

I arrive at this conclusion in a somewhat roundabout way. I have long been fascinated with systems that tap into the "wisdom of crowds" -- systems that, in fact, have much in common with Darwinian evolution. Such systems are unlikely to conflict with anyone's religion, and yet I see the same sort of resistance to them as I see to evolution. The arguments against them are remarkably similar.

This hypothesis, if borne out, suggests that advocates of reason -- moderates, atheists, and the science minded -- might consider a different tack if they wish to convince more people to reconsider their fundamentalist, anti-scientific beliefs. It may be easier to first go after this non-intuitiveness, starting with these places where the conceptual difficulty is not exacerbated by the conflict with their comforting and culturally embedded religious belief.

Below I cover three separate systems, each of which bears a strong similarity to Darwinian evolution, each of which seems to elicit a "but it just can't work" response, and none of which conflicts with any religion I know of. They are:

1. Wikipedia:

Most people who actually use the Wikipedia online encyclopedia on a regular basis recognize that it is an amazing resource, and is getting significantly better as time goes on. However, I have spent a lot of time debating with intelligent people who simply reject that Wikipedia can be accurate or reliable, given that it can be edited by anyone.

Of course, it is true that Wikipedia has been vandalized often, that many of the entries contain poorly written sections, and that some of the facts presented are dubious. I don't suggest anyone use it to verify that the mushroom they found in their backyard is safe to eat. Nevertheless, the science journal Nature published a study in 2005 concluding that Wikipedia fares quite well when compared to Encyclopedia Britannica in terms of accuracy. A study by IBM [pdf] in 2004 found that vandalism is usually repaired extremely quickly -- so quickly that most users will never see its effects. Meanwhile, Wikipedia has 10 times as much content as Britannica, is growing much more rapidly, and, most importantly, is being refined and improved every minute of every day. (Not to mention that it is available online for free!)

Wikipedia founder Jimmy Wales described the online encyclopedia as being "like a sausage: you might like the taste of it, but you don't necessarily want to see how it's made." In a Nature magazine blog accompanying the study mentioned above, Timo Hannay said that "frankly, I still can't get over the fact that it works at all." Indeed, the problem most people have with Wikipedia's quality and accuracy seems to have more to do with their knowledge of how it is made, rather than any observed problem with the end results.

There is no question that there is something unsettling about the idea of a resource that can be edited by anonymous internet users. We would expect that many, if not most, of the edits will be of poor quality. The natural assumption might be that the quality of the end result will be the average quality of all the edits -- but nothing could be further from the truth.

Comparing it to evolution, an edit of Wikipedia might be considered equivalent to a genetic mutation. A mutation, of course, is non-directed...that is, "random." It could be bad or good, but most of the time it is bad. If we were simply the average of all the mutations that preceded us, we would be nothing more than a pile of goo. And yet we are not.

The reason that Wikipedia is as good as it is (and the reason that living organisms are as sophisticated as they are), is not due to the average quality of the edits (or mutations). Instead, it is due to a much harder-to-observe process: selection. Some edits survive, while others quickly die. While one can look at the history of a Wikipedia article and see each and every edit, it is much harder to tell how many potential editors looked at an article, subconsciously thought "I doubt I could improve this much," and chose not to try. Each of these can be considered a "selection event", and the number of such events vastly outnumbers the actual edits. Selection is the heart of what makes Wikipedia -- as well as Darwinian evolution -- work.
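The difference between "average of the edits" and "selection among the edits" can be made concrete with a toy simulation (my illustration, not anything Wikipedia actually runs). Random "edits" are drawn from a bell curve; selection keeps an edit only if it improves on the current state:

```python
import random

# Toy model: edit quality is drawn from a normal distribution centered at 0,
# so the typical edit is mediocre and half of all edits are outright harmful.
# Selection keeps only the edits that improve on the current article.

random.seed(42)
quality = 0.0          # current quality of the "article"
edit_qualities = []    # quality of every attempted edit

for _ in range(10_000):
    edit = random.gauss(0, 1)     # a random, non-directed edit
    edit_qualities.append(edit)
    if edit > quality:            # selection event: keep only improvements;
        quality = edit            # worse edits are "reverted" immediately

avg_edit = sum(edit_qualities) / len(edit_qualities)
print(round(avg_edit, 2))   # near 0: the average edit is no improvement at all
print(round(quality, 2))    # well above average: selection, not averaging, did the work
```

The end state is far better than the average attempt, which is the whole point: the outcome is shaped by which changes survive, not by what the typical change looks like.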

As much as evolution impresses us with its ability to turn countless random mutations into sophistication, it isn't without its downsides. Every time we see a person or animal that suffers from a severe birth defect we see the cruelty of the process, but we also recognize that such a mutation will probably not survive more than a generation or two due to the power of selection. Likewise, when we see glitches in Wikipedia (whether due to vandalism, someone pushing an agenda, or just bad writing), we are seeing the "random" part of the process in action. Again, we generally see that selection kicks in rapidly, and the glitches disappear.

This difficulty in seeing and understanding the power of selection is why, in spite of the evidence to the contrary, people will claim that Wikipedia must be a poor-quality source of information. Luckily, though, the end results are there for everyone to see, and most people judging it on end results alone seem to agree that it is an excellent source of information. And unlike with living things, no one can easily doubt that Wikipedia is indeed created in the way we are told it is.

2. Prediction Markets:

One of the purest examples of "wisdom of crowds" is prediction markets, where speculators can bet on the chances of future news events, such as the outcomes of sports events or political elections. For instance, at intrade.com, I can see that (on the day I clipped the data below: October 11, 2007) the market thinks that Hillary Clinton has about a 46% chance of being elected president, while Rudy Giuliani has 15% and Mitt Romney has less than 9%. This isn't the percentage of people who are expected to vote for each candidate (as polls try to predict), but the actual percentage chance of winning -- a very different thing. In fact, today it gives Al Gore around a 10% chance, and he isn't even running. The market is not just guessing how people will vote and how those votes will break down by state; it is also factoring in the probability that Gore wins a Nobel Peace Prize tomorrow, and whether he would in turn decide to throw his hat into the ring. In effect, it tries to take into account everything that may factor in -- things that polls alone can't reach.

Politics - 2008 Presidential Election Winner - Oct 11, 2007

Contract                     Symbol                   Bid    Ask    Last   Vol      Chge
Hillary Clinton              2008.PRES.CLINTON(H)     46.1   47.0   46.1   108587   -0.6
Rudy Giuliani                2008.PRES.GIULIANI       15.1   15.2   15.1   22057    -0.1
Mitt Romney                  2008.PRES.ROMNEY         8.2    8.6    8.7    16243    -0.0
Fred Thompson                2008.PRES.THOMPSON(F)    7.0    7.5    7.0    8927     -0.2
Al Gore                      2008.PRES.GORE           8.6    10.6   8.6    63115    -0.8
Barack Obama                 2008.PRES.OBAMA          6.8    6.9    6.8    19932    +0.2
John Edwards                 2008.PRES.EDWARDS        2.5    2.6    2.5    8898     0
John McCain                  2008.PRES.McCAIN         2.2    2.3    2.2    22331    +0.1
Ron Paul                     2008.PRES.PAUL           3.0    3.1    3.0    14536    +0.8
Michael Bloomberg            2008.PRES.BLOOMBERG      0.3    0.6    0.3    5444     +0.0
Mike Huckabee                2008.PRES.HUCKABEE       0.5    0.6    0.5    4592     0
Bill Richardson              2008.PRES.RICHARDSON     0.1    0.2    0.1    4374     +0.0
Newt Gingrich                2008.PRES.GINGRICH       0.1    0.2    0.1    7376     +0.0
Joe Biden                    2008.PRES.BIDEN          0.1    0.2    0.1    4153     +0.0
Chris Dodd                   2008.PRES.DODD           -      0.1    0.1    142      0
Mark Warner                  2008.PRES.WARNER         -      0.1    0.1    1079     0
George Allen                 2008.PRES.ALLEN          -      0.1    0.1    485      0
Field (any other candidate)  2008.PRES.FIELD          0.3    0.4    0.3    4411     +0.0

The way this works is actually rather simple. For instance, if I think Clinton has a greater than 46% chance of winning, I can buy a "contract" on her for $46. It will pay $100 if she wins, $0 if she loses. Or, I can turn around and sell the contract in a week or two, hopefully for a few dollars more than I paid (if her market price has gone up). Alternatively, I could bet against her for $54. Like any market, the price of each item adjusts according to supply and demand.
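The arithmetic behind such a contract is simple enough to sketch in a few lines. This is my own illustration (the function name is mine; only the $46 price and $100 payout come from the example above):

```python
# A sketch of the economics of a single prediction-market contract that
# pays $100 if the event happens and $0 if it doesn't.

def expected_profit(price, believed_probability):
    """Expected profit, in dollars, of buying one contract at `price`
    when you believe the event has `believed_probability` of occurring."""
    payoff_if_win = 100.0
    return believed_probability * payoff_if_win - price

# If the market prices Clinton at $46 but you believe she has a 55% chance,
# buying one contract is worth about +$9 to you in expectation:
print(expected_profit(46.0, 0.55))

# Betting against her costs $54; with the same 55% belief (i.e. a 45% chance
# she loses), that bet is worth about -$9 in expectation:
print(expected_profit(54.0, 0.45))
```

When enough people with differing beliefs trade on these expected profits, the price itself converges toward the crowd's aggregate probability estimate.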

It should not be surprising to hear that a great many people, when told of how prediction markets work, will claim that they can never produce meaningful results. After all, the market price, and therefore the prediction, comes solely from random people on the internet who decide to take a wild guess at who is likely to win. Sure, they are putting their hard-earned cash on the line, but that doesn't mean they are experts. Certainly the opinion of an expert -- who has studied all the polls, and understands statistics and the math of the electoral college -- would produce a much more accurate prediction than just the average of the opinions of lots of John Q. Publics.

And yet, that isn't the case. Prediction markets turn out to be remarkably accurate -- typically more accurate than any individual expert -- counter-intuitive as that may seem. Like Wikipedia, prediction markets also tap into the power of selection, but the most dramatic similarity they share with evolution is their equilibrium-seeking behavior.

Imagine that lots of random people come in and make bad guesses at who will win the election. The price of the contracts will then vary significantly from what the best expert would predict, resulting in an unstable (i.e. non-equilibrium) situation. Now all it takes to make some easy money is to consult with such an expert and buy the contracts whose prices are the furthest from the experts' estimates. If it is indeed this easy to make money, the market will attract lots of people, including institutional investors who have the ability to invest enough to quickly move the price back to where the experts predict. Meanwhile, those experts who consistently predict badly will eventually tend to pick another line of work at which they are better, while those who are best at picking will make lots of money doing so, and will therefore tend to be there with cash in hand whenever the prices stray far from their predictions. Each expert tends to gravitate toward the specific things on which they might have special expertise (or inside information!), and therefore has the best chance of out-predicting the other experts. Over time, it becomes harder and harder to consistently outguess the market, no matter how good you are.
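A toy simulation can make this correction dynamic visible. This is purely my illustration (the parameters and 0.46 "expert estimate" are invented): noise traders nudge the price at random, while informed traders step in and push it back whenever the mispricing is large enough to be worth their while:

```python
import random

# Toy market model: the price starts far from the "true" probability.
# Noise traders perturb it randomly; informed traders correct any
# mispricing large enough to represent easy money.

random.seed(1)
true_prob = 0.46     # what the best-informed traders believe (as a price in [0, 1])
price = 0.20         # start well away from equilibrium

for step in range(1000):
    price += random.uniform(-0.02, 0.02)   # noise traders make wild guesses
    mispricing = true_prob - price
    if abs(mispricing) > 0.01:             # easy money attracts informed capital,
        price += 0.5 * mispricing          # which buys/sells the gap mostly closed

print(round(price, 2))   # ends up hovering near 0.46 despite the constant noise
```

No individual transaction "knows" the right price; the equilibrium emerges from the statistical pressure of many corrections, which is exactly the part that is hard to see by watching trades one at a time.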

As much as this may make logical sense, this sort of equilibrium-seeking process is exceptionally difficult to observe directly. All we can look at are the individual transactions; we can't see all the people who might have been attracted to a particular contract had they thought that it would be relatively easy to make money. And we can't directly see the statistical pressures that are constantly keeping the prices at a stable equilibrium.

Evolution, of course, has similar equilibrium-seeking behavior. Imagine an animal that, were its earlobes shaped slightly differently, would be ever so slightly better able to hear the sounds made by potential prey. No matter how long you watch such animals, you would be hard pressed to find an actual situation where that subtle change would mean the difference between life and death. But as long as there is a statistical difference, a suboptimal earlobe is an unstable situation, waiting to be corrected. And, typically it will be, in surprisingly short order. The cumulative effect, of course, is what we see around us in nature: an absolutely breathtaking degree of adaptation in planet Earth's life forms.

Such equilibrium-seeking behavior, whether in markets or in evolution, seems to defy intuition. The problem is that when you look closely, at the level where human observation works best, all that is visible is a whole lot of slop. It is only when you step back far enough to see things from a statistical point of view that the true precision of the process comes into view. Clearly, this is very, very hard for many -- if not most -- people to do.

3. Recommendation systems:

Like many online vendors, movie rental service Netflix has a recommendation system: it allows users to rate movies they have watched, and, based on these ratings and the ratings of others, offers recommendations of movies the user has yet to view. This is a form of machine learning known as collaborative filtering.

Last year, Netflix launched a contest in which they offered a million dollars to anyone who could write software that does the job better than Netflix's own "world-class movie recommendation system." Specifically, the winner must beat Netflix's accuracy by 10 percent. I tried my hand at the contest, and quickly beat Netflix by around 3% (putting myself in 8th place a few weeks into the contest), but eventually gave up, as I was competing against a lot of seriously smart people who had done their PhDs on this very type of problem, while I was basically winging it. A year later, contestants are getting rather close to the million-dollar prize, with about an 8.5 percent improvement.

The contest made available the ratings of half a million real Netflix users, for 18,000 movies. The total number of ratings in the set is about 100 million...quite a large amount of data. Contestants are asked to predict an additional one million ratings, unknown to anyone but Netflix, given a user id and movie id for each one. Contestants are scored based on how far their ratings differ from the actual ratings.

While the contest attracted a lot of smart people with deep knowledge of machine learning, it attracted all types. In the forums on their web site, the discussion seemed to be all about one thing: how do we get additional data about the movies? (examples here, here, and here) Contestants wanted to be able to download and use information such as the director, the actors, the year made, whether it won any awards, how it did in the box office, etc. But mostly, they wanted to know the genre: whether it was science fiction, horror, romantic comedy, drama, documentary, etc. After all, if we are trying to predict which movies a particular user will like and which they won't, the genre is absolutely critical.

Since the dataset did contain the movie title, it was possible to get this data from elsewhere (say, IMDB.com), though not without considerable effort. What interested me, though, was how steadfast these people were in declaring that this information was critical to making sense of all the data and producing reasonable predictions. I debated with a few of them, and found it impossible to convince them that such data was completely unnecessary, and that the purely numerical data supplied in the original dataset was quite enough to very accurately categorize movies, detect the tastes of users, and predict their ratings on the additional set of movies.

While the algorithm I came up with was unable to win the contest (well, given the time I had available to put into it), it certainly worked well. And unlike any other algorithm I have seen for collaborative filtering, mine is one that is easy to explain to people who don't have advanced math degrees.

The idea is that I needed to put each movie, and each user, into a "neighborhood," which roughly equates to "genre." There is a science fiction neighborhood, a comedy neighborhood, a horror neighborhood, and so on. But the neighborhoods have blurry boundaries, just as real neighborhoods typically do. "Alien" would be somewhere between the science fiction and horror neighborhoods, while "The Hitchhiker's Guide to the Galaxy" would be somewhere between the science fiction and comedy neighborhoods. Each user would live in a neighborhood, closest to the type of movies they prefer, and furthest from those they dislike.

To do this, my program starts by giving each user, and each movie, a random position in space. That is, each gets a value for X, Y and Z representing its position. For each of the 100 million ratings, the program simply adjusts the distances between each pair: if a user likes a movie, it moves the user and movie closer to each other by a tiny amount. If the user dislikes a movie, it moves the user and movie further away from one another. The program iterates over and over, until the positions stabilize: that is, an equilibrium is reached. This takes quite a few hours, but once it has done so, small changes (such as modifying the data, or modifying a parameter within the program) take very few iterations to re-stabilize the model.
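The loop described above can be sketched in a few dozen lines. This is my reconstruction of the idea on a toy dataset, not the author's actual contest code; the step size, the repel cutoff, and the example names are all invented:

```python
import random

# Sketch of the iterative positioning idea: every user and movie gets a random
# point in space; a "liked" rating pulls the pair together, a "disliked" rating
# pushes it apart, and repeated iteration settles into stable neighborhoods.

DIMS = 3       # the article notes that around 12 dimensions worked better
STEP = 0.05    # fraction of the gap to move per rating, per iteration

def random_point():
    return [random.uniform(-1, 1) for _ in range(DIMS)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def adjust(user, movie, liked):
    """Pull a liked user/movie pair together; push a disliked pair apart
    (only while they are close, so that distances stay bounded)."""
    repel = (not liked) and dist(user, movie) < 1.0
    for d in range(DIMS):
        delta = (movie[d] - user[d]) * STEP
        if liked:
            user[d] += delta
            movie[d] -= delta
        elif repel:
            user[d] -= delta
            movie[d] += delta

# Tiny toy dataset of (user, movie, liked?) ratings:
ratings = [("ann", "alien", True), ("ann", "airplane", False),
           ("bob", "alien", False), ("bob", "airplane", True)]

random.seed(0)
users = {u: random_point() for u, _, _ in ratings}
movies = {m: random_point() for _, m, _ in ratings}

for _ in range(200):   # iterate until the positions stabilize
    for u, m, liked in ratings:
        adjust(users[u], movies[m], liked)

# Ann should now sit nearer the movie she liked than the one she disliked:
print(dist(users["ann"], movies["alien"]) < dist(users["ann"], movies["airplane"]))
```

Nothing in the loop knows anything about genre, yet users and movies with compatible tastes end up clustered together, which is the point the forum skeptics missed.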

If a movie is near a user -- in the same neighborhood, so to speak -- it can be predicted that that user will probably like that movie, even if the user did not specifically rate it. Movies that were universally liked tended to move toward the center of the model ("Shawshank Redemption" being closest to the center), while disliked movies moved toward the outside. In practice, I found that using 12 or so dimensions, rather than just 3, worked a lot better, allowing a much richer categorization, and allowing each neighborhood to be adjacent to a great many other neighborhoods. There are several other layers of complexity needed to get the best results, but the gist of the approach is just as simple as described.
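Once positions have stabilized, both similarity and prediction reduce to simple distance queries. A sketch of that final step, with hypothetical hand-picked 3-D positions (the coordinates, titles, and the linear distance-to-stars mapping are all my inventions for illustration):

```python
# Hypothetical stabilized positions: first axis roughly "sci-fi-ness",
# second roughly "horror-ness", third roughly "comedy-ness".
movie_pos = {
    "Alien":     (0.9, 0.8, 0.1),    # between sci-fi and horror
    "Aliens":    (0.8, 0.9, 0.2),    # very close to "Alien"
    "Airplane!": (-0.7, 0.1, 0.9),   # off in the comedy neighborhood
}
user_pos = {"ann": (0.7, 0.7, 0.0)}  # Ann lives near the sci-fi/horror border

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def most_similar(title, k=2):
    """Rank the other movies by distance from `title`."""
    others = [m for m in movie_pos if m != title]
    return sorted(others, key=lambda m: dist(movie_pos[m], movie_pos[title]))[:k]

def predicted_rating(user, title, max_dist=3.0):
    """Map distance to a 1-5 star scale: nearer means a higher prediction."""
    d = min(dist(user_pos[user], movie_pos[title]), max_dist)
    return 5 - 4 * d / max_dist

print(most_similar("Alien"))                       # "Aliens" ranks above "Airplane!"
print(round(predicted_rating("ann", "Alien"), 1))  # high: Ann is in that neighborhood
```

The "how similar are these two movies" queries mentioned below are exactly this kind of distance lookup, just over 18,000 movies and a dozen dimensions instead of three titles and three.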

What was striking to me was that this system, iterating over a massive amount of sloppy, low precision data, could organize the model with such stunning precision. I could type in the names of two movies, and ask "how similar" they are, and the results were almost always exactly what I would expect. I could type the name of a movie, and get a list, in order, of the top 20 movies that are seen as most similar. And it did quite a good job at the assigned task, predicting how users would rate movies. Those who claimed the process couldn't work, after seeing the results, were shocked.

The point, of course, is that this system is very evolution-like, in that lots of messy data, with very little apparent "intelligence," processed by a simple iterative algorithm, can find sophisticated equilibria with a great deal of precision. Looking directly at the raw data, such as at an individual user's set of ratings, would indicate a lot more slop than is apparent in the final model. The system doesn't "know" that a movie is a science fiction movie, any more than natural selection "knows" why a particular mutation in the DNA increases the chance of an animal surviving to adulthood. Nonetheless, it works, against all intuition.