One of the most interesting paradoxes studied by philosophers is also one that challenges our entire approach to science and knowledge gathering. Despite its monumental importance, there is currently no satisfying resolution to this paradox. This is unfortunate, because by studying this paradox and learning how to resolve it, we all stand to benefit immensely, both as a civilization and in our own daily lives. That is exactly what we shall now aim to do.

The paradox, first proposed by the philosopher Carl Hempel in the 1940s, is known as the Raven Paradox. A quick walk-through for the uninitiated: suppose we want to examine the hypothesis that All Ravens are Black. On sighting numerous Black Ravens, but no non-Black Ravens, one can declare that he has seen many Ravens, all of which are Black, and hence has evidence in support of the above hypothesis.

However, the hypothesis can also be restated as its contrapositive, All non-Black Objects are non-Ravens. The two statements are logically equivalent: if the second is true, all Ravens are necessarily Black. On sighting many Red Apples, Green Guavas, etc., one can then declare that she has seen many non-Black objects, none of which are Ravens, and hence also has evidence in support of the hypothesis that All Ravens are Black. This conclusion appears paradoxical, because it implies that we can gain information about Ravens by looking at Apples, Guavas, and other non-Ravens. Is this really true? If not, how can we explain this paradox?

The most popular resolution to this paradox is the probability-based (Bayesian) resolution. The proposed solution goes like this: there are vastly (perhaps even infinitely) more non-Ravens than Ravens. Hence, sighting a Black Raven confers vastly more evidence than sighting a Red Apple, which confers only a vanishingly small amount. So the paradox, despite initially seeming to contradict our intuition, is actually in line with our gut.
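To make the Bayesian argument concrete, here is a minimal sketch under loudly stated assumptions: a hypothetical world with one million Ravens and 10^15 non-Black non-Ravens (both numbers invented for illustration), the hypothesis H ("All Ravens are Black") compared against a single alternative H' in which exactly one Raven is non-Black, and two idealized procedures: pick a random Raven and check its color, or pick a random non-Black object and check whether it is a Raven. The Bayes factor P(observation | H) / P(observation | H') measures how much each observation favors H. The function names are hypothetical.

```python
from fractions import Fraction


def bayes_factor_black_raven(ravens: int, nonblack_ravens: int) -> Fraction:
    """Bayes factor for: pick a random Raven, and it turns out to be Black.

    Under H every Raven is Black, so P(obs | H) = 1.
    Under H', `nonblack_ravens` of the Ravens are non-Black,
    so P(obs | H') = (ravens - nonblack_ravens) / ravens.
    """
    return Fraction(ravens, ravens - nonblack_ravens)


def bayes_factor_nonblack_nonraven(nonblack_nonravens: int, nonblack_ravens: int) -> Fraction:
    """Bayes factor for: pick a random non-Black object, and it is not a Raven.

    Under H the non-Black objects are all non-Ravens, so P(obs | H) = 1.
    Under H' the non-Black pile also holds `nonblack_ravens` Ravens, so
    P(obs | H') = nonblack_nonravens / (nonblack_nonravens + nonblack_ravens).
    """
    return Fraction(nonblack_nonravens + nonblack_ravens, nonblack_nonravens)


# Hypothetical population sizes, chosen only for illustration.
RAVENS = 1_000_000
NONBLACK_NONRAVENS = 10**15
K = 1  # H' posits a single non-Black Raven

raven_bf = bayes_factor_black_raven(RAVENS, K)
apple_bf = bayes_factor_nonblack_nonraven(NONBLACK_NONRAVENS, K)

print(f"Black Raven: Bayes factor - 1 = {float(raven_bf - 1):.3e}")
print(f"Red Apple:   Bayes factor - 1 = {float(apple_bf - 1):.3e}")
```

With these made-up numbers the Black Raven's factor exceeds 1 by about one millionth, while the Red Apple's exceeds 1 by only 10^-15: both favor H, but the Apple's contribution is a billion times smaller, which is exactly what the Bayesian resolution claims.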

However, this resolution sidesteps the heart of the paradox, and can itself be easily defeated. For example, suppose we modify our claim to the following: All non-Black Birds are non-Ravens, which necessarily implies that All Ravens are Black. The number of Birds is surely greater than the number of Ravens, but only by a finite amount. Suppose Person-A now observes a billion Green Parrots, and Person-B observes 10 Black Ravens. Does Person-A possess more evidence in favor of the claim that All non-Black Birds are non-Ravens, and hence that All Ravens are Black? According to the probability-based resolution, the answer is yes, but we now find ourselves right back where we started. By what reasoning can we claim that observing a billion Green Parrots provides evidence in favor of All Ravens being Black?
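Why does the probability-based resolution answer yes? A rough sketch, assuming invented numbers: 10 million Ravens and 10 billion non-Black non-Raven Birds (both hypothetical), an alternative H' with a single non-Black Raven, and each person's observations treated as independent random draws, with replacement, from the relevant group (Ravens for Person-B, non-Black Birds for Person-A). Summing the log Bayes factors over all observations gives each person's total evidence under this accounting. The function names are hypothetical.

```python
import math


def log_bf_black_ravens(n_obs: int, ravens: int, nonblack_ravens: int) -> float:
    """Total log Bayes factor for n draws of a random Raven, each one Black.

    Per draw: P(Black | H) = 1 and P(Black | H') = 1 - nonblack_ravens / ravens.
    """
    return -n_obs * math.log1p(-nonblack_ravens / ravens)


def log_bf_nonblack_nonraven_birds(n_obs: int, nonblack_nonraven_birds: int,
                                   nonblack_ravens: int) -> float:
    """Total log Bayes factor for n draws of a random non-Black Bird, each not a Raven.

    Per draw: P(non-Raven | H) = 1 and
    P(non-Raven | H') = nonblack_nonraven_birds / (nonblack_nonraven_birds + nonblack_ravens).
    """
    return n_obs * math.log1p(nonblack_ravens / nonblack_nonraven_birds)


# Hypothetical counts, for illustration only.
RAVENS = 10_000_000
NONBLACK_NONRAVEN_BIRDS = 10_000_000_000
K = 1  # H' posits a single non-Black Raven

person_a = log_bf_nonblack_nonraven_birds(1_000_000_000, NONBLACK_NONRAVEN_BIRDS, K)
person_b = log_bf_black_ravens(10, RAVENS, K)

print(f"Person-A (a billion Green Parrots): log Bayes factor = {person_a:.3g}")
print(f"Person-B (10 Black Ravens):         log Bayes factor = {person_b:.3g}")
```

Under these assumptions Person-A's total comes out at roughly 0.1, against roughly 10^-6 for Person-B, so the Bayesian bookkeeping does credit the billion Green Parrots with more evidence.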

The most consistent resolution to the paradox is arguably a straightforward application of Nicod’s criterion, which says that “only observations of Ravens should affect one’s view as to whether all Ravens are Black. Observing more instances of Black Ravens should support the view, observing white or coloured ravens should contradict it, and observations of non-ravens should not have any influence.” Is this really true? How can we claim that observing a billion Green Parrots does not lend any support to the hypothesis that All non-Black Birds are non-Ravens?

In order to better understand this paradox, it helps immensely to view the world from a different perspective: a 21st-century perspective.

Because the world is so big and traveling by foot is so hard, suppose we decide to build a swarm of AI drones to help us find evidence for or against the statement that All Ravens are Black. Unfortunately, training our drones to identify Ravens is too hard. So instead, we train our drones to take pictures of any and all objects based on their primary color. We have one swarm of drones that takes pictures of all Black objects, a second swarm that takes pictures of all Green objects, a third swarm for all Blue objects, and so on. We release these drones into the wild, and a month later, they come back with billions of pictures, organized by each object’s color.

There are too many pictures to inspect exhaustively, so we need to pick our battles carefully. Person-A decides to ignore all the other colors and looks through the pictures of Black objects. Among them, he finds various pictures of Black Ravens, which is unsurprising since the existence of Black Ravens is already known.

Does Person-A now have any evidence in support of the hypothesis? Of course not. We already know that Black Ravens exist. Person-A’s search through the pictures of Black objects does nothing whatsoever to strengthen or weaken the hypothesis.

Conversely, Person-B looks through a mix of all the non-Black pictures and manages to inspect a couple of million of them. He finds Red Apples, Green Plants, Blue Parrots, and all manner of objects, but no Ravens. Does this provide us with any (non-definitive) evidence in support of the hypothesis?

Of course it does. If a non-Black Raven existed, there’s a chance that a drone would have found it and taken a picture of it, and that Person-B would have stumbled upon it. The fact that Person-B found only pictures of Apples, Plants, etc., but no Ravens, provides us with some evidence in support of the hypothesis.

Our instincts tell us that observing Black Ravens provides evidence supporting the hypothesis, and that observing Green Apples does not provide any evidence of value. However, given the experimental procedure described above, we have reached the diametrically opposite conclusion.

It’s important to note, however, that this conclusion is utterly procedure-dependent. Suppose we use different technology to gather our data: our AI drones are now capable of recognizing shapes very accurately, but incapable of recognizing color. We send out a swarm of drones that take pictures of all Ravens, regardless of color. And for whatever reason, we also send another swarm of drones that take pictures of all non-Ravens, regardless of color. Person-A now looks through all the pictures of Ravens, and finds only Black Ravens. Person-B decides to look through all the pictures of non-Ravens, and finds only Red Apples, Green Plants, etc., but no Ravens.

In both contexts, the data (and the world itself) remains exactly the same. Person-A is looking at pictures of Black Ravens, and Person-B is looking at pictures of Red Apples. And yet, because we’ve changed the manner in which we gather data, the conclusions are completely flipped. In this new context, Person-A has evidence to support the hypothesis that all Ravens are Black, whereas Person-B has no evidence whatsoever to contribute towards the hypothesis.

An astute reader may note that in both examples above, neither observing Black Ravens nor observing Red Apples does anything to support the hypothesis. Rather, it is the absence of non-Black Ravens, in a place where they could have shown up, that truly supports the hypothesis. This brings us to our key point, and the true resolution of this paradox: evidence can only ever be gained through experiments (and analyses) that have a real chance of producing results that falsify (or cast doubt on) the hypothesis being tested.

Examining pictures of known-Black Objects, or of known-non-Ravens, can never produce a falsifying result. That is why these analyses can never provide evidence to support the hypothesis. By contrast, examining pictures of known-non-Black Objects, or of known-Ravens, can indeed produce falsifying results. That is why these approaches can produce evidence in favor of our hypothesis.
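The key point can be checked with a toy calculation. The sketch below assumes a hypothetical world in which the hypothesis is false (a small number of non-Black Ravens exist), with made-up counts, and asks, for each of the four piles produced by the two drone procedures, how likely an inspector is to stumble on a non-Black Raven. It assumes pictures are inspected at random, with replacement; `World`, `pile_counts` and `chance_to_falsify` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """Hypothetical object counts in a world where the hypothesis is FALSE."""
    black_ravens: int
    nonblack_ravens: int
    black_nonravens: int
    nonblack_nonravens: int


def pile_counts(world: World) -> dict[str, tuple[int, int]]:
    """Map each pile of pictures to (non-Black Ravens in the pile, pictures in the pile)."""
    return {
        # Procedure 1: drones sort pictures by color.
        "Black objects": (0, world.black_ravens + world.black_nonravens),
        "non-Black objects": (world.nonblack_ravens,
                              world.nonblack_ravens + world.nonblack_nonravens),
        # Procedure 2: drones sort pictures by shape.
        "Ravens": (world.nonblack_ravens, world.black_ravens + world.nonblack_ravens),
        "non-Ravens": (0, world.black_nonravens + world.nonblack_nonravens),
    }


def chance_to_falsify(hits: int, total: int, n_inspected: int) -> float:
    """Chance that n randomly chosen pictures (with replacement) include a non-Black Raven."""
    return 1.0 - (1.0 - hits / total) ** n_inspected


# Made-up counts for illustration only.
world = World(black_ravens=1_000_000, nonblack_ravens=1_000,
              black_nonravens=10**9, nonblack_nonravens=10**10)
N_INSPECTED = 2_000_000  # "a couple million" pictures

chances = {pile: chance_to_falsify(hits, total, N_INSPECTED)
           for pile, (hits, total) in pile_counts(world).items()}
for pile, chance in chances.items():
    print(f"{pile:18s} chance of a falsifying picture: {chance:.3f}")
```

Under these assumptions the Black pile and the non-Raven pile score exactly zero however many pictures are inspected, while the non-Black pile (about 0.18 here) and the Raven pile (essentially 1) can turn up a falsifying picture. A clean result is informative only from the latter two.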

One can easily imagine yet more experiments: for example, a third experiment that sends all the drones to take pictures of Ravens in a single country, and a fourth experiment that divides the drones among many different countries around the world. Even if they yield identical data, experiments like the latter are much more likely to produce a falsifying result, and hence the data obtained from them should give us far more confidence in the hypothesis.
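A back-of-the-envelope sketch shows why spreading the drones out helps. It assumes, purely for illustration, that non-Black Ravens (if any exist) live in exactly one of 10 countries, each equally likely, that 1,000 drones are available, and that each drone in the right country independently spots one with probability 0.01. All of these numbers and the `detect_prob` helper are hypothetical.

```python
def detect_prob(allocation: list[int], p: float) -> float:
    """Chance of spotting the hidden non-Black Ravens, given drones per country.

    Assumes they live in exactly one country, each country equally likely,
    and that every drone in that country independently spots one with probability p.
    """
    n_countries = len(allocation)
    return sum(1.0 - (1.0 - p) ** drones for drones in allocation) / n_countries


N_COUNTRIES = 10
N_DRONES = 1_000
P_SPOT = 0.01

concentrated = [N_DRONES] + [0] * (N_COUNTRIES - 1)  # third experiment: one country
spread = [N_DRONES // N_COUNTRIES] * N_COUNTRIES      # fourth experiment: many countries

print(f"one country:    {detect_prob(concentrated, P_SPOT):.3f}")
print(f"many countries: {detect_prob(spread, P_SPOT):.3f}")
```

With these numbers the single-country experiment has roughly a 0.10 chance of producing a falsifying result, against roughly 0.63 for the spread-out one. The gap comes from diminishing returns: piling ever more drones into one country adds little once that country is well covered, while the other countries go unsearched.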

As surprising as it sounds, evidence and knowledge come not from the data itself, but from the combination of the data and the manner in which it was collected. Contrary to popular perception, the data cannot simply speak for itself. To support a hypothesis, we have to conduct experiments that are most likely to produce falsifying results if the hypothesis is false. Only the results gleaned from such experiments hold any weight whatsoever. Examining data derived from experiments that could never falsify the hypothesis, or from biased experiments, is nothing more than self-congratulatory confirmation bias.

The above conclusion is not just intellectual masturbation; it can be used to improve the way we do science. If we want to support the hypothesis that All Ravens are Black, simply finding millions of Black Ravens or Green Apples is not going to do it. If someone comes to you with millions of such pictures but refuses to say how they were gathered, you should tell him that this information is absolutely useless. Putting blind faith in data, while ignoring the process by which it was generated, is precisely what leads to unreliable scientific results: the replication crisis, biased industry-funded research, and the popular belief that “you can always find a study that supports what you believe”.

Thankfully, scientists are starting to learn this lesson, the hard way. Psychology researchers, bitten hard by the replication crisis, are now pushing for their peers to “preregister their research plans before collecting/analyzing data… to specify, in as much detail as they can, their plans for a study … procedures, measures, rules for excluding data, plans for data analysis, predictions/hypotheses, etc… and they post those plans in a time-stamped, locked file … accessible by editors and reviewers”.

Such measures are long overdue, but they shouldn’t simply be optional. They should be an absolute requirement, not just in Psychology research but in any and all scientific research, and, in an ideal world, in all exploration of knowledge. As the Raven Paradox shows, such experimental controls are just as important as the resulting data itself.

Related links:

How clinical studies abuse data