Statistical analysis shows the founder of modern genetics may have falsified data in his famous pea experiment in order to better correspond with his expectations.

Science proceeds by developing models of what we think the world is like and then looking for data that test those models. How? By comparing experimental results with the model’s predictions.

One famous example: Gregor Mendel’s experiments with peas.

The Monk’s Model

In the middle of the 19th century, Mendel, a monk, conducted many experiments now viewed as seminal to our understanding of heredity: how characteristics of parents are passed on to children. In particular, Mendel had a model in mind in which each parent had two genes for a characteristic and contributed one of them to the offspring plant. Those two genes then made up the genetic basis for the plant’s actual appearance.

Mendel worked with many different characteristics, one of which was the color of the pod. He believed that he had some plants with two yellow genes. He took these yellow plants, and he crossed them with plants he assumed to have two green genes. (Yellow is the dominant gene, so if you have a plant with a yellow and a green gene, or two yellow genes, the plant will appear yellow.)

This is a transcript from the video series Meaning from Data: Statistics Made Clear. Watch it now, on The Great Courses Plus.

When Mendel crossed plants that had two yellow genes (homozygous, meaning both genes are the same) with plants that had two green genes, every offspring received one gene from each parent: one yellow and one green. And all of them appeared yellow, because yellow was the dominant gene.

Four Possible Results

The interesting part of the experiment occurs at the next generation.

Suppose you take the plants that resulted from the previous breeding (each of which was heterozygous, meaning it had a yellow gene and a green gene) and cross them with one another. The theory Mendel proposed predicted that each parent would randomly contribute one or the other of its genes to the daughter plant.

Diagram illustrating the results of two generations of cross-breeding two homozygous pea plants, one green and one yellow (Image: N.Vinoth Narasingam/Shutterstock)

As a result, there are four possible things that could happen in this kind of an experiment.

1. Both parents could contribute yellow genes.
2. The first parent could contribute the green gene and the second parent the yellow gene.
3. The first parent could contribute the yellow gene and the second parent the green gene.
4. Both parents could contribute green genes.

(Only the plants with two green genes would have green pods. The other three combinations would appear yellow.)

The Experiment Begins

Here’s how these experiments proceeded.

Mendel crossed a bunch of heterozygous plants and looked at the percentage of offspring plants that were yellow and the percentage that were green.

His expectation: One-quarter of the offspring plants would be green, and three-quarters of them would be yellow. Among the yellow plants, he expected one-third to be homozygous (with two yellow genes) and the other two-thirds to be heterozygous (one green and one yellow gene).

Suppose you did an experiment large enough that 200 plants were expected in each of these four quadrants: 800 offspring in all, with 200 green, 200 homozygous yellow, and 400 heterozygous yellow. If you repeated an experiment of this size many times, as Mendel did, you would expect the outcomes to center on these numbers.
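
The expectations above can be checked by simply listing the four equally likely gene combinations. Here is a minimal sketch in Python; the names (`PARENT`, `is_yellow`, `PLANTS_PER_CELL`) are illustrative choices, not anything from Mendel or the lecture, and the figure of 200 plants per quadrant is the hypothetical experiment size used in this article.

```python
from fractions import Fraction
from itertools import product

# Each heterozygous parent carries one yellow gene (Y, dominant) and one green gene (g).
PARENT = ("Y", "g")

# The four equally likely combinations: one gene from each parent.
offspring = [first + second for first, second in product(PARENT, PARENT)]

def is_yellow(genes):
    """A plant appears yellow if it carries at least one dominant yellow gene."""
    return "Y" in genes

yellow = [genes for genes in offspring if is_yellow(genes)]

frac_yellow = Fraction(len(yellow), len(offspring))
frac_green = 1 - frac_yellow
frac_homozygous_among_yellow = Fraction(yellow.count("YY"), len(yellow))

# Hypothetical experiment size from the article: 200 plants expected per quadrant.
PLANTS_PER_CELL = 200
expected = {
    "green": PLANTS_PER_CELL * offspring.count("gg"),
    "homozygous yellow": PLANTS_PER_CELL * offspring.count("YY"),
    "heterozygous yellow": PLANTS_PER_CELL * (offspring.count("Yg") + offspring.count("gY")),
}

print(offspring)
print(frac_green, frac_yellow, frac_homozygous_among_yellow)
print(expected)
```

Using exact fractions rather than decimals keeps the one-quarter, three-quarters, and one-third figures recognizable in the output.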

Heterozygous versus Homozygous

But how did Mendel know whether a plant was heterozygous or homozygous?

Mendel took each yellow plant and bred it with itself 10 times. If the plant was homozygous, self-breeding would, of course, always produce a yellow plant.

However, if the plant was heterozygous, he reasoned the chances were very good that in at least one of the 10 self-breedings both contributed genes would be green, and the offspring would come out green. A green offspring would be an indication that the plant he started with was heterozygous, with one green and one yellow gene.

Mendel collected a great deal of data, all of which supported his theory. In one instance, in an experiment of this size, he found 201 plants that he classified as homozygous yellow, against an expectation of 200. Across his experiments, the ratios were all very close to what he expected.

Fact-Checking Mendel

In 1936, the English statistician and biologist Ronald Fisher wrote a paper in which he investigated Mendel’s data. In particular, he noted that Mendel’s data were too good to be true.

Remember: When you’re dealing with a random process, you don’t expect the answers to always be exactly according to expectation. You expect a distribution of the answers.

Most of the time, the answers will be within a certain distance of the expectation. But a certain fraction of the time, you’d expect to have outliers. You’d expect to have rare occurrences.

One of the things Ronald Fisher pointed out was that the number of experiments in which Mendel’s data were very close to expectation was too great to be believed.

Faulty Data

Here’s an example of the reasoning Fisher used.

Suppose you flip a coin 1,000 times. You know that the expected number of heads is 500. But you also know that if you actually flip a coin 1,000 times, the number of heads will usually be somewhat less than 500 or somewhat more than 500.

Fisher noticed that Mendel’s data gave more outcomes within one standard deviation of the mean than would be expected. When an experiment involves random chance and its outcomes follow a normal distribution, you expect the result to lie within one standard deviation of the mean about 68% of the time. That means that about 32% of the time, you expect the result to fall more than one standard deviation from the mean.

Yet the data reported by Mendel fell close to expectation far more often than chance would allow. Fisher argued the data were not properly constructed.
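
The coin-flip example can be made concrete. The sketch below, in Python, computes the exact chance that the number of heads in 1,000 fair flips lands within one standard deviation of 500. The function name `prob_heads_within_one_sd` is a hypothetical helper written for this illustration. Because the count of heads is a whole number, the exact answer is close to, but not exactly, the 68% figure for a normal distribution.

```python
from math import comb, sqrt

def prob_heads_within_one_sd(n_flips):
    """Exact probability that heads in n_flips fair coin flips
    falls within one standard deviation of the mean."""
    mean = n_flips / 2
    sd = sqrt(n_flips) / 2  # standard deviation of a fair-coin binomial count
    # Count the equally likely flip sequences whose number of heads is within one sd.
    favorable = sum(
        comb(n_flips, heads)
        for heads in range(n_flips + 1)
        if abs(heads - mean) <= sd
    )
    return favorable / 2 ** n_flips

p_inside = prob_heads_within_one_sd(1000)
print(f"within one standard deviation: {p_inside:.3f}")
print(f"outside one standard deviation: {1 - p_inside:.3f}")
```

So roughly one run in three should land outside that band. A long series of experiments that almost never does is the pattern Fisher found suspicious.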

Misclassification and Randomness

Fisher went on to make another claim about the results from Mendel’s data.

The strategy by which Mendel classified a plant as heterozygous or homozygous was to perform the self-breeding experiment 10 times. There was a small chance that a heterozygous plant bred with itself 10 times would, by randomness alone, produce a yellow offspring every one of those 10 times, and so be classified as homozygous.

In fact, it’s not a difficult computation to see exactly how often that would happen. When you cross a heterozygous plant with itself, the chances are three out of four that at least one of the two contributed genes will be yellow, so there’s a 3/4 chance the offspring will be yellow. If you do it 10 times, there’s a (3/4)^10 chance the offspring will be yellow every single time, which is about 5.6%.

So the probability of misclassifying a heterozygous plant as homozygous is about 5.6%. What that means is that, of the 400 heterozygous plants expected in an experiment of this size, Mendel would be expected to misclassify about 22.5 as homozygous.

So, the expected number of plants classified as homozygous should have been 222.5, not 200. The figure of 200 is the expectation only for the plants that really are homozygous.
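
The arithmetic behind these figures is short enough to write out. This is a minimal sketch in Python, assuming the hypothetical experiment size used in this article (200 truly homozygous and 400 truly heterozygous yellow plants); the variable names are illustrative.

```python
# Chance that a single self-bred offspring of a heterozygous plant appears yellow.
P_YELLOW_OFFSPRING = 3 / 4
# Number of self-breedings Mendel used to classify each plant.
N_OFFSPRING_TESTED = 10

# A heterozygous plant is misclassified when all 10 offspring appear yellow.
p_misclassify = P_YELLOW_OFFSPRING ** N_OFFSPRING_TESTED

# Assumed experiment size: 200 plants expected per quadrant.
TRUE_HOMOZYGOUS = 200
TRUE_HETEROZYGOUS = 400

expected_misclassified = TRUE_HETEROZYGOUS * p_misclassify
expected_classified_homozygous = TRUE_HOMOZYGOUS + expected_misclassified

print(f"misclassification probability: {p_misclassify:.4f}")
print(f"expected misclassified plants: {expected_misclassified:.1f}")
print(f"expected plants classified as homozygous: {expected_classified_homozygous:.1f}")
```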

Too Good to Be True

The effect of this shows up in the experiment Mendel reported (this is one specific example of many): the reported number, 201, is very close to the quasi-expectation of 200. But it’s rather distant from what should have been expected once you include the heterozygous plants that would have been falsely classified as homozygous.

This is the kind of reasoning Fisher used in his article to show that the data Mendel got were too good. In fact, in this case, Mendel wasn’t subtle enough to realize he should have been expecting 222.5 instead of 200 in that box.

So, Mendel’s data came out more like the 200 than what he really should have found.
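
One rough way to put a number on "very close" and "rather distant" is to measure the reported count in standard deviations. The sketch below is an illustration in Python, not Fisher's own calculation. It assumes exactly 600 yellow plants were tested, that each was classified independently, and that the classification followed the 10-offspring rule described above. The variable names are hypothetical.

```python
from math import sqrt

N_YELLOW = 600   # assumed number of yellow plants tested (200 + 400)
REPORTED = 201   # plants Mendel classified as homozygous in this example

# Naive model: a yellow plant is classified homozygous only if it truly is (1/3).
p_naive = 1 / 3
# Corrected model: add heterozygous plants (2/3) whose 10 offspring all appear yellow.
p_corrected = 1 / 3 + (2 / 3) * (3 / 4) ** 10

def z_score(observed, n, p):
    """Distance of an observed binomial count from its mean, in standard deviations."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return (observed - mean) / sd

z_naive = z_score(REPORTED, N_YELLOW, p_naive)
z_corrected = z_score(REPORTED, N_YELLOW, p_corrected)

print(f"against the naive expectation of 200: z = {z_naive:.2f}")
print(f"against the corrected expectation:    z = {z_corrected:.2f}")
```

Under these assumptions, 201 sits almost exactly on the naive expectation but well below the corrected one. A single result like this proves little on its own; the argument rests on the same pattern recurring across many experiments.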

Common Questions About Gregor Mendel Data Falsification

Q: What exactly did Gregor Mendel discover?
Gregor Mendel discovered genetic inheritance by observing the traits of pea plants over 8 years. He found that genes are paired and are discretely inherited from each parent.

Q: Why did Mendel select pea plants for his experiment?
Pea plants proved ideal for Mendel’s experiment for the following reasons: they are simple to raise, they have traits which are easily observable, and they have a brief lifespan, which means that multiple generations can be observed.

Q: Are Mendel’s laws valid?
Mendel’s laws are valid when it comes to peas, humans, and all other sexually-reproducing creatures. However, they do not fully account for all the complexities of genetic inheritance patterns.

Q: How did Mendel process his data?
Mendel selected peas with seven traits and two categories for each trait, for instance shape (round or wrinkled) and color (green or yellow). He then bred these peas and took note of the results over the years.

This article was updated on 9/28/2019