Newly discovered handwritten documentation sheds new light on an ongoing scientific controversy regarding a famous collection of nearly 1,000 skulls amassed by a 19th-century Philadelphia physician. Dubbed the "American Golgotha," the collection is the work of Samuel Morton, who used them to compare the brain size of different racial groups in the 1830s and 1840s.

Paul Wolff Mitchell, a graduate student in anthropology at the University of Pennsylvania, where the collection is stored, believes his analysis could help settle the often acrimonious debate over whether the late paleontologist Stephen Jay Gould was correct in his assessment of the role of unconscious bias in science, particularly with regard to race. Mitchell concludes that Gould incorrectly accused Dr. Morton of inaccurately measuring the cranial capacity of his skulls but was nonetheless correct with regard to Morton's implicit racial bias. Mitchell's findings have just been published in PLOS Biology.

An American Golgotha

Morton is widely considered the father of scientific racism, and his controversial ideas about the intellectual superiority of the Caucasian people provided a handy defense of the continued enslavement of African-Americans in the US just prior to the Civil War. He bolstered those views with a broad analysis of 1,000 skulls he collected from all over, sometimes even scavenged from battlefields and the occasional catacomb. At the time, it was widely believed that skull size, or cranial capacity, was a marker of superior intelligence and advanced cognition.

Modern genetics has shown that there is no scientific basis for the traditional concept of race; it's a meaningless designation. But Morton subscribed to an archaic worldview that held that there were five distinct races, each representing separate acts of creation, and thus falling into a divinely determined hierarchy. In descending order, they were Caucasians, East Asians (Mongolian), Southeast Asians, Native Americans, and Blacks (or "Ethiopians") at the bottom. He naturally would have expected his measurements of cranial capacity to fall neatly within that hierarchy.

Morton made his first measurements of 256 skulls in 1839. To determine cranial capacity, he stuffed each skull with pepper seeds to determine the volume and meticulously jotted down the results. He published those results in Crania Americana the same year. For a second round of measurements of 672 skulls, published in 1840, he switched to lead shot, since he found it difficult to adequately replicate measurements with the pepper seeds, likely because they were so easily compressed. In both cases, he concluded that Caucasian skulls had the largest cranial capacity and that African skulls had the smallest. His third and final catalog appeared in 1849, two years before his death.

Morton's ideas fell out of favor over the ensuing century, but he became a lightning rod for renewed controversy in the 1970s, thanks to the late Stephen Jay Gould. Gould was convinced that the scientific method could be tainted by the personal biases of an individual scientist, particularly when it came to pre-existing racial biases about IQ, for example. He used Morton's measurements of his skull collection as a case study in a 1978 paper and in a chapter in his 1981 book, The Mismeasure of Man.

A “patchwork of fudging”

In the latter, Gould dismissed Morton's measurements as "a patchwork of fudging and finagling in the clear interest of controlling a priori convictions," although he did not think it was deliberate. Rather, Morton had an unconscious racial bias. He associated cranial capacity with intelligence, expected the Caucasian skulls to be larger, and so he systematically underestimated the capacity of the African skulls. His preconceptions influenced his analysis.

Part of Gould's argument centered on a discrepancy between the 1839 skull measurements and Morton's 1840 data, showing a significant difference between the African skulls and the others. Granted, Morton had switched from pepper seed to lead shot for the later measurements, but the discrepancies should thus have occurred across the board for all the skulls. To Gould, this was a clear indicator that Morton had doctored the data unintentionally, perhaps by overfilling the Caucasian skulls with pepper seed or under-filling the African skulls.

Gould's takedown cemented Morton as the poster child for 19th-century scientific racism. But some scientists took issue with Gould's conclusions, arguing that he was misrepresenting Morton's data and had never bothered to do his own measurements of the skulls to verify his assertions. The first to raise doubts was an undergraduate at Macalester College in St. Paul, Minnesota, named John Michael. His 1988 paper asserting that Morton's measurements were reasonably accurate, however, was largely dismissed. Gould was just too prominent for an undergraduate's critique to hold much water. Gould made no mention of it in the revised 1996 version of Mismeasure of Man.

Controversy fully erupted with the publication of a 2011 paper by a team of anthropologists led by Jason Lewis. They had replicated Morton's measurements of cranial capacity using lead shot on half the skulls in the collection, and their results matched Morton's in all but two percent of the cases, a statistically insignificant degree. "Ironically, Gould's own analysis of Morton is likely the stronger example of a bias influencing results," Lewis and his colleagues wrote in their paper. Co-author Ralph Holloway went so far as to denounce Gould as a "charlatan" to The New York Times, a rather shocking breach of academic decorum.

But Lewis et al. did not really address the central question of the differences between the seed and lead shot measurements, according to Mitchell, although they rightly identified places where Gould was a bit sloppy in his analysis. A concurrent Nature editorial noted as much, advising caution with regard to the conclusions, since "the critique leaves the majority of Gould's work unscathed." The editorial also questioned the motivation of Lewis and his colleagues, several of whom were associated with the University of Pennsylvania and therefore had a vested interest in defending Morton's science as being free from bias. And neither Gould nor Lewis and his colleagues had access to the full set of Morton's original seed measurements.

A new discovery

That's where things stood when Mitchell stumbled across a few key pages of Morton's personal 1840 copy of the Catalog of Skulls. He noticed that many entries had accompanying handwritten measurements jotted down, and those measurements were different from the later 1849 version of the catalog. He concluded the handwritten notes in the earlier edition were from Morton's initial seed measurements. And he found that any differences between the averages for the seed and lead-shot measurements could be attributed to different overall sample sizes.

So Lewis et al. were correct in that aspect of their 2011 analysis: the differences between Morton's 1839 and 1840 measurements were still very much within statistical norms. Gould was wrong on that particular detail. But Mitchell goes on to make a convincing case in the second part of his paper that Gould was nonetheless correct about unconscious bias. "Just because Morton's data were not biased doesn't mean his science wasn't," he says.

Mitchell compared Morton's analysis to a similar skull survey undertaken a few years earlier by German anatomist Friedrich Tiedemann using millet seed to fill the skulls. Tiedemann's work is not nearly as well known, but Mitchell found his measurements of cranial capacity produced an equivalent data set to Morton's. However, Tiedemann came to a very different conclusion: he insisted that his findings proved it was impossible to use that data to draw any conclusions about racial hierarchies or superiority.

The difference arises from the two approaches each man took when analyzing the data. Tiedemann presented his data as a range in each racial category. All those ranges overlapped with each other far too significantly to make any reasonable scientific pronouncement about race. Morton, on the other hand, took an average of the measurements of the groups. Intriguingly, when Mitchell applied Morton's method to Tiedemann's data, taking the averages, he wound up with the same conclusions as Morton.
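To make the contrast between the two summaries concrete, here is a minimal sketch in Python. The numbers are invented purely for illustration and are not taken from Morton's or Tiedemann's tables; the group labels and the helper functions `summarize` and `ranges_overlap` are likewise hypothetical. It simply shows that one set of measurements can have heavily overlapping ranges and still yield a tidy ranking once reduced to averages.

```python
from statistics import mean

# Hypothetical volumes in cubic inches, invented for illustration only.
# These are NOT Morton's or Tiedemann's measurements.
samples = {
    "group_a": [78, 84, 90, 96, 102],
    "group_b": [76, 82, 88, 94, 100],
}


def summarize(values):
    """Return (minimum, maximum, mean) for one list of measurements."""
    return min(values), max(values), mean(values)


def ranges_overlap(a, b):
    """True when the closed intervals [min, max] of a and b intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


# Range-style summary: report the spread of each group.
summary = {name: summarize(vals) for name, vals in samples.items()}
overlap = ranges_overlap(samples["group_a"], samples["group_b"])

# Average-style summary: collapse each group to one number and rank.
ranking = sorted(samples, key=lambda name: mean(samples[name]), reverse=True)

for name, (low, high, avg) in summary.items():
    print(f"{name}: range {low}-{high}, mean {avg}")
print("ranges overlap:", overlap)
print("ranking by mean:", ranking)
```

In this toy case the two ranges overlap almost entirely, yet sorting by the mean still places one group above the other. Which summary an analyst chooses to report is a decision made outside the data.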

In other words, "The data were mute on these questions," says Mitchell. "Had Morton had Tiedemann's data or Tiedemann had Morton's data, they could have produced the exact same conclusions that each respectively did. That fundamental fact needs to be brought to bear on how we think about what bias means in cranial race science in the 1830s and 1840s."

Morton's belief in the racial superiority of Caucasians influenced his interpretation of his data; Tiedemann's staunch anti-slavery views did the same for the analysis of his data. And that's where Lewis et al. erred in their 2011 paper: they declared that the Morton case, rather than showing the ubiquity of bias, instead showed how science can escape the "bounds and blunders of cultural contexts." This is clearly not the case, based on Mitchell's analysis. Both Morton and Tiedemann had amassed good data, but that was not sufficient to save them from their own biases when it came to interpreting that data.

According to Eric Michael Johnson, an evolutionary anthropologist and historian of science based in Vancouver, BC, Gould was also correct in his original 1978 critique that skull size is irrelevant to determining intelligence or any kind of racial characteristics. Morton knew nothing about the bodies those skulls originally belonged to. "He didn't even know the sex of the skulls in most cases, and so would often arbitrarily assign a sex based on the size of the skull," Johnson says. "He could have gathered an average of these racial groups and put them into basic categories of small, medium, or large and then adjusted the skull averages within that framework. He would have found that it didn't fit this clear hierarchy that he believed in."

Johnson thinks Mitchell makes a solid argument in his new paper but acknowledges that "it brings out a lot of really interesting details that I think will be interpreted in contradictory ways, depending on the reader." Perhaps it could provide ammunition for those wishing to reignite the controversy. That said, "There's no basis whatsoever to hold Morton up as a scientist we should look back on with reverence or respect," says Johnson. "He measured things accurately, but his conclusions were fundamentally wrong and incredibly biased even by the standards of that time."

DOI: PLOS Biology, 2018. 10.1371/journal.pbio.2007008.