A problem of technique

Most of the problems so far weren't really experimental ones; rather, they were problems with interpretation. It's only when the team went after sequences from the genome that things got a bit strange. A few of their samples appeared to have sufficient DNA to send them for sequencing on one of the current high-throughput sequencing platforms. The quality score assigned to the sequencing runs was good, meaning that they had lots of DNA sequence data to assemble into a genome (although, oddly, the team interpreted this to mean that the sample came from a single individual, which it does not).

The challenge is that the high-throughput machines typically produce short sequences that are about 100 bases long. Even the smallest human chromosome is over 40 million bases long. There are programs that are able to recognize when two of these 100 base-long fragments partly overlap and combine their sequences to create a longer sequence (say 150 bases). By searching for further partial overlaps, the programs can gradually build up longer and longer stretches, sometimes ranging into the millions of base pairs. Although this software will still leave gaps where sequences don't exist or show up at multiple places in the genome, it's still the standard way of assembling genomes from short, 100-base-long reads.
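To make the overlap idea concrete, here is a minimal sketch of greedy overlap merging. It is a toy under stated assumptions, not how any particular assembler works: the function names, the tiny 7-base reads, and the `min_overlap` threshold are all invented for illustration, and it ignores sequencing errors, reverse-complement reads, and repeats, all of which real assembly software has to handle.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no remaining pair overlaps by at least `min_overlap` bases,
    so unrelated reads are left behind as separate pieces.
    """
    contigs = list(reads)
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        # Join the two pieces, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical toy reads: the first three overlap, the fourth is unrelated.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGGGGG"]
assembled = greedy_assemble(toy_reads)
print(assembled)
```

The three overlapping reads collapse into one longer sequence, while the unrelated read stays on its own. That is the key property: nothing forces a read into a place where it doesn't fit.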

For some unfathomable reason, team bigfoot didn't use it. Instead, they took a single human chromosome and got some software to line up as much as it could to that.

There are a number of serious problems with this approach. You could have an entirely different genome present in the sequences, and the software would ignore most of it. Most of the gene coding regions are highly conserved among mammals, so they'd line up nicely against the human chromosome—in fact, they might be difficult to distinguish from it. But the entire rest of the genome would be ignored by the software. By taking this approach, the authors pretty much guaranteed they'd get something out that looked a lot like a human genome.

The other problem here is that the software will typically treat the human chromosomal sequence as a target that it attempts to recreate. If it can't find a good match, it will stick the best match available where it's needed. Sometimes, the match will be fairly good. Other times, the sequence will be barely related to the template it's supposed to match.

Even given all these advantages, the software still couldn't assemble an entire chromosome. Instead, it ended up matching sequences to three different stretches of the chromosome, each a few hundred thousand base pairs long. Remember, the human genome is over three billion base pairs total. This only represents a tiny fraction of it. Given that the quality score provided for the DNA sequencing run was high, this tells us one of two things: either the software was woefully incapable of assembling a genome, even when given a template; or there was very little human DNA there in the first place. As we'll see, it might be a little bit of both.
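For a sense of scale, the arithmetic can be sketched as follows. The 300,000 figure is an assumption standing in for "a few hundred thousand," and three billion is used as a round lower bound for the genome size, so the result is an order-of-magnitude estimate only.

```python
GENOME_BP = 3_000_000_000      # "over three billion base pairs" (rounded down)
STRETCH_COUNT = 3              # three assembled stretches, per the text
ASSUMED_STRETCH_BP = 300_000   # assumption: stand-in for "a few hundred thousand"


def assembled_fraction(stretch_count: int, stretch_bp: int, genome_bp: int) -> float:
    """Fraction of the genome covered by the assembled stretches."""
    return stretch_count * stretch_bp / genome_bp


fraction = assembled_fraction(STRETCH_COUNT, ASSUMED_STRETCH_BP, GENOME_BP)
print(f"Assembled fraction of the genome: {fraction:.2%}")
```

Under those assumptions, the assembled stretches cover roughly three hundredths of one percent of the genome.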

A hypothetical hybrid

At this point, it's worth stepping back to try to figure out what it would look like if the authors' ideas were correct, and some humans interbred with an unidentified hominin species to produce what are now bigfeet. There are two groups that humans are known to have interbred with: Neanderthals and Denisovans. But, obviously, anything that would have given us a bigfoot must have been quite different from the Neanderthals and Denisovans, which largely looked human. So, we can probably assume that it had diverged from our lineage for longer, but not as long as chimps.

What would the genome of such a hominin look like? Well, for Neanderthals and Denisovans, the genomes mostly look human. If there's a difference between humans and chimps, in most cases, these other groups have the human sequence. Hominin X's genome would be more distantly related. But the chimp genome puts a very strict limit on how different it could be. In terms of large-scale structure, the chimp and human are almost identical; there are only six locations with a major structural difference between the two with a total of 11 breakpoints. Unless you happen to be looking at one of those, you'd typically see the same genes in the same order. None of the breakpoints happens to be on Chromosome 11, which is what the authors were looking at, so this is a non-issue.

Smaller scale insertions and deletions are more common but not that common. Even when you consider them, the human-chimp sequence identity is over 95 percent. If you only focus on the areas of the genome where things line up without major rearrangements, then the identity is 99 percent. So any hominin that we can interbreed with would have a genome that is almost certainly in the area of 97-98 percent identical to our own. Sequences that lined up would be even higher than that.
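The gap between the two identity figures comes down to what gets counted. Here is a minimal sketch, assuming two sequences that have already been aligned to equal length with `-` marking an insertion or deletion. The function name and the toy sequences are invented for illustration and are not drawn from any real genome.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = True) -> float:
    """Percent of alignment columns where the two sequences have the same base.

    With count_gaps=True, columns containing '-' count as differences.
    With count_gaps=False, those columns are skipped entirely, which mirrors
    looking only at regions where the sequences line up.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        has_gap = base_a == "-" or base_b == "-"
        if has_gap and not count_gaps:
            continue
        compared += 1
        if not has_gap and base_a == base_b:
            matches += 1
    return 100.0 * matches / compared


# Hypothetical toy alignment: one single-base gap and one substitution.
seq_a = "ACGT-ACGTA"
seq_b = "ACGTTACGAA"
print(percent_identity(seq_a, seq_b, count_gaps=True))
print(percent_identity(seq_a, seq_b, count_gaps=False))
```

Skipping the gap columns always gives an identity at least as high as counting them, which is the same direction as the difference between the 95 and 99 percent figures.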

The first generation of hybrids would have a 50/50 split between these two nearly identical genomes, after which they'd start randomly assorting. Some areas would undoubtedly be favored or disfavored by various forms of natural selection. But about 90 percent of the human genome doesn't seem to be under any selective pressure at all, and most of the remainder of the genome wouldn't be under selective pressure simply because it's identical in the two species. As a result, all but one or two percent of the genome would probably be inherited randomly from one or both of the two species.

Of course, after the first generation, the two genomes would start undergoing recombination, scrambling them at a finer scale. The probability of recombination roughly scales with the length of DNA you have. The basic measure of recombination, the centimorgan, represents a one percent probability that there would be a recombination each generation. In humans, a centimorgan is about a million base pairs. So, if you had 50 million base pairs of DNA, then you'd have even odds that a recombination would take place every generation. In humans, the generation time averages out to be about 29 years; in chimps, it's 25. We'll assume bigfeet are in the neighborhood of 27 years per generation.

If bigfeet got started more recently than 13,000 years ago (based on the Spanish mitochondrial DNA, as mentioned above), that means there have been approximately 481 generations since. In half of these, there would be a recombination within our 50 million base pairs, meaning 241 recombinations. That means, on average, we'd see a recombination every 200,000 base pairs or so.
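The back-of-the-envelope estimate above can be reproduced in a few lines. This sketch uses only the figures given in the text (13,000 years, 27 years per generation, one centimorgan per million base pairs, a 50-million-base-pair region). The variable names are invented, and treating 50 centimorgans as a flat 50 percent chance per generation is the same simplification the text makes.

```python
YEARS_SINCE_HYBRIDIZATION = 13_000   # upper bound from the mitochondrial DNA
YEARS_PER_GENERATION = 27            # assumed bigfoot generation time
BP_PER_CENTIMORGAN = 1_000_000       # rough human figure
REGION_BP = 50_000_000               # stretch of DNA being considered

generations = YEARS_SINCE_HYBRIDIZATION / YEARS_PER_GENERATION

# One centimorgan is a one percent chance of recombination per generation.
region_centimorgans = REGION_BP / BP_PER_CENTIMORGAN
recombination_chance = region_centimorgans / 100

expected_recombinations = generations * recombination_chance
average_spacing_bp = REGION_BP / expected_recombinations

print(f"Generations: {generations:.0f}")
print(f"Expected recombinations: {expected_recombinations:.0f}")
print(f"Average spacing: {average_spacing_bp:,.0f} base pairs")
```

The spacing works out to a little over 200,000 base pairs. A more recent hybridization date would mean fewer generations and therefore even longer unbroken stretches.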

With that, we know what our genome should look like: stretches of DNA, over 100,000 bases long, that are human, alternating with equally long stretches of something that looks almost human but not quite. In fact, the identity between the two sequences should be strong enough that it would be difficult to say where one ended and the next started with any greater resolution than about 1,000 base pairs. And because there were apparently a number of distinct interbreeding events (again, based on the mitochondrial DNA), no two bigfeet are likely to have the same combinations of human and nonhuman stretches.

You call that a genome?

This is, of course, nothing at all like what the genome that's been published looks like. The paper itself indicates that regions of clearly human DNA are typically only a few hundred base pairs long. And interspersed with those are equally short pieces of DNA that appear to look little to nothing like the stretch of the human genome that they're supposed to be aligned to. If the genome is viewed as a test of the hybrid hypothesis, then the hypothesis fails. When asked about this, Ketchum just returned to the mitochondrial data. "I know there are ways, like you said, to figure out the nuclear age of things, but the bottom line is it couldn't have been longer than 13,000 years ago."

What actually is this? To find out, I started with the ENSEMBL genome website, which provides a convenient view of a variety of animal genomes. I then selected a large region (about 10,000 bases) from the purported bigfoot genome and used software called BLAST to align it against the human genome. The best match was invariably chromosome 11, which made sense, because that's what the authors used to build their sequence. And as described in the paper, the sequence was a mix of perfect matches to the human sequence along with intervening sequences that the software indicated didn't match.

I then selected each of the intervening sequences that were over 100 base pairs long and ran them through the BLAST software hosted by the National Institutes of Health at NCBI. This tests a sequence against any genome that we've tried to sequence, even if the project isn't complete.
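The selection step, pulling out the stretches that failed to match and keeping only those over 100 bases, can be sketched as follows. This is an illustration rather than the tooling actually used: it assumes the alignment has already been reduced to a per-base list of booleans (`True` where the base matched the human reference), and the function name and toy data are invented.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def unmatched_segments(
    sequence: str, matched: Sequence[bool], min_length: int = 100
) -> List[Tuple[int, str]]:
    """Return (start, bases) for each unmatched run longer than `min_length`.

    `matched[i]` is assumed to be True when base i aligned to the reference.
    """
    if len(sequence) != len(matched):
        raise ValueError("sequence and match mask must have the same length")

    segments = []
    position = 0
    for is_match, run in groupby(matched):
        run_length = len(list(run))
        if not is_match and run_length > min_length:
            segments.append((position, sequence[position:position + run_length]))
        position += run_length
    return segments


# Hypothetical toy data: two unmatched runs, only one of them over 100 bases.
toy_sequence = "A" * 50 + "C" * 150 + "G" * 30 + "T" * 100 + "A" * 20
toy_mask = [True] * 50 + [False] * 150 + [True] * 30 + [False] * 100 + [True] * 20

for start, bases in unmatched_segments(toy_sequence, toy_mask):
    print(start, len(bases))
```

Each segment returned this way would then be submitted as its own query, so that it is judged on its own merits rather than on where the alignment software happened to place it.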

If the hybrid model were correct, and these sequences were derived from another hominin, then they should look largely human. But for the first 10,000, most of them failed to match anything in the databases, even though the search's settings would allow some mismatch. Other sequences came from different locations in the human genome; another matched the giant panda genome (and presumably represents contamination by a bear). Similar things happened in the next 10,000, with a mix of human sequences, one that matched to mice and rats, and then a handful of sequences with no match to anything whatsoever. And so it went for another 24,000 bases before I gave up.

Ketchum's team had done the same and found similar results. "We had one weird sequence that we blasted in the genome BLAST, and we got closest to polar bear of all things," she told Ars. "And then we'd turn around and blast [unclear] and get 70 percent rhesus monkey with a bunch of SNPs [single base changes] out. Just weird, weird stuff."

Clearly, the DNA that was sequenced came from a mix of sources, some human, some from other animals you might find in the North American woodlands. (Recently, a researcher who was given a sample of the DNA by Ketchum announced that it was a mix of "opossum and other species," consistent with this analysis.) Human DNA was certainly present, but it was either degraded or present in relatively low amounts.

When asked to align this sequence to a human chromosome, the software did the best that it could by picking out the human sequences when and where they were available. When they weren't, it filled the gaps with whatever it could—sometimes human, sometimes not.

A question of motivation

In science, it's usually best to start with the evidence. But when the vast majority of the evidence points to one conclusion, and someone insists on reaching a different one, then it can be worth stepping back and trying to understand what might motivate them to do so. In Ketchum's case, the motivations weren't hard to discern; she offered them up without being prompted, even when the discussion was focused on the science.

This was clearest when Ketchum suggested that North America's bigfeet could have European mitochondrial DNA because interbreeding took place there, after which the hybrids crossed Siberia and into Alaska. As noted above, this seemed possible to her because "They're very fast." What wasn't noted above is that she followed that up with, "I've seen them, that's why I can say that." This was followed by a pretty detailed description of how this came about.

There's groups of people called habituators. They have them living around their property. And they interact with them, but they're highly secretive because one, people think they're crazy when they say they interact with bigfoot—and I prefer Sasquatch by the way, but bigfoot's easier to say. Finally a group of them came by and said "you want to see 'em? we'll take you and show you." And they did. The clan I was around was used to people and they were just very, very easy to be around—they're real curious about us, and they'd come and look at us, and we'd look at them.

With that experience and others that followed (several of which she described), Ketchum says she switched from skepticism to a desire to protect what she had seen. Several groups, including Spike TV, have offered rewards for anyone who could shoot a bigfoot, something Ketchum genuinely seems to be horrified by. "They are a type of human and we want them protected," Ketchum told Ars. "That's been the whole point of this once we realized what we had. And I've known what we had for several years now. Within the first year, we knew that we had them, it was just a matter of accumulating enough proof to satisfy science."

In terms of knowing what she had, Ketchum returned to the forensic evidence, which showed human mitochondrial DNA in a hair sample that had been identified as non-human. "One thing I'm sure of is we've proven they exist. We should have been able to do it with just human mito with non-human hair, thoroughly washed and done by two labs." At a different point, she said, "All we wanted to do with the paper was to prove there was something novel out there that was basically Homo, and the mitochondrial DNA placed it clearly in Homo."

With that clearly established, all the apparently contradictory results simply become points of confusion. When asked about the discrepancy between the young mitochondrial age and the nuclear genome, Ketchum just said it was a mystery. Referring to the apparent age difference, she said, "It would look that way but it's not, that's the problem. I don't know how to rectify that other than they are what they are, and the data is what it is." Later, she suggested that the creatures might simply experience an extremely high rate of mutation.

Ultimately, she saw the collection of contradictions as a sign of her own sincerity. "I'm not sure why they're like they are. I don't think anybody is, and I think that gives people a real problem. But we can't change how the results came out. And I'm not going to lie about them, and I'm not going to try to make them fit a scientific model when it doesn't."

After an hour-long phone conversation, there was no question that Ketchum is sincere in her belief that bigfoot exists and that her data conclusively proves it's worthy of protection. But, at the same time, it's almost certainly this same sincerity that drove her to look past the clear problems with her proof.