A Heliconius butterfly. Credit: Tim Zurowski/Shutterstock

When evolutionary biologist Nick Grishin wanted to tackle big questions in evolution — why some branches of the tree of life are so diverse, for instance — his team set out to sequence the genomes of as many butterflies as it could: 845 of them, to be precise.

In a study that some researchers are hailing as a landmark in genomics, Grishin’s group at the University of Texas Southwestern Medical Center in Dallas sequenced and analysed the genomes of what it called a “complete butterfly continent”: every species of the creature in the United States and Canada. The study was posted on the bioRxiv preprint server on 4 November1.

“I think it’s bloody amazing, because the technology involved in sequencing 845 species is there,” says James Mallet, an evolutionary biologist at Harvard University in Cambridge, Massachusetts. “It’s a beautiful piece of work, a tour de force, to do all that.”

The data allowed Grishin’s team to build an evolutionary tree detailing the relationships of all the butterflies, as well as to determine the pace at which new species formed. The team suggests that fast-diversifying groups of butterflies are those that swap genes with close relatives through interbreeding — a phenomenon that could extend to other organisms.

Others, however, have pointed out that most of these genomes will be of limited use to other researchers, because they are low-quality ‘drafts’ composed of thousands of short DNA stretches, rather than higher-quality genome sequences that have been assembled into longer stretches. Grishin says that the sheer number of genomes, even of low quality, allows his team to draw broad conclusions about evolution that could not be made from more limited data sets. He plans to make the genomes publicly available when the study is published in a peer-reviewed journal.

Butterfly patterns

Grishin, whose research group studies the shape and evolution of proteins, started researching butterflies after reading a 2012 paper on the diverse tropical genus Heliconius, whose species have elaborate wing patterns that mimic those of other butterflies2. The study found that some genes that determine wing patterns seemed to have been passed between three Heliconius species through interbreeding, instead of being inherited from the species’ common ancestor, and suggested that such swaps explain the huge diversity of Heliconius butterflies.

Inspired by that work, Grishin wondered whether such a connection could be seen in other butterflies. “Some groups diversify very rapidly and there are many species in them, and others are kind of empty,” he says. “So to understand why and how that happens, we would need to sequence them all.”

At one time, sequencing hundreds of butterfly genomes would have been unaffordable, but costs have plummeted in recent years. Collecting samples for every species in the United States and Canada was still a challenge, however. Grishin’s team worked with amateur butterfly enthusiasts as well as museum collections across the United States to gather data — a single leg from a dead specimen was enough to obtain a draft-quality genome. Instead of flying to conferences, Grishin and his colleagues took road trips and collected butterflies along the way.

Once they had sequenced the genomes of all 845 species, the researchers used the data to work out the evolutionary relationships between North American butterflies. Their butterfly family tree broadly agreed with existing ones based on anatomy and more limited genetic analyses, although the group reclassified 40 species and suggested several new groupings at the genus level.

The tree also revealed that some groups of butterflies have evolved faster than others. Two of the fastest-evolving groups, commonly known as the blues and the whites, have developed highly specialized interactions with other organisms that might explain their rapid evolution, says Grishin’s team. The blues, or Polyommatinae, form symbiotic relationships with ants, whereas the whites, or Pierini, have developed adaptations to feed on mustard plants that are toxic to many other insects.

An analysis of genes shared by multiple species also showed that these diverse groups were more likely to have acquired genes through interbreeding between species, rather than from a distant ancestor. Many of the genes that are swapped between species are thought to be involved in mate recognition and other factors that can cause species splits. Grishin says that by spreading such genes, interbreeding — rather than the gradual accrual of new mutations — could be helping to drive the evolution of butterfly species.

The link between interbreeding and speciation is “an idea that is sort of coming to the fore”, says Mallet, who co-led a team that reported similar findings in Heliconius butterflies this month3.

Missing data

Chris Jiggins, an evolutionary biologist at the University of Cambridge, UK, who also studies Heliconius butterflies, is impressed that Grishin’s team was able to source so many specimens. But he says that draft genomes will be useful only for reconstructing evolutionary relationships, and not for more detailed studies of specific genes. “These cannot be used as a tool to search for genes or gene families, as it will always be unclear what is missing from the sequence data,” says Jiggins.

In addition to the draft genomes, Grishin’s team generated ‘reference’ genomes, in which genes are assembled into chromosome sequences, for 23 species from across the butterfly family tree.

High-quality genomes such as these are the targets of other large-scale projects to sequence the tree of life. In 2018, a consortium called the Earth BioGenome Project laid out plans to decode the genomes of the roughly 1.5 million known species of animals, plants, protozoans and fungi — known collectively as eukaryotes — at an estimated cost of US$4.7 billion over 10 years. In its first 3-year phase — estimated to cost $600 million — the project hopes to generate reference genomes for 9,000 species that represent all the families of eukaryotes.

Grishin is enthusiastic about these efforts, particularly for vertebrates. But he thinks there are simply too many unknown species of invertebrate, including butterflies, to collect and sequence them all in the near future. His next focus is on sequencing the genomes of the roughly 3,500 known species of skipper butterflies found worldwide, to see how such a widespread group evolved.

“These very big, very global projects, although they sound very good, I don’t think they will succeed very quickly,” Grishin says. “Our efforts — where we just jump right in and do things right away without much fuss about it — may be helpful.”