Today the genome of the loblolly pine, the largest yet sequenced, was published in Genome Biology. The paper matters mainly because the authors made real improvements to the process that scientists use to sequence large and complex genomes like this one. Let’s face it, they’re not likely to hold the record for long: genome sequencing technologies are moving fast and there are hundreds of sequencing initiatives under way.

So, a little defining of terms. Sequencing is when you work out the exact order of the DNA bases A, C, G and T that make up a genome. But you can estimate the number of bases in a genome without knowing what they are, so we have lots of information about the size of various genomes without knowing exactly what’s in them. It’s like knowing how many pages are in a book without having read the letters on each page. Genome sizes are measured in base pairs, because DNA is a double helix, so the bases always come in pairs.

Our infographic comparing the largest to smallest known genomes. Where do we stand?

Just finding the figures to put together this infographic of some of the most interesting genomes out there was a challenge, because it’s changing all the time! We looked up the ‘smallest’ genome listed in a table on Wikipedia, only to find it had been supplanted by the discovery of an even smaller one. And the largest genome of any organism is contested: the 640,000,000,000 base pairs of the tiny amoeba Polychaos dubium were estimated before modern techniques were developed, so the figure might be wrong! So the most likely candidate for biggest genome is actually the plant Paris japonica.

Plants often have huge, complex genomes. This is sometimes because their genomes spontaneously double, so that instead of being in pairs (a diploid organism), their chromosomes are in groups of four or more. These organisms are called polyploid, and they tend to have enormous genomes. The amazing thing about the loblolly pine, which is currently the largest genome sequenced at 22.18 billion base pairs, is that it’s actually a diploid, so its size and complexity have nothing to do with chromosome doubling. Sequencing the genome revealed that a lot of its bulk is actually down to repetitive bits of sequence.
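To put those numbers side by side, here is a minimal Python sketch of the arithmetic. It uses only the two figures quoted in this post (the contested amoeba estimate and the loblolly pine), and the names `GENOME_SIZES_BP`, `format_bp` and `times_larger` are made up for illustration rather than taken from any genomics library.

```python
# Genome sizes in base pairs, as quoted in this post.
# The amoeba figure is the contested, pre-modern estimate.
GENOME_SIZES_BP = {
    "Polychaos dubium (contested estimate)": 640_000_000_000,
    "Loblolly pine (sequenced)": 22_180_000_000,
}


def format_bp(base_pairs):
    """Express a base-pair count in billions, to two decimal places."""
    return f"{base_pairs / 1e9:.2f} billion bp"


def times_larger(bigger, smaller):
    """Return how many times larger one genome is than another."""
    return bigger / smaller


# Print each genome, largest first.
for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {format_bp(size)}")

ratio = times_larger(
    GENOME_SIZES_BP["Polychaos dubium (contested estimate)"],
    GENOME_SIZES_BP["Loblolly pine (sequenced)"],
)
print(f"The amoeba estimate is about {ratio:.0f} times the loblolly pine genome.")
```

If the old amoeba estimate is anywhere near right, that genome would be somewhere around 29 times the size of the largest genome anyone has actually sequenced, which is a good reminder that estimating a genome’s size and reading it are very different jobs.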

While the human genome has had a huge effect on medical treatment and research, and was a major step forward in sequencing technology, my personal favourite of all of these is the man-made bacterial genome created at the J. Craig Venter Institute in 2010. It was based on the Mycoplasma mycoides genome and is affectionately known as Mycoplasma mycoides JCVI-syn1.0. It is estimated that this synthetic genome cost US$40 million to make and took 20 people more than a decade of work. It was an amazing proof of principle that you can synthesise the genome of an organism and make it work within a living cell. For bacteria at least.

The ethical and societal implications of this are huge, so it was a really controversial development, which is something the Institute is well known for! Their current work includes amazing things like the Human Microbiome Project, and synthetic bacteria to tackle carbon levels. So they are just one place that is moving beyond what’s in the genome, to what we can do with it.

Genome sequencing is amazing because it opens up so many new questions and possibilities for science. So while the records in this infographic, though accurate at the time of writing, are sure to be supplanted, we can be sure of one thing: the genome is just the start of the story.