Trevor Bedford and 55 colleagues just published a preprint detailing some detective work on early transmission of Covid in Washington state.

The TL;DR is they looked at 346 viral genomes from infected patients and said the data are

strongly suggesting cryptic spread of COVID-19 during the months of January and February 2020, before active community surveillance was implemented.

How they can know

Bedford describes the way this detective work is done in a blog post, but the gist is that viruses mutate all the time (often in ways that don’t affect function). Given an estimate of how often these mutations occur, we can look at two nearby cases, count how many mutations separate them, and use this as a “molecular clock” to tell us how long ago the two versions of the virus had an ancestor in common. The evolutionary tree of a virus plus its mutation rate allows us to infer timing.
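To make the molecular clock idea concrete, here is a rough back-of-the-envelope sketch in Python. It is not the method used in the preprint, which fits a full evolutionary tree; it only shows the basic arithmetic. The function name is made up for illustration, and the default substitution rate (about 8e-4 substitutions per site per year) and genome length (about 29,903 bases) are commonly cited ballpark figures that I am assuming here, not numbers taken from the paper.

```python
def years_to_common_ancestor(mutations_apart,
                             rate_per_site_per_year=8e-4,  # assumed ballpark rate
                             genome_length=29_903):        # assumed genome size in bases
    """Rough time (in years) since two samples shared an ancestor.

    Assumes a strict clock: each lineage picks up mutations at the same
    constant rate, so the differences between two samples accumulate
    along both branches (hence the factor of 2).
    """
    if mutations_apart < 0:
        raise ValueError("mutations_apart must be non-negative")
    mutations_per_year_per_lineage = rate_per_site_per_year * genome_length
    return mutations_apart / (2 * mutations_per_year_per_lineage)


if __name__ == "__main__":
    # Hypothetical mutation counts, chosen only to show the scaling.
    for d in (0, 2, 6):
        days = years_to_common_ancestor(d) * 365
        print(f"{d} mutations apart -> roughly {days:.0f} days to a common ancestor")
```

Under these assumed numbers each lineage gains about two mutations a month, so even a handful of differences between two samples corresponds to weeks of separate evolution. Real analyses also account for uncertainty in the rate and for sampling dates, which this sketch ignores.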

The figure below depicts the estimated evolutionary tree of viral samples, my annotation in blue.

What they found among their samples

The first cases

Quoting from the paper

All SARS-CoV-2 genomes represented in convenience samples from the COVID-19 pandemic appear closely genetically related with the large majority possessing between 0 and 12 mutations relative to a common ancestor estimated to exist in Wuhan between late Nov and early December 2019

Looking at the 224 viral samples from Washington most closely related to the first American case, which was detected there, they try to evaluate two competing hypotheses:

SARS-CoV-2 was introduced into Washington State on 15 January 2020 with the arrival of WA1; subsequent cryptic transmission led to a community outbreak first detected on 28 February 2020

SARS-CoV-2 was imported on 15 January 2020 but this infection did not transmit onwards; a second, initially undetected importation event of a genetically identical or highly similar virus occurred, followed by cryptic transmission that led to a community outbreak

They caution that testing these hypotheses is very difficult, but they give a tentative estimate of a 3% chance of observing the given data if the second hypothesis were true.

Other clusters

They also find that around 11% of the viruses, 38 total, in their data are

closely related to viruses from the European outbreak and likely represents a second introduction occurring at sometime in February 2020

More testing and data are needed to do similar analyses for other areas.