The number of reported Ebola cases is doubling roughly every five weeks in Sierra Leone, and in as little as two to three weeks in Liberia.

The number of reported cases globally is projected to reach 10,000 by the end of October. The actual number of cases may be twice the official figure. So how are such figures estimated – and what can bioinformatics do to help control the disease?
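As a rough illustration of how a doubling time turns into a projection, the sketch below assumes simple, uninterrupted exponential growth. The function names and the starting figure of 1,000 cases are hypothetical and chosen only for the example; real projections use far more detailed models.

```python
import math


def projected_cases(initial_cases: float, doubling_time_weeks: float,
                    weeks_ahead: float) -> float:
    """Project case numbers assuming a constant doubling time.

    Uses N(t) = N0 * 2 ** (t / T), where T is the doubling time.
    """
    return initial_cases * 2 ** (weeks_ahead / doubling_time_weeks)


def weeks_to_reach(initial_cases: float, target_cases: float,
                   doubling_time_weeks: float) -> float:
    """Weeks needed to grow from initial_cases to target_cases."""
    return doubling_time_weeks * math.log2(target_cases / initial_cases)


if __name__ == "__main__":
    start = 1000  # hypothetical starting count, not a reported figure
    # Doubling times quoted in the article: about 5 weeks (Sierra Leone)
    # and as little as 2 to 3 weeks (Liberia).
    for label, doubling in [("5-week doubling", 5), ("3-week doubling", 3)]:
        print(label, round(projected_cases(start, doubling, weeks_ahead=10)))
```

The gap between the two scenarios after only ten weeks shows why a shorter doubling time is so alarming.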

The 2014 Ebola outbreak in West Africa appeared suddenly and spread rapidly, and is thought to have started with a single animal-to-human transfer in December last year. It’s an example of an emerging infectious disease (EID): one that has newly appeared in a population or has undergone a rapid increase in incidence. SARS and various strains of avian influenza are examples of EIDs.

EIDs are often zoonoses – animal diseases that have infected humans as hosts and become transmissible. Such “host-switching” events can happen anywhere at any time, and preparedness to respond rapidly and effectively when this occurs is an important aspect of public health policy.

One parameter that epidemiologists use to quantify the rate of a disease’s spread is the basic reproduction number: R0 (R-nought).

This is the number of new cases generated on average by each infected individual, in idealised conditions. Diseases with R0 less than 1 are not likely to become epidemics, but those with R0 more than 1 have the potential to spread exponentially.

Current estimates for Ebola indicate an R0 of around 2 – higher than the R0 of some strains of influenza – although it varies between regions.
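To see why the threshold of 1 matters, the following minimal sketch counts new cases generation by generation under the idealised conditions described above: every case infects exactly R0 others, and nothing slows transmission. The function name and the single index case are assumptions made for illustration.

```python
def cases_per_generation(r0: float, generations: int,
                         index_cases: float = 1) -> list:
    """New cases in each generation of transmission.

    Idealised assumption: each case produces exactly r0 new cases,
    so generation g contains index_cases * r0 ** g cases.
    """
    return [index_cases * r0 ** g for g in range(generations + 1)]


if __name__ == "__main__":
    # R0 of around 2, as quoted for Ebola: each generation doubles.
    print(cases_per_generation(2, 5))
    # R0 below 1: each generation is smaller and the chain dies out.
    print(cases_per_generation(0.5, 5))
```

With R0 of 2 the count doubles with every generation, while with R0 of 0.5 it halves, which is the difference between an epidemic and a cluster that fades away.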

Other parameters that determine the spread dynamics of a disease include the length of time the disease takes to incubate, and the period of time during which diseased individuals are infectious.

Another key parameter is the proportion of cases that are identified. Many cases, including some that result in death, are not reported, either because victims do not seek medical care, or because overwhelmed medical personnel fail to record all interventions accurately.

This is important not only because under-reporting reduces the effectiveness of management strategies, but also because it can influence estimates of the other parameters mentioned above, particularly if there is variation in reporting levels across regions.

Attempts have been made by the US Centers for Disease Control and Prevention (CDC) to estimate the degree of under-reporting for Ebola, but these are currently not very accurate. The World Health Organization (WHO) estimates that the actual number of cases in Guinea is about 1.5 times the reported figure, with corresponding factors of 2 for Sierra Leone and 2.5 for Liberia.

(The WHO hasn’t published the methodology used to estimate these figures: they may be little more than guesses.)
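The arithmetic behind such correction factors is simple, as the sketch below shows. The multipliers are the WHO figures quoted above; the reported counts passed in are hypothetical placeholders, and the function names are assumptions made for the example.

```python
# Multipliers quoted in the article (estimated actual cases / reported cases).
WHO_CORRECTION_FACTORS = {"Guinea": 1.5, "Sierra Leone": 2.0, "Liberia": 2.5}


def estimate_actual_cases(reported: dict) -> dict:
    """Scale reported counts by the per-country correction factor."""
    return {country: count * WHO_CORRECTION_FACTORS[country]
            for country, count in reported.items()}


def implied_reporting_rate(country: str) -> float:
    """Fraction of cases that are reported, implied by the factor."""
    return 1 / WHO_CORRECTION_FACTORS[country]


if __name__ == "__main__":
    # Hypothetical reported counts, for illustration only.
    hypothetical_reported = {"Guinea": 1000, "Sierra Leone": 1000, "Liberia": 1000}
    print(estimate_actual_cases(hypothetical_reported))
    print(implied_reporting_rate("Liberia"))
```

A factor of 2.5 for Liberia implies that only about 40% of cases there are being reported, which is why regional differences in reporting can distort comparisons between countries.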

Putting the pieces together

Bioinformatics plays a key role in detecting, monitoring and responding to EIDs.

In the case of Ebola, the bioinformatics community has responded rapidly. For example, the current outbreak of Ebola in Sierra Leone was first detected in May, but by September a study reported sequencing 99 Ebola virus genomes from 78 patients diagnosed with the disease between late May and mid-June.

The process of sequencing a genome involves assembling many thousands of short sequences, known as reads, which are fragments obtained from all over the genome. Algorithms for assembling genome sequences detect overlaps between fragments, and align and merge them to reconstruct the sequence of the whole genome.
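The idea of detecting overlaps and merging fragments can be sketched with a toy greedy assembler. This is a deliberately simplified illustration, not the method used in the study mentioned above: it assumes error-free fragments from a single strand, requires exact overlaps, and uses made-up sequences and hypothetical function names.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Longest suffix of `left` that exactly matches a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments: list, min_overlap: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    length = overlap_length(left, right, min_overlap)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        # Merge: keep `left` whole and append the non-overlapping tail.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    toy_fragments = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up reads
    print(greedy_assemble(toy_fragments))
```

Production assemblers must also cope with sequencing errors, repeated regions and reads from both DNA strands, and they use much more efficient data structures than this pairwise comparison.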

Bioinformaticians have been developing and refining algorithms for sequence assembly since the late 1980s, and are constantly adapting them so they can handle new sequencing technologies and ever-larger scales of assembly.

By the end of September this year, the UC Santa Cruz Genomics Institute had released a new Ebola genome browser with an alignment of 148 individual viral genomes, including 102 from the current outbreak. This was a monumental effort: UCSC researcher Jim Kent led a team that reportedly worked around the clock in the last week of September to produce the browser.

Such genome browsers will undoubtedly accelerate global efforts to develop a vaccine and antiserum.

One of the advantages of having whole genome sequences is that they can be used to reconstruct the family tree (phylogeny) of Ebola viruses, and trace the course of the outbreak. Reconstructions of this kind can provide important insights into the successes and failures of current management strategies.
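A common first step towards a phylogeny is to measure how different each pair of aligned genomes is. The sketch below computes the proportion of differing sites for every pair; the short sequences and sample names are invented for illustration, and a full reconstruction would feed such distances (or the alignment itself) into a dedicated tree-building method.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def pairwise_distances(aligned: dict) -> dict:
    """Distance for every unordered pair of named sequences."""
    return {(name_1, name_2): p_distance(aligned[name_1], aligned[name_2])
            for name_1, name_2 in combinations(aligned, 2)}


if __name__ == "__main__":
    # Invented toy alignment, not real viral sequence data.
    toy_alignment = {
        "sample_A": "ACGTACGTAC",
        "sample_B": "ACGTACGTAT",
        "sample_C": "ACGAACGTTT",
    }
    distances = pairwise_distances(toy_alignment)
    closest_pair = min(distances, key=distances.get)
    print(distances)
    print("most similar pair:", closest_pair)
```

Samples separated by fewer differences are likely to be more closely related, which is the intuition that tree-building methods formalise.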

They can also be used to estimate parameters that govern how rapidly the virus spreads, in terms of both number of cases and geographic range.

These can, in turn, be used to forecast the future course of the epidemic and predict the impact of various management strategies.

There are many other ways in which bioinformatics contributes to the management of EIDs. Genomic sequence analyses can lead to a better understanding of the biology of a disease, the features that make it pathogenic, and potential drug targets or clinical interventions.

A recent survey of Australian life sciences conducted by Bioinformatics Resource Australia EMBL (BRAEMBL) found that many laboratory scientists see bioinformatics as core to their work, but it also identified marked concern in the community about a lack of bioinformatics expertise and of access to that expertise.

In light of the importance of bioinformatics in managing EIDs, and its growing role in facilitating research in the life sciences more generally, it is important that students and early career researchers from mathematics, statistics, computer science and biology are attracted into this field, and receive world-class training in its practice and implementation.

Bioinformatics is, and will continue to be, a core component of the international response to Ebola and other EIDs, and patients, medical staff and those close to them need all the help they can get.

The annual BioInfoSummer conference and training workshop is an initiative of the Australian Mathematical Sciences Institute. This year the event is being hosted by Monash University on 1-5 December.