Data Mining to Predict, Track & Contain the Ebola Outbreak Written by Pradeep on September 16, 2014

One of the most diverse continents on earth, Africa astounds the world with its vast savannas and great deserts and with its ancient architecture and modern cities, but Africa also has its share of tragedies and woes.

First identified in Democratic Republic of Congo’s Ebola River in 1976, Ebola Hemorrhagic Fever, a deadly zoonotic disease caused by Ebola virus, has been spreading in West Africa like a wildfire, engulfing everything on its way and creating widespread panic.

What has added insult to injury is the fact that the region has long endured the severe consequences of civil wars and social conflicts, and diseases like malaria, HIV/AIDS, yellow fever, cholera etc. have remained endemic to the region for a long time, causing tens of thousands of deaths every year.

Reportedly, Ebola has already killed at least 2,296 people, and there are about 3,685 confirmed cases of infection. Mortality rate has been swinging between 50% to 90%, depending on the quality of care and nutrition. According to WHO, the disease is likely to infect as much as 20,000 people before it is finally brought under control.

The Affected Areas, Source: Al Jazeera

Crisis of Data

When it comes to healthcare management, clinical data is one of the key components. The value of data becomes more urgent in the emergency situation like that of West Africa. The more relevant data you have, the bigger picture you can create for taking aggressive measures. To use Peter Drucker’s words, “What gets measured gets managed.”

Factual data is a precondition for the doctors and health science experts working in the field for measuring and managing the situation. Data helps them to assess their successes or failures and reorient their actions. One of the important reasons why the fight against the Ebola outbreak is turning out into a losing battle is the insufficiency of data. Recently, Scientific American magazine wrote:

Right now, there are not even enough beds for sick patients nor enough data coming in to help track cases. Surveillance and tracking of those who were possibly exposed to Ebola remain inadequate.

In Science magazine, Gretchen Vogel suggests that the death toll of Ebola patients could be much higher than it is currently estimated. She says, “Exactly how many unrecorded Ebola deaths have occurred will never be known. Health officials are keeping track of suspected and probable cases, many of which are people who died before they could be tested.” Greg Slabodkin voices similar concerns in Health Data Management and points at the need of an integrated global biosurveillance system.

The absence of reliable and actionable data has badly hampered the efforts of combatting Ebola and providing proper medical care to the victims. CDC Director Dr. Tom Frieden describes it as a “fog-of-war situation”.

Data Mining: Bots Were the First to Warn

When you flip the coin, however, the situation is not completely bleak and desperate. Even if Big Data technologies have fallen short in predicting, tracking, and containing the epidemic, mainly due to the lack of data from the ground, it has not entirely failed. Data scientists and healthcare experts world over are making concerted efforts to know, track, and defeat the Ebola virus—some on the ground and some in their labs.

The increasing level of collaboration among the biomedical specialists, geneticist, virologists, and IT experts has definitely contributed to slow down the transmission of the virulent disease dubbed as “the plague of modern day”. Médecins Sans Frontières and Healthmap.org are the excellent examples in this regard.

By deploying bots and crawlers and by using advanced machine learning algorithms, the Boston-based global infectious disease surveillance system, HealthMap was able to predict and raise concerns about the spread of a mysterious hemorrhagic fever in West Africa nine days earlier than WHO did.

Run by a team of 45 researchers, epidemiologists, and software developers at Boston Children’s Hospital, HealthMap mines data from search engine queries, social media platforms, health information sites, news reports and crowd-sourced information to track the transmission of the disease and provides an up-to-date timeline report with an interactive map, making it easier for the international health agencies to devise more effective action plans.

HealthMap serves as a good example of how crucial Big Data and data mining technologies could be for handling a healthcare emergency with fact-based and data-driven decisions.

In their letter to The Lancet, research scientist Rashid Ansumana and his colleagues, working on Ebola in Sierra Leone, highlighted on the need of developing epidemic surveillance systems “by adopting new data-sharing technologies.” They wrote, “Emerging technologies can help early warning systems, outbreak response, and communication between health-care providers, wildlife and veterinary professionals, local and national health authorities, and international health agencies.”

Data-Driven Initiatives to Control the Outbreak

The era of systematic use of data for making better epidemiological predictions and for finding effective healthcare solutions began with Google Flue Trends in 2007, and the rapidly developing tools, technologies, and practices in Big Data have increased the roles of data in healthcare management.

There are a number of data-driven undertakings in progress which have contributed to counter the raging spread of Ebola. Brockmann Lab, run by Professor Dirk Brockmann and his colleagues, for example, has created a computer model for studying correlations and probabilities in the explosion of new cases of infection.

Transportation and Relative Import Risk, Source: Brockmann Lab

By applying computational and statistical models, they predict which areas, cities or regions in the world are at the risk of becoming the next Ebola epidemic hotspots. Similarly, Alessandro Vespignani–a network scientist, statistical physicist, and Northeastern professor–has been using human mobility network data to track the cases of Ebola infection and dissemination.

The Swedish NGO Flowminder Foundation has been aggregating, mining, and analyzing anonymized mobile phone location data and is developing national mobility estimates for West Africa to help the local and international agencies to combat the disease.

Meanwhile, innovations with Epi Info VHF, a software tool for case management, contact tracing, analysis and reporting services for Ebola and other hemorrhagic fever outbreaks and OpenStreetMap project for getting location information and spatial data of the affected areas have further helped to guide the intervention initiatives.

However, with all optimism about the growing roles of Big Data and data mining, we also need to be mindful about their limitations. Newsweek aptly puts: “While no media-trawling bot could ever replace national and international health agencies, such tools may be starting to help fill in some of the most gaping holes in real-time knowledge.”