3. Aggregations and calculations that can be done with the case data are not necessarily what should be done with the case data.

Tableau and other tools make it easy to quickly create charts, graphs, and maps, as well as to run calculations on the underlying numbers. It’s also common practice in data visualization to create benchmarks or comparisons between groups and countries. However, when visualizing COVID-19 data, these calculations need to reflect the basic principles of epidemiology.

There are nuances in the definitions of different kinds of cases (including COVID-19 definitions) which affect whether they can be aggregated or not. In public health, there are calculated metrics — such as case fatality rate — with very specific definitions that are used to understand and monitor disease spread and human impact. Just because you can perform a mathematical function on a set of health statistics doesn’t mean you should.

For example, one chart shared about COVID-19 summed the total deaths to date and divided that sum by the number of known days in the epidemic to create a “deaths per day” figure. That figure was then calculated for other major diseases for comparison. At best, this is an inaccurate comparison due to major differences in our knowledge of and resources for testing and treatment of COVID-19 compared to other diseases. At worst, it significantly understates the seriousness of COVID-19 and can lead people to ignore the advice of public health professionals on social distancing and other individual actions that can slow the spread of the virus.
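To see why an average over the whole epidemic can understate a fast-growing outbreak, here is a minimal Python sketch. The cumulative counts are made up for illustration (they simply double each day) and are not real COVID-19 data; the function names are hypothetical as well.

```python
# Hypothetical cumulative death counts, one value per day.
# These numbers are invented for illustration and double every day.
cumulative = [1, 2, 4, 8, 16, 32, 64, 128]


def daily_counts(cumulative_counts):
    """Convert a cumulative series into new counts per day."""
    previous = [0] + cumulative_counts[:-1]
    return [today - before for before, today in zip(previous, cumulative_counts)]


def average_per_day(cumulative_counts):
    """The 'total divided by days elapsed' figure described above."""
    return cumulative_counts[-1] / len(cumulative_counts)


if __name__ == "__main__":
    print("Average per day:", average_per_day(cumulative))
    print("Most recent day:", daily_counts(cumulative)[-1])
```

With these invented numbers, the whole-period average is 16 per day while the most recent day alone accounts for 64. A single averaged figure hides the trajectory, which is exactly what makes it a poor basis for comparison with diseases that are not growing.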

Finally, determining the share of the population infected or the share of infected persons who die from the disease is incredibly challenging due to uncertainty in the denominator. Proceed with extreme caution when calculating any rates, and, better yet, please leave the rate calculations to the epidemiologists.
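The denominator problem can be illustrated with a small Python sketch. This is not a method for estimating a fatality rate; it only shows how much a naive ratio moves when the assumed number of infections changes. Every number here is hypothetical, including the `undercount_factor` values, which stand in for the unknown ratio of true infections to confirmed cases.

```python
# All values are hypothetical and chosen only to show the arithmetic.
deaths = 100
confirmed_cases = 2_000


def naive_share(death_count, infected_count):
    """Deaths divided by an assumed number of infections."""
    if infected_count <= 0:
        raise ValueError("infected_count must be positive")
    return death_count / infected_count


# Assumed ratios of true infections to confirmed cases (unknown in practice).
undercount_factors = (1, 2, 10)

results = {
    factor: naive_share(deaths, confirmed_cases * factor)
    for factor in undercount_factors
}

if __name__ == "__main__":
    for factor, share in results.items():
        print(f"true infections = {factor}x confirmed -> naive share {share:.3f}")
```

The same death count yields a ratio ten times larger or smaller depending on an assumption that cannot be checked from the case data alone, which is why these calculations are best left to epidemiologists.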

4. Be cautious when making generalized predictions or comparisons based on regionally specific data.

Many factors affect the spread and impact of the virus — such as the measures taken by a government to combat the spread and underlying population demographics.

Because of these differences, consider what is implied when making comparisons between countries with very different population sizes, political environments, and public health systems.

For example, the population of Italy skews older than that of China or the US. Because elderly populations have been identified at higher risk and are more likely to require hospital care, the percentage of cases requiring hospitalization may be higher in Italy than in countries with a younger population. (More on the ways demographics are influencing outcomes in Italy.)
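A rough Python sketch of the demographic effect follows. The age groups, hospitalization rates, and age shares are invented for illustration and do not describe Italy, China, the US, or any real population. The sketch also assumes, for simplicity, that cases are spread across age groups in proportion to each group's share.

```python
# Hypothetical share of cases needing hospital care, by age group.
# The same age-specific rates are applied to both populations.
hospitalization_rate = {"under_65": 0.05, "65_plus": 0.30}

# Hypothetical age structure of cases in two made-up populations.
younger_population = {"under_65": 0.85, "65_plus": 0.15}
older_population = {"under_65": 0.75, "65_plus": 0.25}


def overall_rate(rates, age_shares):
    """Weighted average of age-specific rates by each group's share."""
    return sum(rates[group] * age_shares[group] for group in rates)


if __name__ == "__main__":
    print("Younger population:", overall_rate(hospitalization_rate, younger_population))
    print("Older population:", overall_rate(hospitalization_rate, older_population))
```

Even with identical age-specific rates, the older population ends up with a higher overall rate (about 11.25% versus 8.75% with these invented inputs), so a gap between two countries does not by itself say anything about the virus or the health system.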

5. Visualizations should inform and be honest about what isn’t represented.

There is much uncertainty in the data we have, particularly when trying to extrapolate to a general population. With an emerging disease, disaggregating and looking at cases and rates in sub-populations can help us to better understand the disease.

The number of confirmed cases is only a subset of infected persons in the population, and the number is affected by health-seeking behavior (if I’m sick, do I go to the doctor?), test kit availability (if I go to the doctor, can I get a test?), health systems factors, and other considerations.

COVID-19 is not a death sentence, and our visualizations need to reflect that. Including ‘recovered cases’ is an essential piece of context in visualizing case numbers.

Reiterating here: calculating rates — like the case fatality rate — is challenging without an accurate denominator. Leave the rate calculations to the epidemiologists.

6. Epidemiologists and public health agencies create complex models to understand how the disease may progress.

These data are unlikely to feed into a dashboard, but they are sometimes cited and sourced in static charts and graphs. The benefit of using results from models from WHO, CDC, and other public health experts is that they typically go through some level of peer review before being published.

Proceed with caution if you incorporate these numbers in a visualization, though: models are complex, as they try to account for the behavior of the virus, human behavior, and systems factors. As a result, models will change. If you use data from a model, document the inputs and sources thoroughly.

7. Data scientists and statisticians have also been publishing their own models and related conclusions about disease projections.

Use these with caution in framing your visualization and analysis unless they are well sourced, documented, and explained, **and preferably validated by an epidemiologist or someone else with related expertise.**

Modeling disease is complex (see #6). Rough, “back of the envelope” calculations can be more fear-inducing than helpful.

Instead, rely on well-sourced models from public health agencies and experts.

8. Make thoughtful design decisions.

Still committed to creating a visualization about COVID-19? Read existing resources on responsible visualization approaches in this context before publishing any charts or maps.

Datawrapper has an excellent set of responsible visualizations of COVID-19 with notes on the design decisions they made.

“What we considered when making these visualizations” from the awesome team at Datawrapper (Source)

You can also read this excellent thread of recommendations and critiques on visualizing COVID-19 from Evan Peck.

9. Consider the human side of what you create.

Reference terms correctly (see WHO definitions for COVID-19 cases, an explainer on R0, and the CDC Glossary as resources) and clearly define each metric for your audience somewhere in the visualization — that can be a footnote, title, subtitle, annotation, explainer text…just make sure it’s there.

Be considerate of the language you use in your visualization.

Remember that behind every data point in a COVID-19 dataset is a person. If you wouldn’t feel comfortable having someone from a high-risk group read what you wrote, please revise.

10. Consider how visualizations can impact (and encourage) social responsibility as we see COVID-19 in our respective communities.

Self-quarantine where appropriate. Ensure we’re not stigmatizing people who are from countries and regions that have had a lot of cases. Understand what additional steps you can take to flatten the curve and slow the spread of the virus in your community.

Esther Kim and Carl Bergstrom (Source)

And finally, consider visualizing other relevant data about impacted communities if you don’t feel you have the public health knowledge to add to the conversation around COVID-19 cases. Epidemic data isn’t a dataset to play with just to have something to show off on Twitter.