Source: pasja1000/Pixabay

In a recently published news story, we learn about a young doctor, Jake Deutsch, and his personal experience with treatment for Covid-19. Deutsch tested positive for Covid-19 and checked himself into a hospital after suffering respiratory problems. Tests showed he had pneumonia. He decided to try hydroxychloroquine, an antimalarial drug, and his symptoms started to improve a couple of days later. James Cai, a 32-year-old physician assistant and the first case of Covid-19 in New Jersey, had a positive experience with another treatment, remdesivir, an antiviral drug.

On the website for the Centers for Disease Control and Prevention (CDC), hydroxychloroquine, chloroquine, and remdesivir are mentioned as “therapeutic options” and drugs under investigation.

Given these people’s positive experiences with these drugs, the question is, Why can’t people with Covid-19 take these medications now? Why must thousands of people wait for the results of research studies and clinical trials? Why can’t they rely on anecdotal evidence?

True experiments, unlike anecdotal evidence, often require random sampling and random assignment. In this post, I try to explain the importance of random sampling; in my next post, I will explore random assignment.

Let me use an (oversimplified) example: Suppose there are 2 million Americans with Covid-19. This is our study’s population. We want to examine Covid-19 in a group of 10,000 of them. This is our sample. Our goal is to make valid inferences, based on analyses of our sample, about the population of all Americans with coronavirus disease 2019.

Non-probability sampling

First, we must select a sample. How do we pick the 10,000 individuals for our investigation? One approach is to consider non-probability sampling. In non-probability sampling, we do not know the probability that any individual is selected into the sample.

Convenience sampling. One form of non-probability sampling is convenience sampling, meaning sampling based on accessibility and opportunity. For instance, we can draw our participants from patients at various hospitals.

Quota sampling. Another approach is quota sampling; in this method, we select participants so that the sample resembles the population on chosen characteristics. For example, assuming 30% of individuals with Covid-19 live in poverty, we would keep selecting participants who live in poverty until they make up 30% of the sample, that is, until the quota is filled.
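Here is a minimal sketch of the quota idea in Python. The pool of 1,000 reachable patients, the group labels, and the function name `quota_sample` are all made up for illustration; only the 30% figure comes from the example above.

```python
import random

def quota_sample(arrivals, quotas):
    """Accept people in the order we reach them until every quota is full.

    arrivals: iterable of (person_id, group) pairs
    quotas:   dict mapping group name -> number of people wanted
    """
    remaining = dict(quotas)  # copy so the caller's dict is not modified
    selected = []
    for person_id, group in arrivals:
        if remaining.get(group, 0) > 0:
            selected.append((person_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return selected

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical pool of 1,000 reachable patients, 300 of whom live in poverty.
pool = [(i, "poverty" if i < 300 else "not_poverty") for i in range(1000)]
rng.shuffle(pool)  # the order in which we happen to reach people

quotas = {"poverty": 30, "not_poverty": 70}  # 30% of a sample of 100
sample = quota_sample(pool, quotas)
poverty_count = sum(1 for _, group in sample if group == "poverty")
print(len(sample), poverty_count)
```

Note that within each quota, people are still taken as they come, so we still do not know any individual's probability of being selected. That is why quota sampling remains a non-probability method.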

Probability sampling

Sometimes, such as with smaller populations, we might be able to produce a sampling frame (i.e., a list of all items or people in the population). If so, then we could perhaps use the technique of probability sampling. In probability sampling, each member of the population has a chance higher than zero of being included in the sample, and we know (or can calculate) the probability of each person being selected.

Using the technique of probability sampling makes it easier to determine if the sample is actually representative of the population.

Our goal is random sampling. In other words, we would like to increase the likelihood that every potential participant has the same chance of being picked from the population.

Why is random sampling important? Because it makes it easier to make generalizations. For instance, it is more difficult to make generalizations about how Covid-19 is affecting people if we use a convenience sample of young people in Manhattan than if we use a random sample of people of all ages living in different U.S. states.
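To make this concrete, here is a small simulation in Python. Everything in it is synthetic: the 100,000 ages are made up, "young" is arbitrarily taken to mean under 35, and age stands in for whatever characteristic we care about. It is a sketch of the idea, not an analysis of real Covid-19 data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic population: 100,000 made-up ages between 18 and 90.
population_ages = [rng.randint(18, 90) for _ in range(100_000)]

def mean(values):
    return sum(values) / len(values)

# Convenience sample: the first 1,000 young people (under 35) we reach.
convenience_sample = [age for age in population_ages if age < 35][:1000]

# Simple random sample: 1,000 people, each equally likely to be drawn.
random_sample = rng.sample(population_ages, 1000)

population_mean = mean(population_ages)
convenience_error = abs(mean(convenience_sample) - population_mean)
random_error = abs(mean(random_sample) - population_mean)

print(f"Population mean age:        {population_mean:.1f}")
print(f"Convenience sample is off by {convenience_error:.1f} years")
print(f"Random sample is off by      {random_error:.1f} years")
```

The convenience sample can never include anyone aged 35 or older, so its average age is far from the population's, no matter how many people we add to it. The random sample's average lands close to the population's because everyone had the same chance of being drawn.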

So, how do we select a random sample?

Systematic sampling and simple random sampling. Perhaps the most common approach is to use the simple random sampling technique. To do so, we choose our participants based on a list of random numbers. Another approach is systematic sampling. Here, we select every nth member (e.g., every 6th member) of the population.

As you recall, for our hypothetical study we need to select 10,000 participants out of 2 million people. If we divide 2 million by 10,000, we get 200. Therefore, with systematic sampling, we need to pick every two-hundredth person.
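The two techniques can be sketched in a few lines of Python. The sketch assumes we have a sampling frame in which people are numbered 0 to 1,999,999; the function names are my own, and the random starting point in the systematic version is a common convention that I am adding as an assumption.

```python
import random

POPULATION_SIZE = 2_000_000  # people in the hypothetical sampling frame
SAMPLE_SIZE = 10_000         # participants we want

def simple_random_sample(population_size, sample_size, rng):
    """Draw distinct positions, each equally likely to be chosen."""
    return rng.sample(range(population_size), sample_size)

def systematic_sample(population_size, sample_size, rng):
    """Pick a random start in the first interval, then every k-th position."""
    interval = population_size // sample_size  # 2,000,000 // 10,000 = 200
    start = rng.randrange(interval)            # assumed: random starting point
    return list(range(start, population_size, interval))[:sample_size]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, rng)
systematic = systematic_sample(POPULATION_SIZE, SAMPLE_SIZE, rng)

print(len(srs), len(systematic))
print(systematic[1] - systematic[0])  # gap between consecutive picks
```

Both functions return 10,000 positions in the list. The difference is that the simple random sample is scattered unevenly across the list, while the systematic sample is spaced exactly 200 positions apart.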

See the figure below for a visual comparison between these two sampling techniques.

Source: Arash Emamzadeh

Stratified sampling. Suppose you want to make sure a large enough number of certain types of people with Covid-19 (e.g., without underlying health problems) are included in your sample. A common solution to this problem is using stratified random sampling. In stratified random sampling, one splits the population into non-overlapping groups (e.g., under 30 years of age, 30 years and over) and then uses systematic or simple random sampling to select participants from within each of these strata.

If you want to make sure your sample has the same proportions of individuals of interest as the population (e.g., the same proportion of people with the coronavirus disease 2019 who show no symptoms), you can use proportionate stratified random sampling.
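As a rough sketch, proportionate stratified random sampling could look like this in Python. The two age strata follow the example above, but the stratum sizes (8,000 and 12,000) and the function name are invented for illustration.

```python
import random

def proportionate_stratified_sample(strata, sample_size, rng):
    """Sample each stratum in proportion to its share of the population.

    strata: dict mapping stratum name -> list of member ids
    Returns a dict mapping stratum name -> list of sampled ids.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding means the shares may not always add up to sample_size.
        share = round(sample_size * len(members) / total)
        # Simple random sampling within the stratum.
        sample[name] = rng.sample(members, share)
    return sample

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical strata: 40% of the population is under 30, 60% is 30 or over.
strata = {
    "under_30": list(range(0, 8_000)),
    "30_and_over": list(range(8_000, 20_000)),
}

sample = proportionate_stratified_sample(strata, 100, rng)
for name, members in sample.items():
    print(name, len(members))
```

With these made-up numbers, a sample of 100 contains 40 people under 30 and 60 people aged 30 and over, matching the 40/60 split in the population.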

Concluding thought

When we understand the advantages and disadvantages of commonly used sampling methods (e.g., systematic sampling, simple random sampling, convenience sampling), not only do we become better producers and consumers of research, but we also become better at making decisions affecting our lives.

One such decision involves treatment for Covid-19. Who is receiving hydroxychloroquine, remdesivir, and similar treatments now? Possibly those who are more desperate, financially capable, or informed about these drugs. This is not a random sample. Before we recommend a drug to millions of people, we need to study its effects on the new disease systematically. It doesn’t matter that these drugs are not new. Even old treatments can be harmful. For instance, based on recent data, the majority of people with Covid-19 who are intubated and put on ventilators die. Perhaps they were very ill already and were going to die anyway. But maybe intubation itself can be harmful sometimes, so less invasive treatments may work better in some cases. This is why we need to study treatments systematically.

In my next post, I will explain the importance of another key aspect of conducting research: random assignment.