So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design

John List, Sally Sadoff, Mathis Wagner

Experimental economics represents a strong growth industry. In the past several decades the method has expanded beyond intellectual curiosity, meriting consideration alongside the other more traditional empirical approaches used in economics. Accompanying this growth is an influx of new experimenters who are in need of straightforward direction to make their designs more powerful. This column provides several simple rules of thumb that researchers can apply to improve the efficiency of their experimental designs.

Within economics, measurement approaches can be divided into two main categories: estimation of models that make use of naturally-occurring data, and approaches wherein the analyst herself governs the data generation process. A handful of popular empirical approaches are typically used when the analyst is dealing with naturally-occurring data, but the literature is replete with criticisms of their identifying assumptions, often on the grounds that they are restrictive or implausible (see Blundell and Costa Dias (2002) for a useful review).

In the cases where the analyst generates her own data, such as within the area of experimental economics, identification assumptions are much less severe. To obtain the effect of treatment in the particular domain of study, the only major assumption necessary is appropriate randomisation (with appropriate sample sizes). In this manner, when running an experiment, the analyst is using randomisation as an instrumental variable (see List 2006). But with the chore of data generation come other, less discussed, obligations of the researcher. In this article, we consider one such feature more carefully: the optimal number and arrangement of subjects into experimental cells.

A casual look at the literature presents a striking consistency concerning sample sizes and their arrangement. Most studies uniformly distribute at least 30 subjects into each cell. This approach holds whether the analyst is making use of a purely dichotomous treatment (such as pill or no pill) or exploring levels of treatment (such as various dosage levels). However, allocating 30 subjects to each experimental treatment cell has little basis in terms of power unless the researcher wants to detect a change of approximately 0.70 standard deviations in the outcome variable (under conventional significance and power levels of 0.05 and 0.80 respectively).
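The 0.70 figure can be checked with a short calculation. The sketch below is a minimal illustration, assuming a two-sided test, equal outcome variances in the two cells and a normal approximation; the function name `minimum_detectable_effect` is ours and does not come from the original study.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_per_cell, alpha=0.05, power=0.80):
    """Smallest detectable effect, in standard deviation units, when
    n_per_cell subjects are placed in each of two cells.
    Assumes equal variances and a normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return (z_alpha + z_beta) * sqrt(2 / n_per_cell)

mde_30 = minimum_detectable_effect(30)
print(round(mde_30, 2))  # prints 0.72
```

With 30 subjects per cell the detectable effect is a little above 0.7 standard deviations, in line with the approximate figure given above.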

Discussion of whether such a sample arrangement is efficient is more mature in other literatures, but it has not been properly vetted in the experimental economics community. Our study attempts to fill this gap. In doing so, we do not claim originality in any of the derivations; rather, this article should be viewed as a compilation of insights from other literatures that might help experimenters in economics and related fields design more efficient experiments.

The overarching idea revolves around first implementing an experimental design that maximises the variance of the treatment variable, and second adjusting the samples to account for heterogeneity in treatment effects or costs, if necessary. Several simple rules of thumb then follow. In List et al (2010), we expand on these rules of thumb, including formulas for optimal sample size calculations and empirical examples from the literature.

Simple rules of thumb for optimal experimental design

With a continuous outcome measure one should only allocate subjects equally across treatment and control if the sample variances of the outcome means are expected to be equal in the treatment and control groups. Under the assumption of homogeneous treatment effects, one would need n = 16 observations in each treatment cell to detect a one standard deviation change in the outcome variable, and n = 64 to detect a one-half standard deviation change (following the standards in the literature of a significance level of 0.05, and setting power to 0.80). To detect a one-tenth standard deviation change, 1,568 subjects are needed in each treatment cell.
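As a rough check on these magnitudes, the required cell size can be sketched with the standard two-sample formula n = 2(z_{alpha/2} + z_{beta})^2 / d^2, where d is the effect in standard deviation units. This is a minimal sketch under a normal approximation with equal variances; the function name `required_n_per_cell` is an illustrative assumption, and because it uses unrounded critical values its results can differ by a few observations from the figures quoted above.

```python
from math import ceil
from statistics import NormalDist

def required_n_per_cell(effect_size_sd, alpha=0.05, power=0.80):
    """Observations needed in each of two cells to detect an effect of
    effect_size_sd standard deviations (normal approximation,
    equal variances, two-sided test)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum**2 / effect_size_sd**2)

for d in (1.0, 0.5, 0.1):
    print(d, required_n_per_cell(d))
```

The key point is the inverse-square relationship: halving the effect size to be detected roughly quadruples the required sample in each cell.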

In those cases where the sample variances are not equal, the ratio of the sample sizes should be set equal to the ratio of the standard deviations.
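A small numerical sketch may help to see why this rule pays off. It assumes a fixed total sample, hypothetical standard deviations of 2 and 1 in the treatment and control groups, and compares the variance of the estimated difference in means under the rule with an equal split; the function names are ours.

```python
def allocate_by_sd(total_n, sd_treat, sd_control):
    """Split total_n so that n_treat / n_control = sd_treat / sd_control."""
    n_treat = total_n * sd_treat / (sd_treat + sd_control)
    return n_treat, total_n - n_treat

def variance_of_difference(n_treat, n_control, sd_treat, sd_control):
    """Variance of the difference between the two sample means."""
    return sd_treat**2 / n_treat + sd_control**2 / n_control

# Hypothetical values chosen purely for illustration.
n_t, n_c = allocate_by_sd(300, sd_treat=2.0, sd_control=1.0)
var_rule = variance_of_difference(n_t, n_c, 2.0, 1.0)
var_equal = variance_of_difference(150, 150, 2.0, 1.0)
print(n_t, n_c, var_rule, var_equal)
```

In this illustration the noisier treatment group receives twice as many subjects as the control group, and the variance of the estimated treatment effect falls below that of the equal split.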

If the cost of sampling subjects varies across experimental cells, then the ratio of the sample sizes should be inversely proportional to the square root of the relative costs, so that the more expensive cell receives fewer subjects.
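The cost rule can be illustrated in the same way. The sketch below assumes equal outcome variances in both cells, a fixed budget, and hypothetical per-subject costs of 4 (treatment) and 1 (control); the function name `allocate_by_cost` is an assumption made for this example.

```python
from math import sqrt

def allocate_by_cost(budget, cost_treat, cost_control):
    """Spend the budget so that n_treat / n_control equals
    sqrt(cost_control / cost_treat). Assumes equal outcome variances."""
    ratio = sqrt(cost_control / cost_treat)  # n_treat / n_control
    n_control = budget / (cost_control + cost_treat * ratio)
    return ratio * n_control, n_control

# Hypothetical budget and costs, chosen purely for illustration.
n_treat, n_control = allocate_by_cost(600, cost_treat=4.0, cost_control=1.0)
var_rule = 1 / n_treat + 1 / n_control  # variance in units of sigma^2
n_equal = 600 / (4.0 + 1.0)             # equal split under the same budget
var_equal = 2 / n_equal
print(n_treat, n_control, var_rule, var_equal)
```

Here the treatment cell is four times as costly, so it receives half as many subjects as the control cell, and the same budget yields a more precise estimate than an equal split.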

When the unit of randomisation is different from the unit of analysis, special attention must be paid to correlated outcomes. In the presence of intracluster correlation, it is important to randomise over clusters that are as small as possible so as to maximise the efficiency of the experiment. For example, a sample size of 1,000 subjects allocated to 20 clusters with 50 subjects each or, alternatively, to 50 clusters with 20 subjects each will yield power levels of 45% and 75% respectively. (The example assumes a significance level of 0.05, standardised treatment effect size of 0.2 and intracluster correlation coefficient of 0.01.)
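The direction of this effect can be sketched with the familiar design effect, 1 + (m - 1) * rho, for clusters of size m and intracluster correlation rho. The code below is a simplified illustration only: it assumes clusters are split evenly between treatment and control and uses a normal approximation that ignores the small number of clusters, so it is not expected to reproduce the exact power levels quoted above. The function names are ours.

```python
from math import sqrt
from statistics import NormalDist

def design_effect(cluster_size, icc):
    """Variance inflation from intracluster correlation."""
    return 1 + (cluster_size - 1) * icc

def approx_power(n_clusters, cluster_size, effect_size, icc, alpha=0.05):
    """Approximate power of a two-sided test when clusters are split
    evenly between two arms (normal approximation, an assumption)."""
    z = NormalDist()
    total_n = n_clusters * cluster_size
    # Standard error of the standardised difference in means.
    se = sqrt(4 * design_effect(cluster_size, icc) / total_n)
    return z.cdf(effect_size / se - z.inv_cdf(1 - alpha / 2))

few_large = approx_power(20, 50, effect_size=0.2, icc=0.01)
many_small = approx_power(50, 20, effect_size=0.2, icc=0.01)
print(few_large < many_small)  # prints True
```

Holding the total sample fixed, spreading subjects over more, smaller clusters lowers the design effect and raises power, which is the qualitative point of the example.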

When the treatment variable itself is continuous, the optimal design requires that the number of treatment cells used should be equal to the highest polynomial order plus one. The primary goal of the experimental design in this case is to maximise the variance of the treatment variable (while still providing enough points to determine the polynomial). For instance, if the analyst is interested in estimating the effect of treatment and has strong priors that the treatment has a linear effect, then the sample should be equally divided on the endpoints of the feasible treatment range, with no intermediate points sampled.
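For the linear case, the gain from sampling only the endpoints can be seen from the variance of an ordinary least squares slope, which is proportional to 1 / sum((x - mean(x))^2). The sketch below compares two hypothetical designs with 100 subjects on a treatment range normalised to [0, 1]; the designs and the function name are assumptions made for illustration.

```python
def sum_squared_deviations(doses):
    """Sum of squared deviations of the treatment variable from its mean.
    The variance of an OLS slope is inversely proportional to this."""
    mean = sum(doses) / len(doses)
    return sum((x - mean) ** 2 for x in doses)

# Design A: half the sample at each endpoint of the range.
endpoints = [0.0] * 50 + [1.0] * 50
# Design B: the same sample spread evenly over five dosage levels.
five_levels = [level for level in (0.0, 0.25, 0.5, 0.75, 1.0) for _ in range(20)]

ssd_endpoints = sum_squared_deviations(endpoints)
ssd_five = sum_squared_deviations(five_levels)
print(ssd_endpoints, ssd_five)  # prints 25.0 12.5
```

In this example the endpoint design doubles the variation in the treatment variable, so the slope is estimated with half the variance. The trade-off is that with only two treatment levels the linearity assumption itself cannot be tested.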

Despite the strong growth in experimental economics, several prominent debates remain open. These include the generalisability of results across domains (but see Levitt and List, 2007, and subsequent studies), use of the strategy method, one-shot versus repeated observations, elicitation of beliefs, “within” versus “between” subject experimental designs, and the use of experiments to estimate heterogeneous treatment effects. In the design area more specifically, optimal design with multiple priors and Bayesian and frequentist sample size determination are just a few of the areas not yet properly vetted in the experimental economics community.

Clearly, this article represents only the tip of the iceberg when it comes to discussing optimal experimental design. For instance, we can imagine that an entire set of papers could be produced to describe how to design experiments based on power testing and confidence intervals. More generally, we hope that methodological discussion eventually sheds its perceived inferiority in experimental economics and begins to, at least, ride shotgun in our drive to a deeper understanding of economic science.

References

Blundell, Richard and Monica Costa Dias (2002), “Alternative Approaches to Evaluation in Empirical Microeconomics”, Portuguese Economic Journal, 1:91-115.

Levitt, Steven D and John A List (2007), “What do Laboratory Experiments Measuring Social Preferences tell us about the Real World?”, Journal of Economic Perspectives, 21(2):153-174.

List, John A (2006), “Field Experiments: A Bridge Between Lab and Naturally Occurring Data”, Advances in Economic Analysis and Policy, 6(2), Article 8.

List, John A, Sally Sadoff, and Mathis Wagner (2010), “So You Want to Run an Experiment, Now What? Some Simple Rules of Thumb for Optimal Experimental Design”, Experimental Economics (forthcoming).