Generally speaking, a well-designed experiment is one that is sufficiently powered and one in which technical artifacts and biological features that may systematically affect measurements are randomized, balanced, or controlled in some other way in order to minimize opportunities for multiple explanations for the effect(s) under study. Guidelines toward achieving this end have been reviewed for bulk RNA-seq studies [1, 2] and many of the same guidelines also hold for scRNA-seq. In short, to minimize potential artifacts and confounding, it is important to randomize or balance over as many factors as possible, ideally so that putatively interesting features that are observed in the data are not due to artifacts imposed during sample preparation and/or data collection. Although the specific factors that are important in any design are highly experiment-dependent, if multiple populations are being assessed, a primary candidate for randomization is the order with which cells from distinct populations are processed and libraries are constructed. If multiplexing is done, the assignment of barcoded samples should be randomized or balanced across multiple lanes to minimize potential lane effects.
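The randomization and balancing described above can be sketched in a few lines. In this illustrative example (population labels, cell counts, and lane count are assumptions, not from any specific study), the processing order of cells from two populations is randomized, and cells are then distributed round-robin so that each sequencing lane receives a balanced mix of both populations.

```python
import random

# Hypothetical sketch: randomize processing order across two populations and
# balance cells across lanes. Labels and counts are illustrative assumptions.
random.seed(1)

cells = [("tumor", i) for i in range(8)] + [("control", i) for i in range(8)]
random.shuffle(cells)  # randomized processing / library-construction order

n_lanes = 2
lanes = {lane: [] for lane in range(n_lanes)}
# Round-robin within each population so every lane gets a balanced mix
for pop in ("tumor", "control"):
    pop_cells = [c for c in cells if c[0] == pop]
    for j, cell in enumerate(pop_cells):
        lanes[j % n_lanes].append(cell)

for lane, members in lanes.items():
    counts = {p: sum(1 for c in members if c[0] == p) for p in ("tumor", "control")}
    print(lane, counts)  # each lane holds 4 tumor and 4 control cells
```

With this scheme a lane effect, should one exist, is spread evenly across populations rather than being confounded with one of them.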

While randomization should be carried out to the extent possible, it is important to point out that, in many cases, conducting a fully randomized experiment is not realistic. Limited samples, a fixed number of single-cell isolation platforms and sequencers, time constraints, and budgets often prohibit the theoretically ideal experiment from being realized in practice. In most cases, samples must be processed in multiple batches, with randomization occurring within batch. This is exemplified nicely in an experiment concerning embryonic development that profiles single-cell expression in oocytes and zygotes as well as in 2-cell, 4-cell, and 8-cell embryos [3]. In this study, within each batch, each developmental stage was represented and stages were randomized in an effort to ensure that batch and stage were not confounded. Further discussion of possible batch effects and a specific experimental design that reduces the confounding of batch effect with biological signal are given in Hicks et al. [4]. Specifically, they propose a design in which cells from the biological conditions under study are represented together in multiple batches, which are then randomized across sequencing runs, flow cells, and lanes as in bulk RNA-seq. With this design, one can model and adjust for batch effects that result from systematic experimental bias.
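A batch-balanced layout of this kind is easy to construct programmatically. The sketch below is illustrative only (the batch count and per-batch replicate structure are assumptions, not the actual design of the studies cited above): every batch contains every developmental stage, with stage order randomized within batch, so that batch and stage cannot be confounded.

```python
import random

# Illustrative batch-balanced design in the spirit of the embryonic-development
# study described above. Stage names come from the text; the number of batches
# and one-cell-per-stage-per-batch structure are assumptions.
random.seed(2)

stages = ["oocyte", "zygote", "2-cell", "4-cell", "8-cell"]
n_batches = 3

design = []
for batch in range(n_batches):
    order = stages[:]
    random.shuffle(order)  # randomize stage order within each batch
    for stage in order:
        design.append({"batch": batch, "stage": stage})

# Sanity check: each stage appears exactly once in each batch
for batch in range(n_batches):
    in_batch = sorted(row["stage"] for row in design if row["batch"] == batch)
    assert in_batch == sorted(stages)
```

Because every batch carries every stage, a downstream model can include a batch term and estimate stage effects free of batch confounding.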

Experimental design considerations will also be affected by the various protocols and platforms available for scRNA-seq. Platforms for isolating single cells vary substantially with respect to capacity, cost, and time. Both Saliba et al. [5] and Kolodziejczyk et al. [6] review single-cell isolation practices in detail, with the latter including details on the more recent droplet methods. Additional consideration must be given to the protocols used for extracting RNA from each isolated cell and for its conversion to cDNA; common methods vary with respect to transcript coverage and strand specificity. Details are provided in reviews by Grün and van Oudenaarden [7], Saliba et al. [5] and Kolodziejczyk et al. [6]. Another issue concerns whether or not to include synthetic spike-ins (external transcripts added in known concentrations) or unique molecular identifiers (UMIs; short random sequences attached to individual cDNA molecules). While both have a number of theoretical advantages for normalization and expression estimation, practical challenges have prevented their routine use in scRNA-seq studies to date. In addition to the challenges detailed by Stegle et al. [8], spike-ins are typically added in an scRNA-seq experiment at very high relative concentrations and, consequently, they take up a relatively large proportion of reads, an important consideration during deliberations regarding experimental design. In addition, recent droplet technologies are not yet able to accommodate spike-ins. UMIs offer a great advantage in reducing noise resulting from amplification bias [9, 10], but protocols that implement UMIs sequence only the 5′ or 3′ end of each transcript, making them unsuitable for studies of isoforms or allele-specific expression.
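The benefit UMIs provide against amplification bias can be made concrete with a toy example. Reads sharing the same gene and UMI are treated as PCR duplicates of a single original molecule, so expression is quantified as the number of distinct UMIs per gene rather than the raw read count. The read tuples below are fabricated for illustration.

```python
from collections import defaultdict

# Toy UMI deduplication: reads sharing a (gene, UMI) pair collapse to one
# molecule, removing the read-count inflation caused by uneven amplification.
reads = [
    ("GeneA", "ACGT"), ("GeneA", "ACGT"), ("GeneA", "ACGT"),  # 1 molecule, amplified 3x
    ("GeneA", "TTGC"),
    ("GeneB", "GGAA"), ("GeneB", "GGAA"),
]

read_counts = defaultdict(int)
umi_sets = defaultdict(set)
for gene, umi in reads:
    read_counts[gene] += 1
    umi_sets[gene].add(umi)

molecules = {gene: len(umis) for gene, umis in umi_sets.items()}
print(dict(read_counts))  # {'GeneA': 4, 'GeneB': 2} -- inflated by amplification
print(molecules)          # {'GeneA': 2, 'GeneB': 1} -- deduplicated molecule counts
```

Note that this only works because each molecule's UMI is attached before amplification; it also shows why such protocols capture only one transcript end, which is what limits their use for isoform-level questions.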

Whatever the design, it is always beneficial (and requires almost no additional resources) to record and retain information on as many factors as possible to facilitate downstream diagnostics. Just as it is standard to check residuals following a linear regression, it should be standard in sequencing experiments to check that effects of interest are not confounded by variations in technician, sample processing date/time, reaction temperature, position on cell capture device, lane, batch, proportion of detected genes, and so on. Through such an analysis, Leng et al. [11] identified an artifact related to position on Fluidigm’s IFC array; once identified, the effects of such artifacts can be removed [12]. It is also important to note that the sources of variation in an scRNA-seq experiment are not yet completely understood, and there likely are systematic effects that will be important in scRNA-seq that have yet to be discovered. Novel methods to identify such factors are needed and are beginning to be developed [4].
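As a minimal sketch of the kind of diagnostic check described above (all values fabricated; in practice one would use a formal test such as ANOVA, or inspect principal components against recorded factors), one can compare a simple per-cell summary, here the proportion of detected genes, across levels of a recorded technical factor such as processing date:

```python
import statistics

# Hypothetical diagnostic: compare per-cell gene-detection rates across
# processing dates. A large systematic gap flags a technical factor that
# should be modeled before downstream analysis. Data are fabricated.
cells = (
    [{"date": "day1", "detect_rate": r} for r in (0.42, 0.45, 0.44, 0.41)]
    + [{"date": "day2", "detect_rate": r} for r in (0.28, 0.30, 0.27, 0.31)]
)

by_date = {}
for c in cells:
    by_date.setdefault(c["date"], []).append(c["detect_rate"])

means = {d: statistics.mean(v) for d, v in by_date.items()}
gap = abs(means["day1"] - means["day2"])
print(means, round(gap, 2))  # a 0.14 gap in detection rate between dates
```

Recording factors such as date costs nothing at collection time but is what makes a check like this possible at all.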

While many of the design principles established for bulk RNA-seq hold in the single-cell setting, specific guidelines to define what is meant by 'sufficiently powered' in an scRNA-seq experiment are less clear. As with bulk RNA-seq, guidelines will generally depend on the questions of interest.

Although few studies address the minimum number of cells required for a given task, considerable attention has been given to identifying the sequencing depth at which the majority of human transcripts expressed in a cell, or population of cells, are detected. The consensus is that, beyond one million reads, there is very little change (<5%) in the number of reliably expressed genes detected in a cell [13]. In fact, the majority of genes seem to be detected at 500,000 reads, and over half are typically detected at 250,000 reads [13, 14]. Furthermore, Shalek et al. [15] demonstrated that one million reads is sufficient to estimate the fraction of detectably expressing cells within a population and also to estimate the mean and variance of a gene's expression among detectably expressing cells.
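The saturation behavior described above can be illustrated with a small simulation (gene abundances and depths below are simulated assumptions, not real data): as read depth increases, the number of distinct genes detected rises steeply at first and then flattens, giving diminishing returns per additional read.

```python
import random

# Toy saturation curve: subsample reads at increasing depths and count the
# number of distinct genes detected. Abundances follow a skewed (harmonic)
# distribution so a few genes dominate, as in real transcriptomes.
random.seed(0)

n_genes = 2000
weights = [1.0 / (i + 1) for i in range(n_genes)]  # skewed abundances

def genes_detected(n_reads):
    reads = random.choices(range(n_genes), weights=weights, k=n_reads)
    return len(set(reads))

depths = [10_000, 50_000, 100_000, 200_000]
detected = [genes_detected(d) for d in depths]
print(list(zip(depths, detected)))  # detection flattens as depth grows
```

Real saturation analyses subsample an actual sequencing library rather than simulating abundances, but the qualitative shape is the same.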

More reads will be required for more refined tasks, such as fully characterizing transcript structure, estimating the expression of rare isoforms, or distinguishing cells on the basis of subtle differences. Fewer reads but larger cell numbers may be preferred when mapping out a large population, searching for rare but distinct cell types, or pooling cells in silico to obtain average gene-expression clusters. Guidelines have yet to be reported for these considerations, as well as for most analysis tasks such as sub-population identification and the identification of transcripts showing differential distributions across conditions. As with any power calculation, precise guidelines will depend not only on the task at hand but also on the signal-to-noise ratio inherent to a given system. Pollen et al. [14] have shown, for example, that 50,000 reads is sufficient for cell-type classification in a sample of 301 cells containing diverse cell types. Conversely, in a seemingly homogeneous cell population, deeper sequencing may be required to detect heterogeneity that is due to rare subpopulations. Evaluating the trade-off that exists between sequencing depth and number of cells will also depend on budget and, albeit to a much lesser extent, on platform, protocol, base-pair length, and genome size.
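The depth-versus-cell-number trade-off under a fixed budget reduces to simple arithmetic. In this back-of-the-envelope sketch, the total read budget is an illustrative assumption; the candidate per-cell depths are motivated by the figures discussed above (~50,000 reads for cell-type classification in a diverse sample, ~1 million reads near gene-detection saturation).

```python
# Back-of-the-envelope trade-off under a fixed sequencing budget.
# The total read budget is an illustrative assumption, not a recommendation.
total_reads = 400_000_000  # e.g., roughly one sequencing run's worth of reads

for depth in (50_000, 250_000, 1_000_000):
    n_cells = total_reads // depth
    print(f"{depth:>9} reads/cell -> {n_cells:>6} cells")
```

The same budget thus supports twenty times as many shallowly sequenced cells as deeply sequenced ones, which is why breadth is favored for mapping large populations and depth for resolving subtle within-population differences.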