Abstract Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years, in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).

Citation: Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C (2013) Approximate Bayesian Computation. PLoS Comput Biol 9(1): e1002803. https://doi.org/10.1371/journal.pcbi.1002803 Editor: Shoshana Wodak, University of Toronto, Canada Published: January 10, 2013 Copyright: © 2013 Sunnåker et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: MS was supported by SystemsX.ch (RTD project YeastX). AGB was supported by SystemsX.ch (RTD projects YeastX and LiverX). JC was supported by the European Research Council grant no. 239784. EN was supported by the FICS graduate school. MF was supported by a Swiss NSF grant No 3100-126074 to Laurent Excoffier. CD was supported by SNSF advanced researcher fellowship no. 136461. The funders had no role in the preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

This is a “Topic Page” article for PLOS Computational Biology.

History The first approximate Bayesian computation (ABC)-related ideas date back to the 1980s. Donald Rubin, when discussing the interpretation of Bayesian statements in 1984 [1], described a hypothetical sampling mechanism that yields a sample from the posterior distribution. This scheme was more of a conceptual thought experiment to demonstrate what type of manipulations are done when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the ABC-rejection scheme, and this article can be considered to be the first to describe approximate Bayesian computation. Another prescient point was made when Rubin argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC. In 1984, Peter Diggle and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is intractable [2]. Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new [3], [4], Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under a circumstance where the likelihood is intractable. Although Diggle and Gratton's approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article by Simon Tavaré et al. [5] was the first to propose an ABC algorithm for posterior inference. In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of determining the posterior distribution of the time to the most recent common ancestor of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in the human Y chromosome by Jonathan K. Pritchard et al. [6] using the ABC method. Finally, the term Approximate Bayesian Computation was established by Mark Beaumont et al. [7], who further extended the ABC methodology and discussed the suitability of the ABC approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, or phylogeography.

Model Comparison with ABC Besides parameter estimation, the ABC-framework can be used to compute the posterior probabilities of different candidate models [15]–[17]. In such applications, one possibility is to use the rejection-sampling in a hierarchical manner. First, a model is sampled from the prior distribution for the models; then, given the model sampled, the model parameters are sampled from the prior distribution assigned to that model. Finally, a simulation is performed as in the single-model ABC. The relative acceptance frequencies for the different models now approximate the posterior distribution for these models. Again, computational improvements for ABC in the space of models have been proposed, such as constructing a particle filter in the joint space of models and parameters [17]. Once the posterior probabilities of models have been estimated, one can make full use of the techniques of Bayesian model comparison. For instance, to compare the relative plausibilities of two models $M_1$ and $M_2$, one can compute their posterior ratio, which is related to the Bayes factor $B_{1,2}$:

$$\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{p(D \mid M_1)}{p(D \mid M_2)} \, \frac{p(M_1)}{p(M_2)} = B_{1,2} \, \frac{p(M_1)}{p(M_2)}$$

If the model priors are equal ($p(M_1) = p(M_2)$), the Bayes factor equals the posterior ratio. In practice, as discussed below, these measures can be highly sensitive to the choice of parameter prior distributions and summary statistics, and thus conclusions of model comparison should be drawn with caution.
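As an illustration of the hierarchical rejection scheme described above, the following is a minimal Python sketch, not a reference implementation. Everything specific in it is an assumption made for the example: the two coin-flip models named `M1` and `M2`, their parameter priors, the equal model priors, the use of the number of heads as the summary statistic, and the tolerance. The functions `simulate_heads`, `abc_model_choice`, and `bayes_factor` are hypothetical helpers introduced here.

```python
import random

def simulate_heads(theta, n_trials, rng):
    """Simulate n_trials coin flips; return the number of heads (summary statistic)."""
    return sum(rng.random() < theta for _ in range(n_trials))

# Hypothetical candidate models: each name maps to a sampler for its parameter prior.
PARAM_PRIORS = {
    "M1": lambda rng: rng.uniform(0.0, 1.0),  # assumption: vague prior on the heads probability
    "M2": lambda rng: rng.uniform(0.4, 0.6),  # assumption: nearly fair coin
}
MODEL_PRIOR = {"M1": 0.5, "M2": 0.5}  # assumption: equal prior model probabilities

def abc_model_choice(observed_heads, n_trials, n_sims, tolerance, seed=0):
    """Approximate posterior model probabilities by hierarchical ABC rejection."""
    rng = random.Random(seed)
    names = list(MODEL_PRIOR)
    weights = [MODEL_PRIOR[m] for m in names]
    accepted = {m: 0 for m in names}
    for _ in range(n_sims):
        model = rng.choices(names, weights=weights)[0]  # 1. sample a model from its prior
        theta = PARAM_PRIORS[model](rng)                # 2. sample parameters given the model
        heads = simulate_heads(theta, n_trials, rng)    # 3. simulate data under that model
        if abs(heads - observed_heads) <= tolerance:    # 4. accept if close to the observation
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase n_sims or tolerance")
    # Relative acceptance frequencies approximate the posterior model probabilities.
    return {m: accepted[m] / total for m in names}

def bayes_factor(posterior, model_prior, a, b):
    """Posterior ratio of model a to model b divided by their prior ratio."""
    if posterior[b] == 0:
        return float("inf")  # model b was never accepted in this run
    return (posterior[a] / posterior[b]) / (model_prior[a] / model_prior[b])

# Made-up observation for the example: 40 heads in 50 flips.
posterior = abc_model_choice(observed_heads=40, n_trials=50, n_sims=20000, tolerance=1)
print(posterior, bayes_factor(posterior, MODEL_PRIOR, "M1", "M2"))
```

Because the model priors are equal in this sketch, the estimated Bayes factor coincides with the ratio of acceptance counts. The estimate depends on the chosen tolerance, summary statistic, and parameter priors, so a single run like this should be read as a rough approximation only.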

Conclusion ABC represents a class of well-founded and powerful methods for Bayesian statistical inference. However, reliable application of ABC requires additional caution, due to the approximations and biases introduced at the different stages of the approach. In its current incarnation, the ABC toolkit as a whole is best suited for inference about parameters or predictive inferences about observables in the presence of a single or a few candidate model(s). How to make ABC practically feasible for problems involving large sets of models and/or high-dimensional target parameter spaces is currently largely an open issue. Since the computation of the likelihood function is bypassed, it can be tempting to attack high-dimensional problems using ABC, but inevitably this comes bundled with new challenges that investigators need to be aware of at each step of their analyses.

Acknowledgments We would like to thank Matthew Astley, Joachim M. Buhmann, Sotiris Dimopoulos, Nick Goldman, Darren Logan, Joerg Stelling, and Elias Zamora-Sillero for useful discussions and/or feedback on the manuscript. We also thank Christian P. Robert and Dennis Prangle for their constructive referee reports and Daniel Mietchen for editorial work and help to bring the article up to Wikipedia standards. This article started as an assignment for the graduate course “Reviews in Computational Biology” (263-5151-00L) at ETH Zurich. The version history of the text file and the peer reviews (and response to reviews) are available as supporting information in Text S1 and S2.