Approximate Bayesian computation
Computational method in Bayesian statistics / From Wikipedia, the free encyclopedia
Dear Wikiwand AI, let's keep it short by simply answering these key questions:
Can you list the top facts and stats about Approximate Bayesian computation?
Summarize this article for a 10 year old
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.
In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate.
ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection.
ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences, e.g. in population genetics, ecology, epidemiology, systems biology, and in radio propagation.[1]
The first ABC-related ideas date back to the 1980s. Donald Rubin, when discussing the interpretation of Bayesian statements in 1984,[2] described a hypothetical sampling mechanism that yields a sample from the posterior distribution. This scheme was more of a conceptual thought experiment to demonstrate what type of manipulations are done when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the ABC-rejection scheme, and this article can be considered to be the first to describe approximate Bayesian computation. However, a two-stage quincunx was constructed by Francis Galton in the late 1800s that can be seen as a physical implementation of an ABC-rejection scheme for a single unknown (parameter) and a single observation.[3] Another prescient point was made by Rubin when he argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC.
In 1984, Peter Diggle and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is intractable.[4] Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new,[5][6] Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under a circumstance where the likelihood is intractable.
Although Diggle and Gratton's approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article of Simon Tavaré and co-authors was first to propose an ABC algorithm for posterior inference.[7] In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of deciding the posterior distribution of the time to the most recent common ancestor of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in human Y chromosome by Jonathan K. Pritchard and co-authors using the ABC method.[8] Finally, the term approximate Bayesian computation was established by Mark Beaumont and co-authors,[9] extending further the ABC methodology and discussing the suitability of the ABC-approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and phylogeography.
Approximate Bayesian computation can be understood as a kind of Bayesian version of indirect inference.[10][11]
Several efficient Monte Carlo based approaches have been developed to perform sampling from the ABC posterior distribution for purposes of estimation and prediction problems. A popular choice is the SMC Samplers algorithim [12][13][14] adapted to the ABC context in the method (SMC-ABC).[15][16][17][18]
Motivation
A common incarnation of Bayes’ theorem relates the conditional probability (or density) of a particular parameter value given data to the probability of given by the rule
- ,
where denotes the posterior, the likelihood, the prior, and the evidence (also referred to as the marginal likelihood or the prior predictive probability of the data). Note that the denominator is normalizing the total probability of the posterior density to one and can be calculated that way.
The prior represents beliefs or knowledge (such as f.e. physical constraints) about before is available. Since the prior narrows down uncertainty, the posterior estimates have less variance, but might be biased. For convenience the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior using a factorization of the joint distribution of all the elements of in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of , the evidence can be ignored, as it constitutes a normalising constant, which cancels for any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood and the prior . For numerous applications, it is computationally expensive, or even completely infeasible, to evaluate the likelihood,[19] which motivates the use of ABC to circumvent this issue.
The ABC rejection algorithm
All ABC-based methods approximate the likelihood function by simulations, the outcomes of which are compared with the observed data.[20][21][22][23][24] More specifically, with the ABC rejection algorithm — the most basic form of ABC — a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point , a data set is then simulated under the statistical model specified by . If the generated is too different from the observed data , the sampled parameter value is discarded. In precise terms, is accepted with tolerance if:
- ,
where the distance measure determines the level of discrepancy between and based on a given metric (e.g. Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event ) is negligible for all but trivial applications of ABC, which would in practice lead to rejection of nearly all sampled parameter points. The outcome of the ABC rejection algorithm is a sample of parameter values approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly evaluate the likelihood function.
Summary statistics
The probability of generating a data set with a small distance to typically decreases as the dimensionality of the data increases. This leads to a substantial decrease in the computational efficiency of the above basic ABC rejection algorithm. A common approach to lessen this problem is to replace with a set of lower-dimensional summary statistics , which are selected to capture the relevant information in . The acceptance criterion in ABC rejection algorithm becomes:
- .
If the summary statistics are sufficient with respect to the model parameters , the efficiency increase obtained in this way does not introduce any error.[25] Indeed, by definition, sufficiency implies that all information in about is captured by .
As elaborated below, it is typically impossible, outside the exponential family of distributions, to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but possibly insufficient summary statistics are often used in applications where inference is performed with ABC methods.
An illustrative example is a bistable system that can be characterized by a hidden Markov model (HMM) subject to measurement noise. Such models are employed for many biological systems: They have, for example, been used in development, cell signaling, activation/deactivation, logical processing and non-equilibrium thermodynamics. For instance, the behavior of the Sonic hedgehog (Shh) transcription factor in Drosophila melanogaster can be modeled with an HMM.[26] The (biological) dynamical model consists of two states: A and B. If the probability of a transition from one state to the other is defined as in both directions, then the probability to remain in the same state at each time step is . The probability to measure the state correctly is (and conversely, the probability of an incorrect measurement is ).
Due to the conditional dependencies between states at different time points, calculation of the likelihood of time series data is somewhat tedious, which illustrates the motivation to use ABC. A computational issue for basic ABC is the large dimensionality of the data in an application like this. The dimensionality can be reduced using the summary statistic , which is the frequency of switches between the two states. The absolute difference is used as a distance measure with tolerance . The posterior inference about the parameter can be done following the five steps presented in.
Step 1: Assume that the observed data form the state sequence AAAABAABBAAAAAABAAAA, which is generated using and . The associated summary statistic—the number of switches between the states in the experimental data—is .
Step 2: Assuming nothing is known about , a uniform prior in the interval is employed. The parameter is assumed to be known and fixed to the data-generating value , but it could in general also be estimated from the observations. A total of parameter points are drawn from the prior, and the model is simulated for each of the parameter points , which results in sequences of simulated data. In this example, , with each drawn parameter and simulated dataset recorded in Table 1, columns 2-3. In practice, would need to be much larger to obtain an appropriate approximation.
i | Simulated datasets (step 2) | Summary statistic (step 3) |
Distance (step 4) |
Outcome (step 4) | |
---|---|---|---|---|---|
1 | 0.08 | AABAAAABAABAAABAAAAA | 8 | 2 | accepted |
2 | 0.68 | AABBABABAAABBABABBAB | 13 | 7 | rejected |
3 | 0.87 | BBBABBABBBBABABBBBBA | 9 | 3 | rejected |
4 | 0.43 | AABAAAAABBABBBBBBBBA | 6 | 0 | accepted |
5 | 0.53 | ABBBBBAABBABBABAABBB | 9 | 3 | rejected |
Step 3: The summary statistic is computed for each sequence of simulated data .
Step 4: The distance between the observed and simulated transition frequencies is computed for all parameter points. Parameter points for which the distance is smaller than or equal to are accepted as approximate samples from the posterior.
Step 5: The posterior distribution is approximated with the accepted parameter points. The posterior distribution should have a non-negligible probability for parameter values in a region around the true value of in the system if the data are sufficiently informative. In this example, the posterior probability mass is evenly split between the values 0.08 and 0.43.
The posterior probabilities are obtained via ABC with large by utilizing the summary statistic (with and ) and the full data sequence (with ). These are compared with the true posterior, which can be computed exactly and efficiently using the Viterbi algorithm. The summary statistic utilized in this example is not sufficient, as the deviation from the theoretical posterior is significant even under the stringent requirement of . A much longer observed data sequence would be needed to obtain a posterior concentrated around , the true value of .
This example application of ABC uses simplifications for illustrative purposes. More realistic applications of ABC are available in a growing number of peer-reviewed articles.[22][23][24][27]
Outside of parameter estimation, the ABC framework can be used to compute the posterior probabilities of different candidate models.[28][29][30] In such applications, one possibility is to use rejection sampling in a hierarchical manner. First, a model is sampled from the prior distribution for the models. Then, parameters are sampled from the prior distribution assigned to that model. Finally, a simulation is performed as in single-model ABC. The relative acceptance frequencies for the different models now approximate the posterior distribution for these models. Again, computational improvements for ABC in the space of models have been proposed, such as constructing a particle filter in the joint space of models and parameters.[30]
Once the posterior probabilities of the models have been estimated, one can make full use of the techniques of Bayesian model comparison. For instance, to compare the relative plausibilities of two models and , one can compute their posterior ratio, which is related to the Bayes factor :
- .
If the model priors are equal—that is, —the Bayes factor equals the posterior ratio.
In practice, as discussed below, these measures can be highly sensitive to the choice of parameter prior distributions and summary statistics, and thus conclusions of model comparison should be drawn with caution.