The accuracy of a proposition (prediction) measures the extent to which it agrees with the true proposition (value). In this essay we will argue that accuracy can be interpreted as a notion of probability, which is distinct from the traditional frequentist notion of probability, and that defining such a notion of probability is useful for at least two reasons: (1) in everyday usage and intuitive judgements, probability often refers to this accuracy-implied notion of probability; and, more importantly, (2) the accuracy- and frequency-implied notions of probability carry distinct information, which can be combined by tracking the two probabilities to get closer to the truth and achieve superior forecasts.

The accuracy of a prediction (proposition) or a model, which can be seen as a collection of predictions (propositions), refers to the extent to which the prediction agrees with the true value. The concept of accuracy is only meaningful under a fuzzy notion of truth, which allows a claim to be true to various degrees. In a bivalent notion of truth, the truth is binary. A claim cannot be more or less true. The earth is either flat or it is not. The claim that the earth is flat is either true or false, meaning that it is either 100% accurate or 0% accurate. It cannot be both true and false, say half true and half false. Except that, in reality, it can be, at least as long as the curvature of the earth can be quantified. In this case, the claim that the earth is flat can be true to various degrees, as measured by the extent to which the curvature of the earth deviates from that of a flat surface. The claim can be more or less true, depending on the actual curvature of the earth. As the curvature of the earth deviates from that of a flat surface, the claim becomes true to a lower degree and false to a higher degree, or, equivalently, less accurate. A fuzzy notion of truth thus allows us to quantify the degree of truth, which can also be regarded as accuracy, and to rank claims by the extent to which they are true, allowing us to incorporate more information about a claim.

Proposition: Accuracy of a proposition is a measure of the probability with which the proposition is true in a bivalent notion of truth.

To see why this is the case, consider the example of the earth’s curvature discussed above, and the propositions that the earth is spherical and that the earth is flat, both of which are less than perfectly accurate. The accuracy of both these propositions can be measured as the ‘distance’ of the proposed shapes from the actual shape of the earth. The proposition that the earth is flat is less accurate in the sense that the aggregate distance of a flat surface, compared to that of a spherical surface, from the earth’s actual shape is higher. Now if we wished to judge these two propositions on a bivalent scale of truth, we would have to pick a threshold for accuracy, such that a proposition that crosses this threshold is deemed TRUE, and one that fails to cross it is deemed FALSE (we will use capitalised TRUE and FALSE to denote bivalent notions of truth and falsehood). Thus, the probability that a proposition with a given accuracy would be deemed TRUE would be given by the probability of picking an accuracy threshold that is lower than the accuracy of the proposition, and, hence, a more accurate proposition would be more probable to be TRUE.
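This threshold argument can be sketched numerically. In the sketch below the accuracy scores (0.95 and 0.10) are illustrative numbers, and the accuracy threshold is assumed to be drawn uniformly from [0, 1]; under that assumption the probability of being deemed TRUE simply equals the proposition’s accuracy:

```python
import random

random.seed(1)

def prob_true(acc, trials=100_000):
    """Fraction of uniformly drawn accuracy thresholds that `acc` crosses."""
    return sum(acc >= random.random() for _ in range(trials)) / trials

# Hypothetical accuracy scores for the two shape propositions.
p_spherical = prob_true(0.95)  # 'the earth is spherical'
p_flat = prob_true(0.10)       # 'the earth is flat'
print(p_spherical > p_flat)    # True: the more accurate proposition
                               # is TRUE for more threshold choices
```

With a uniform threshold, `p_spherical` converges to 0.95 and `p_flat` to 0.10, so the ranking of propositions by this probability reproduces their ranking by accuracy.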

But why compute such a probability? Why not just fix the bivalent truth threshold once and for all at some level? The main reason for not doing so is that the desired accuracy often depends on the context. For example, the accuracy of a prediction is not independent of the precision with which an observation is made. If observations are not sufficiently precise, accuracy beyond a certain limit cannot be established, and all one can ask is whether the accuracy exceeds a certain threshold. Furthermore, even if observations are precise, a certain degree of accuracy may be sufficient for the task at hand. For instance, if all one desires is to destroy a city with a missile, dropping the missile anywhere within the city may be sufficient, and the prediction regarding the missile’s landing position will only have to be accurate enough to ensure that it will land within the city. But if the intention is to destroy a nuclear power plant inside the city without any civilian casualties, one would need to be much more accurate about where the missile will land. Thus, in different contexts, a different degree of accuracy will be required, and predictions that are more accurate will be deemed accurate enough (TRUE) in a larger number of contexts, and, in that sense, will be more probable to be TRUE.

While we have described this accuracy-implied notion of probability in terms of how frequently the proposition will be TRUE for different choices of the accuracy threshold, this notion of probability is not identical to the commonly used frequentist notion of probability, which measures how frequently a proposition is deemed TRUE for a fixed accuracy threshold. In general, each proposition would make several predictions, some of which may be more accurate than others. Therefore, the accuracy of a proposition will not be a unique number, and a proposition that on average makes more accurate predictions may not be deemed TRUE more often. For example, the proposition that planets are not flat is frequently TRUE, because there are no flat planets, but not necessarily very accurate, because it never accurately predicts the shape or size of any single planet. In contrast, the proposition that planets are ‘earth-like’ may not be frequently TRUE, if other planets have sufficiently different shapes or sizes, but is much more accurate, at least in the case of the earth and other planets like it.

Thus, the frequency and the accuracy of a proposition’s truth are better understood as two distinct and complementary dimensions of the truth. In general, one is interested in both of them, and, hence, in both the frequentist and the accuracy-implied probabilities of truth for any given proposition. To sum up: the frequentist probability of TRUTH tells us how frequently a proposition crosses a given accuracy threshold to be deemed TRUE; the accuracy-implied notion of probability tells us how often the proposition will be deemed TRUE if we randomly select an accuracy threshold. Our task is to find propositions that are more accurate, and frequently so.

In practice, however, the two measures are often conflated in such a way that the distinct information they provide is lost. We discussed one example of this conflation in the context of the conjunction fallacy in a previous essay, where intuitive judgements seem to be using an accuracy-implied notion of probability, and, as a result, are inconsistent with a frequentist notion of probability. A conjunction of propositions, such as the proposition that Linda is both a bank teller and a feminist, can make more accurate predictions, since its predictions will sometimes match both attributes of an observed person. If each observed attribute was plotted on a separate dimension on a plane, each observation would show up as a point on this plane, and the predictions of the conjunction would sometimes lie exactly on top of the observed point. But the predictions of an individual proposition, such as the proposition that Linda is a bank teller, will never match any observation perfectly, because this proposition only ever predicts one attribute, i.e. all its predictions lie along a single axis, while the actual observations lie on the plane, and never on this axis. Thus, in this example of the conjunction fallacy, one notion of probability (the accuracy-implied notion) is perhaps being directly substituted for another notion of probability (the frequentist notion).

Another, and a more technical, example can be seen in a commonly used method for model comparison and estimation, which relies on minimising model errors, such as the sum of squared errors. Minimising the sum of squared errors does not distinguish between a model that makes several mildly accurate predictions and a model that makes some very accurate and some very inaccurate predictions. Ranking and selecting models based only on average accuracy loses the distinct pieces of information contained in the frequency of accurate predictions and the accuracy of individual predictions. Error minimisation would lose no information only when one model is both at least as accurate and at least as frequently TRUE as the other. But if one model is more accurate and the other is more frequently TRUE, error minimisation would select the one that is on average more accurate, and lose all the informational value of the other model. A model that is on average less accurate may still be better in the sense that it is more frequently accurate. And optimally one would wish to combine models in a way that the resultant model is both more accurate and frequently so.
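A minimal sketch of this loss of information, using made-up error profiles: the two profiles below have identical sums of squared errors, yet differ sharply in how often their predictions are exactly right.

```python
# Two illustrative error profiles with identical sum of squared errors
# but very different frequency/accuracy trade-offs.
errors_mild = [0.5, 0.5, 0.5, 0.5]   # uniformly mildly inaccurate
errors_mixed = [1.0, 0.0, 0.0, 0.0]  # mostly exact, occasionally far off

sse = lambda errs: sum(e * e for e in errs)
print(sse(errors_mild))   # 1.0
print(sse(errors_mixed))  # 1.0

# SSE ranks the two models as equivalent, yet the second model's
# predictions are exactly right 75% of the time and the first's never are.
exact_rate = lambda errs: sum(e == 0 for e in errs) / len(errs)
print(exact_rate(errors_mild), exact_rate(errors_mixed))  # 0.0 0.75
```

Any model-selection rule that reduces each model to a single aggregate error number is blind to this difference.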

For example, consider two models, A and B. Model A makes all its predictions with an accuracy of 0.73, while model B makes 50% of its predictions with an accuracy of 1 and 50% with an accuracy of 0.5. Thus, the average accuracy of model B would be 0.75, and an error-minimisation procedure would pick model B. However, model A is more frequently TRUE than model B, and model B would only strictly dominate model A if it made all its predictions with an accuracy of at least 0.73, but that is not the case. Optimally, one should use model A as a ‘base model’ for every prediction, as it is more frequently TRUE, but rely on the predictions of model B, which can be more accurate, whenever they are consistent with model A. In practice, this can be implemented by first obtaining the prediction of model A, and then drawing a confidence region around this prediction implied by the accuracy of this prediction, which in this case is 0.73. The higher the accuracy of the model, the smaller will be the volume of this confidence region. And since model A’s predictions are always true with an accuracy of 0.73, this confidence region will always contain the true value. Thus, the prediction of model A will help guarantee that the true value is within the region specified. In the next step, one looks at the prediction of model B. If the prediction of model B lies outside the confidence region (of model A’s prediction), it is ignored. But if the prediction lies within the confidence region, we use this prediction, as model B’s prediction is then more accurate. If all the inaccurate predictions of model B lie outside the confidence region of model A’s predictions, our predictions will be true with an accuracy of 0.73 (the accuracy of model A) 50% of the time, and true with an accuracy of 1 (the accuracy of model B’s accurate predictions) for the remaining 50% of the time, thus yielding an average accuracy of 0.865!
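The combination procedure above can be sketched numerically. The sketch assumes accuracy is defined as 1 minus the absolute prediction error on a real-valued scale (an illustrative mapping, not a formal definition), and constructs model B so that its inaccurate predictions always fall outside model A’s confidence region, as the example requires:

```python
import random

# Assumed mapping: accuracy of a prediction is 1 - |prediction - truth|.
def accuracy(pred, truth):
    return 1 - abs(pred - truth)

random.seed(0)
truths = [random.random() for _ in range(10_000)]

# Model A: every prediction misses by 0.27, i.e. accuracy 0.73.
preds_a = [t + 0.27 for t in truths]
# Model B: half the predictions are exact (accuracy 1),
# half miss by 0.5 on the other side (accuracy 0.5).
preds_b = [t if i % 2 == 0 else t - 0.5 for i, t in enumerate(truths)]

# Combined model: draw a confidence region of radius 1 - 0.73 around
# model A's prediction; accept model B's prediction only if it falls
# inside that region, otherwise fall back on model A.
radius = (1 - 0.73) + 1e-9  # tiny slack for floating-point comparison
combined = [b if abs(b - a) <= radius else a
            for a, b in zip(preds_a, preds_b)]

avg = lambda ps: sum(accuracy(p, t) for p, t in zip(ps, truths)) / len(truths)
print(round(avg(preds_a), 3))   # 0.73
print(round(avg(preds_b), 3))   # 0.75
print(round(avg(combined), 3))  # 0.865
```

Error minimisation alone would keep model B and its 0.75 average; the combined model keeps model B’s exact predictions while using model A’s guarantee to screen out model B’s bad ones.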

The same criticism holds for using error minimisation for model estimation, which is equivalent to choosing, from a range of possible values, the set of model parameter values that minimises the total error.

This example is entirely artificial, and is designed mostly to make error minimisation look bad, but realistic examples of this issue are not difficult to find. Consider, for example, Relativistic Quantum Mechanics (RQM) and Quantum Field Theory (QFT). RQM is really an approximation of QFT, and QFT makes more accurate predictions in several contexts, especially those involving the creation and annihilation of particles. However, the original QFT (prior to what is called renormalisation, which came several decades later) also makes some not so accurate predictions, or, to be more precise, it predicts the mass and the charge of several fundamental particles to be infinite! Given such an enormous error in predicting the fundamental particles’ mass and charge values, error minimisation would prefer RQM, which, however, is structurally incapable of incorporating the phenomena that QFT predicts. In general, it is quite likely that a more structurally sophisticated (complex) theory would also make some more inaccurate predictions than a simpler, but approximate, theory.

Thus, in a general setting, where one has multiple models at one’s disposal, one can combine these models in a similar manner, the details of which will be discussed in another essay, to arrive at a model that combines the best of all models and yields predictions that are most frequently TRUE with the highest possible accuracy.