I'm reviewing Bayes' theorem and related topics for the upcoming GDAT class. In its simplest form, Bayes' theorem is a statement about conditional probabilities. The probability of A, given that B has occurred, is expressed as: \begin{equation} \Pr(A|B) = \dfrac{\Pr(B|A)\times\Pr(A)}{\Pr(B)} \label{eqn:bayes} \end{equation} In Bayesian language, $\Pr(A|B)$ is called the posterior probability, $\Pr(A)$ the prior probability, $\Pr(B|A)$ the likelihood, and $\Pr(B)$ the evidence (essentially a normalization factor).
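As a quick numerical sanity check of eqn. \eqref{eqn:bayes}, here is a toy diagnostic-test calculation in R (the 1% prevalence and the 95%/90% test accuracies are invented purely for illustration):

```r
# Hypothetical example: disease prevalence 1%, test sensitivity 95%,
# false-positive rate 10%. What is Pr(disease | positive test)?
prior      <- 0.01                        # Pr(A)
likelihood <- 0.95                        # Pr(B|A)
evidence   <- 0.95 * 0.01 + 0.10 * 0.99   # Pr(B), by total probability
posterior  <- likelihood * prior / evidence
posterior                                 # about 0.088: the data raise 1% to roughly 9%
```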

Source: Wikipedia

Bayes' original conception had to do with capturing the notion that our certainty about the state of the world is altered by the arrival of new information (data). Exactly how eqn. \eqref{eqn:bayes} should be interpreted has been the subject of vigorous historical debate (which I won't go into here). Part of the problem was that the early notions of probability arose out of trying to formalize scenarios where the outcome was not known in advance, like games of chance and betting. The basic concept there is odds, not probability. This is still true today. Bookmakers calculate odds based on what they believe about certain information to which they are privy. Consequently, Bayes used the word belief in his essay, rather than the word probability, and many critics found the Bayesian approach too subjective for determining scientific truth. Today, however, Bayesian inference is a hot topic and a central part of machine learning.

One of the things that has always intrigued me was something I read about the French mathematician Pierre Simon Laplace using Bayes' theorem to estimate the mass of the planet Saturn in the early nineteenth century and that his method turned out to be remarkably accurate. As a matter of historical fact, Laplace (the Frenchman), working on mathematical theories of probability during the period 1812 to 1816, was entirely unaware of the prior work of Bayes (the Englishman) published posthumously in 1763. Rather, Laplace discovered a more generalized version of Bayes' theorem on his own: \begin{equation} \Pr(M~|~D,I) = \dfrac{\Pr(D~|~M,I)\times\Pr(M ~|~I)}{\Pr(D~|~I)} \label{eqn:lapbayes} \end{equation} where $M$ is the mass of the planet (in particular, the mass of Jupiter, Saturn or Uranus), $D$ is data from various measurements of planetary orbits, and $I$ is existing background information, e.g., Newton's laws of celestial mechanics. You can see the similarity between eqns. \eqref{eqn:bayes} and \eqref{eqn:lapbayes}.

Although many authors refer to Laplace's work, and some also note that his estimate of the mass of Saturn differs from the modern value by only about one half of one percent, nobody that I've read explains how Laplace actually applied eqn. \eqref{eqn:lapbayes} to achieve his remarkable result. However, I just stumbled upon an English translation of Laplace's calculation and I am going to reproduce it here using R. I'll try to stick with Laplace's notation as much as possible and explain what it means in modern statistical terminology.

Laplace credits astronomer and former student Monsieur Alexis Bouvard with having already compiled measurements and calculations that facilitated his Bayesian approach. Out of some 129 equations for Saturn's motion—involving perturbations due to the tidal interaction between Jupiter and Saturn, as well as their moons, i.e., orbital wobbles—the following constants are distilled. \begin{equation} z^{\prime} = 0.00620 \label{eqn:eqnz} \end{equation} Laplace actually writes the decimal fractions as $0,00620$ using the European comma convention. I'll use the decimal point notation for clarity. \begin{equation} \log P = 4.8856829 \label{eqn:logP} \end{equation} Here, it turns out that log means the decimal or base-10 logarithm, not the base-e or natural logarithm that log denotes in R. This difference in naming screwed up my original calculations. In R notation, we can simply write the numerical value of P as \begin{equation} P = \exp(4.8856829 * \log(10)) \label{eqn:eqnP} \end{equation} where log is now the natural logarithm.
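Since eqn. \eqref{eqn:logP} gives $\log_{10} P$, the numerical value of $P$ can be recovered either with R's power operator or with the natural-log form of eqn. \eqref{eqn:eqnP}; a quick check that both agree:

```r
logP <- 4.8856829           # Laplace's base-10 logarithm of P
P1 <- 10^logP               # direct base-10 form
P2 <- exp(logP * log(10))   # equivalent natural-log form, as in eqn. (4)
c(P1, P2)                   # both are about 76857
```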

In Laplace's presentation, the mass of Saturn is then expressed as \begin{equation} \dfrac{1}{3534.08} (1 + z^{\prime}) \end{equation} Substituting the value in eqn. \eqref{eqn:eqnz} for $z^{\prime}$ produces: \begin{equation} \dfrac{1}{3534.08} (1 + 0.00620) = \dfrac{1}{3512.3} \end{equation} for the corrected mass. This looks a bit strange because the currently accepted value is $568.36 \times 10^{24}$ kilogram, but Laplace is not using MKS units. Instead, he expresses the mass as a fraction of the mass of the Sun. In other words, the estimated mass of Saturn is close to one 3500th of a solar mass. Then, he says:
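The arithmetic, together with a rough conversion to kilograms (assuming the standard solar mass of about $1.989 \times 10^{30}$ kg, which is my addition, not Laplace's), can be checked in R:

```r
z <- 0.00620
Msat <- (1 / 3534.08) * (1 + z)   # Saturn's mass as a solar-mass fraction
1 / Msat                          # about 3512.3, as Laplace states
Msun_kg <- 1.989e30               # modern solar mass in kilograms
Msat * Msun_kg                    # about 5.66e26 kg, near the accepted 5.6836e26 kg
```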

"The probability that the error of $z^{\prime}$ is comprehended within the limits $\pm~U$ is, by $\S~1$, \begin{equation*} \dfrac{\sqrt{P}}{\sqrt{\pi}} \int ~du ~c^{-P ~u^2} \, , \end{equation*} the integral being taken from $u = -U$ to $u = U$."

The use of the letter $c$ instead of $e$ in the integrand could be a typo in translation because Euler had already introduced the symbol $e$ for the base 2.718282 of the natural logarithms in the early part of the eighteenth century. On the other hand, it took time for his notation to be adopted and the use of $c$ remained a common practice. Laplace seems to have adopted $e$ in his later publications. With that assumption, I found that the constant $P$ in eqn. \eqref{eqn:eqnP} can be identified with the variance through \begin{equation*} P = \dfrac{1}{2 \sigma^2} \end{equation*} With that, the above integral can be rewritten as \begin{equation} \Pr(M_{Sat} \, | \, D,I) = \frac{1}{ \sigma \sqrt{2 \pi} } \int_{-U}^{U} \, e^{ -\frac{x^2}{2\sigma^2} } \, dx \label{eqn:satmass} \end{equation}

which is immediately recognizable as a normal distribution with zero mean and variance $\sigma^2$.
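With that identification, $\sigma$ follows directly from the value of $P$ computed above:

```r
P <- 10^4.8856829            # from Laplace's log10 P
sigma <- sqrt(1 / (2 * P))   # from P = 1/(2 sigma^2)
sigma                        # about 0.00255, the std. dev. of the error in z'
```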

Earlier in his presentation, Laplace makes the association with the posterior probability of eqn. \eqref{eqn:lapbayes}. He then goes on to claim

"the probability that this mass is comprehended within the limits \begin{equation*} \dfrac{1}{3512.3} \, \pm \, \dfrac{1}{100} \, \dfrac{1}{3534.08} \end{equation*} is equal to $\dfrac{11327}{11328}$."

About this ratio, Laplace announced elsewhere:

"Applying to them my formulae of probability I find that it is a bet of 11,000 against one that the error of this result is not 1/100 of its value or that which amounts to almost the same that after a century of new observations added to the preceding ones and examined in the same manner the new result will not differ by 1/100 from that of M. Bouvard."

Note the use of bookmaking parlance.

It becomes easier to understand what Laplace is saying if we plot the function represented by eqn. \eqref{eqn:satmass}.

The normal probability density function is centered at zero and the relevant probability is given by the blue area under the curve between the limits $\pm U$, indicated by the vertical dashed lines. It is, in effect, a plot of the error distribution in the mass calculation. The horizontal blue line is the full width at half maximum height (FWHM). The horizontal red line is twice the standard deviation ($\sigma$), which is narrower than the FWHM. (See Visualizing Variance.) Here is the R code that computes the above plot.

```r
# Created by NJG on Saturday, September 14, 2013
# SATURN constants
z  <- 0.00620
P  <- exp(4.885683 * log(10))   # log10(P) = 4.885683
sd <- sqrt(1 / (2 * P))         # P = 1/(2 sigma^2)
pSat <- 11327 / 11328
# Msat is expressed as a solar-mass fraction
Msat <- (1 / 3534.08) * (1 + z)
# Normal error density; the original listing breaks off at "Fnorm",
# so the plotting steps below are a reconstruction of the figure described above
Fnorm <- function(x) dnorm(x, mean = 0, sd = sd)
U <- 0.01
curve(Fnorm, from = -1.2 * U, to = 1.2 * U, xlab = "Error in z'", ylab = "Density")
abline(v = c(-U, U), lty = 2)   # the limits +/- U as vertical dashed lines
```

One of the problems I ran into was determining the value for $U$. Laplace never states explicitly what value he uses. It turns out to be $U = 0.01$, i.e., the $1/100$ or 1% that appears in the interval that contains the mass of Saturn. Calculating the definite integral with those limits produces \begin{equation*} \Pr(M_{Sat}~|~D,I) = 0.9999117 \end{equation*} which corresponds to the fraction $11327 ~/~ 11328$ claimed by Laplace in the above quote. Once again, using the language of gamblers, Laplace remarks:
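This check is a one-liner in R, since the integral is just the normal CDF evaluated between $\pm U$:

```r
P  <- 10^4.8856829
sd <- sqrt(1 / (2 * P))
U  <- 0.01
# Area under the zero-mean normal density between -U and +U
prob <- pnorm(U, mean = 0, sd = sd) - pnorm(-U, mean = 0, sd = sd)
prob            # about 0.99991
11327 / 11328   # Laplace's fraction, 0.9999117
```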

"Newton had found, by the observations of Pound on the greatest elongation of the fourth satellite of Saturn, the mass of this planet equal to $1/3012$, that which surpasses by a sixth the preceding result. There are odds of millions of billions against one that the one of Newton is in error."
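We can sanity-check the spirit of this claim. Converting Newton's $1/3012$ into an equivalent deviation in $z^{\prime}$ is my own back-of-the-envelope reading, not Laplace's calculation, but it puts Newton's value roughly $65\sigma$ out, so the tail probability underflows ordinary double precision and R's log.p option is needed to see how extreme it is:

```r
P  <- 10^4.8856829
sd <- sqrt(1 / (2 * P))
# Newton's 1/3012 expressed as an equivalent error in z' (my conversion)
z_newton <- 3534.08 / 3012 - 1      # about 0.173
dev <- z_newton - 0.00620           # about 0.167, i.e., roughly 65 sigma
dev / sd
# Natural log of the one-sided tail probability beyond Newton's value;
# about -2150, i.e., a tail of roughly 1e-934 -- "odds of millions of
# billions against one" is, if anything, an understatement
pnorm(-dev / sd, log.p = TRUE)
```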

$U = \pm~0.01$ corresponds to an interval slightly less than $\pm~4\sigma$. It looks like a confidence interval, but it isn't one in the modern sense because the number of measurements is essentially $n = 1$. Similarly, we can't speak of $M_{Sat}$ as a mean value. And the 99% CI corresponding to $1 - 0.01$ would be nearer $3\sigma$ from the origin.
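These $\sigma$ multiples are easy to verify in R:

```r
P  <- 10^4.8856829
sd <- sqrt(1 / (2 * P))
0.01 / sd      # about 3.92: U is slightly less than 4 sigma
qnorm(0.995)   # about 2.58: a conventional 99% interval is nearer 3 sigma
```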

What does all this mean? It seems Laplace assumed a normal density function as a good guess for what the distribution of errors would look like as more data accumulated in the future. Bayes would have assumed a uniform distribution. Presumably, this is what Laplace means when he remarks about another century of new observations. His guess turned out to be a good one because we now have more rigorous mathematical justification for assuming that the accumulation of errors does follow a Gaussian or normal distribution. By all accounts, Laplace drew a lot of flak for a lack of scientific (objective) rigor in developing this great intuitive (subjective) leap of probabilistic insight. Nonetheless, that's the point of the Bayesian approach: you can get away with a good guess for the prior.

Or was Laplace just lucky and beat the odds?