Way back in 2007, I wrote a blog post giving Einstein’s derivation of his famous equation $E = mc^2$ for the rest energy of a body with mass $m$. (Throughout this post, mass is used to refer to the invariant mass (also known as rest mass) of an object.) This derivation used a number of physical assumptions, including the following:

- The two postulates of special relativity: firstly, that the laws of physics are the same in every inertial reference frame, and secondly that the speed of light in vacuum is equal in every such inertial frame.
- Planck’s relation and de Broglie’s law for photons, relating the frequency, energy, and momentum of such photons together.
- The law of conservation of energy, and the law of conservation of momentum, as well as the additivity of these quantities (i.e. the energy of a system is the sum of the energy of its components, and similarly for momentum).
- The Newtonian approximations $E \approx E_0 + \frac{1}{2} m |v|^2$, $p \approx m v$ to energy and momentum at low velocities, where $E_0$ is the rest energy.

The argument was one-dimensional in nature, in the sense that only one of the three spatial dimensions was actually used in the proof.

As was pointed out in comments in the previous post by Laurens Gunnarsen, this derivation has the curious feature of needing some laws from quantum mechanics (specifically, the Planck and de Broglie laws) in order to derive an equation in special relativity (which does not ostensibly require quantum mechanics). One can then ask whether one can give a derivation that does not require such laws. As pointed out in previous comments, one can use the representation theory of the Lorentz group to give a nice derivation that avoids any quantum mechanics, but it now needs at least two spatial dimensions instead of just one. I decided to work out this derivation in a way that does not explicitly use representation theory (although it is certainly lurking beneath the surface). The concept of momentum is only barely used in this derivation, and the main ingredients are now reduced to the following:

- The two postulates of special relativity;
- The law of conservation of energy (and the additivity of energy);
- The Newtonian approximation $E \approx E_0 + \frac{1}{2} m |v|^2$ (rest energy plus kinetic energy) at low velocities.

The argument (which uses a little bit of calculus, but is otherwise elementary) is given below the fold. Whereas Einstein’s original argument considers a mass emitting two photons in several different reference frames, the argument here considers a large mass breaking up into two equal smaller masses. Viewing this situation in different reference frames gives a functional equation for the relationship between energy, mass, and velocity, which can then be solved using some calculus, using the Newtonian approximation as a boundary condition, to give the famous formula.

Disclaimer: As with the previous post, the arguments here are physical arguments rather than purely mathematical ones, and thus do not really qualify as a rigorous mathematical argument, due to the implicit use of a number of physical and metaphysical hypotheses beyond the ones explicitly listed above. (But it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on rigorous mathematical reasoning.)

— 1. The main argument —

We will assume that the total energy $E$ of a moving body depends only on the mass $m$ of that body, and the velocity $v \in \mathbb{R}^3$ of that body:

$$E = E(m, v).$$

(This is actually a non-trivial assumption; it excludes the possibility that the energy might also be dependent on other features of the body, such as spin or charge.) At present, this functional relationship is arbitrary. However, we can use some physical arguments to constrain this relationship. We first use the following argument of Galileo. Consider two bodies side by side, traveling at the same velocity $v$, with the first body of mass $m_1$ and the second of mass $m_2$. Then, the first body has energy $E(m_1, v)$ and the second has energy $E(m_2, v)$, so the combined system of two bodies has total energy $E(m_1, v) + E(m_2, v)$. On the other hand, if we imagine connecting the two bodies by an infinitesimally thin thread, we can view the system as a single body of mass $m_1 + m_2$ traveling at the same velocity $v$. This leads us to the relationship

$$E(m_1 + m_2, v) = E(m_1, v) + E(m_2, v)$$

for any $m_1, m_2 > 0$, which (under reasonable hypotheses of continuity) implies a linear relationship between energy and mass, thus

$$E(m, v) = m f(v)$$

for some function $f$ depending only on the velocity $v$.

We still have to determine this unknown functional relationship $f$. We assume rotational symmetry of the laws of physics (which one can view as a special case of the first postulate of special relativity): if two bodies of equal mass move at the same speed, but in different directions, the energies should be the same. In other words, $f$ should be spherically symmetric, so by abuse of notation we write

$$f(v) = f(|v|). \qquad (1)$$

Now consider a body of mass $M$ at rest at the origin (in some reference frame $F$), which somehow disintegrates (at time $t = 0$, for simplicity) into two smaller bodies $A$, $B$ of equal mass $m$, one moving in the positive $x$ direction at some velocity $(v, 0, 0)$, and the other moving in the negative $x$ direction at the opposite velocity $(-v, 0, 0)$ (note that this situation is consistent with the law of conservation of momentum). (If one prefers, one could also view the time-reversed situation, in which two masses of equal and opposite velocity collide to form a large stationary mass; the analysis of this situation is basically identical to the one given here.) In Newtonian mechanics, we have conservation and additivity of mass, so that $M$ must equal $2m$; but we will not assume conservation and additivity of mass here (and in fact at least one of these laws must break down in special relativity, at least if one insists on using an invariant notion of mass). Instead, we can link $M$, $m$, and $v$ to each other by the law of conservation of energy. Before the disintegration, the body has total energy $M f(0)$, while after the disintegration the system has total energy (using the spherically symmetric nature (1)) of $2 m f(v)$, and so

$$M f(0) = 2 m f(v). \qquad (2)$$

Now we view the same system relative to another reference frame $F'$, which relative to $F$ is moving at a velocity $(w, 0, 0)$ in the $x$ direction for some $-c < w < c$, while keeping the $y$ and $z$ coordinates unchanged. The spacetime coordinates $(t', x', y', z')$ of $F'$ are then related to those $(t, x, y, z)$ of $F$ by the usual Lorentz transformations

$$t' = \frac{t - wx/c^2}{\sqrt{1 - w^2/c^2}}; \quad x' = \frac{x - wt}{\sqrt{1 - w^2/c^2}}; \quad y' = y; \quad z' = z$$

which can be deduced from the postulates of special relativity by a standard derivation that we will not give here (it is sketched in the previous blog post). The pre-disintegration body is moving along the worldline $\{(t, 0, 0, 0): t < 0\}$ in the $F$ reference frame, and is thus moving along the line $\{(t', -wt', 0, 0): t' < 0\}$ in the $F'$ reference frame; in particular, it has velocity $(-w, 0, 0)$ in this frame and thus has energy $M f(w)$ in this frame.
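
As a numerical sanity check on this worldline computation, here is a minimal Python sketch (the names `boost_x` and `velocity_in_boosted_frame` are illustrative and not part of the original argument). It applies the boost $t' = \gamma(t - wx/c^2)$, $x' = \gamma(x - wt)$ with $\gamma = 1/\sqrt{1 - w^2/c^2}$ to two events on a worldline $x = ut$ and reads off the velocity seen in the moving frame. Taking $u = 0$ should recover the velocity $-w$ of the body at rest, and other values of $u$ should reproduce the velocity addition formula $(u - w)/(1 - uw/c^2)$.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def boost_x(t, x, w, c=C):
    """Coordinates (t', x') of the event (t, x) in a frame moving at velocity w along x."""
    gamma = 1.0 / math.sqrt(1.0 - (w / c) ** 2)
    return gamma * (t - w * x / c**2), gamma * (x - w * t)


def velocity_in_boosted_frame(u, w, c=C):
    """Velocity in the moving frame of a body on the worldline x = u*t.

    Computed as (change in x') / (change in t') between two events on the worldline.
    """
    t0, x0 = boost_x(0.0, 0.0, w, c)
    t1, x1 = boost_x(1.0, u, w, c)
    return (x1 - x0) / (t1 - t0)


if __name__ == "__main__":
    w = 0.6 * C
    print(velocity_in_boosted_frame(0.0, w) / C)      # body at rest in the original frame
    print(velocity_in_boosted_frame(0.8 * C, w) / C)  # body moving at 0.8c in the original frame
```

Computing the velocity from two transformed events, rather than from the closed-form addition law, keeps the check independent of the formula it is meant to confirm.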

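Now consider the first post-disintegration body $A$. It is moving along the worldline $\{(t, vt, 0, 0): t > 0\}$ in the $F$ reference frame, and thus along the line $\{(t', \frac{v-w}{1 - vw/c^2} t', 0, 0): t' > 0\}$ in the $F'$ reference frame; in particular, the speed of $A$ in this frame is $\frac{|v-w|}{1 - vw/c^2}$ (the well known velocity addition (or subtraction) formula), and so the energy of this body is $m f\left(\frac{v-w}{1 - vw/c^2}\right)$. Similarly, $B$ has energy $m f\left(\frac{v+w}{1 + vw/c^2}\right)$. Equating energies, we are thus led to

$$M f(w) = m f\left(\frac{v-w}{1 - vw/c^2}\right) + m f\left(\frac{v+w}{1 + vw/c^2}\right).$$

We can eliminate $M, m$ using (2), to obtain a functional equation for $f$:

$$2 f(v) f(w) = f(0) \left[ f\left(\frac{v-w}{1 - vw/c^2}\right) + f\left(\frac{v+w}{1 + vw/c^2}\right) \right]. \qquad (3)$$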

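This equation should hold for all (physically attainable) velocities $-c < v, w < c$. To solve this equation, it is convenient to work with the change of variables

$$v = c \tanh \alpha; \quad w = c \tanh \beta;$$

the hyperbolic angles $\alpha, \beta$ are known as the rapidities associated to $v$ and $w$ respectively. The point of using this change of variables is that the hyperbolic tangent addition formula yields

$$\frac{v \pm w}{1 \pm vw/c^2} = c \tanh(\alpha \pm \beta).$$

Thus if we make the change of variables

$$g(\alpha) := f(c \tanh \alpha)$$

then (3) simplifies to

$$2 g(\alpha) g(\beta) = g(0) \left[ g(\alpha - \beta) + g(\alpha + \beta) \right].$$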

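It is tempting to plug in some special values into this equation, such as $\beta = 0$, but this only gives a trivial equation. However, if we first differentiate twice in $\beta$ to obtain

$$2 g(\alpha) g''(\beta) = g(0) \left[ g''(\alpha - \beta) + g''(\alpha + \beta) \right]$$

and then set $\beta = 0$, we obtain the non-trivial equation

$$g(\alpha) g''(0) = g(0) g''(\alpha).$$

This is a differential equation in $g$, and can be solved as

$$g(\alpha) = A \cosh(k \alpha) + B \sinh(k \alpha)$$

for some unknowns $A, B$, where $k$ is the square root of $g''(0)/g(0)$. From (1), $g$ should have vanishing derivative at the origin, and so $B = 0$, and so we have

$$g(\alpha) = A \cosh(k \alpha). \qquad (4)$$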
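
The following Python sketch (with illustrative names `g` and `residual`, and arbitrarily chosen test values) checks numerically that $g(\alpha) = A \cosh(k\alpha)$ satisfies the rapidity form of the functional equation, $2 g(\alpha) g(\beta) = g(0) [g(\alpha - \beta) + g(\alpha + \beta)]$, for several values of $k$. This is consistent with the one-dimensional argument leaving both $A$ and $k$ undetermined.

```python
import math


def g(alpha, A=1.0, k=1.0):
    """Candidate solution g(alpha) = A * cosh(k * alpha)."""
    return A * math.cosh(k * alpha)


def residual(alpha, beta, A=1.0, k=1.0):
    """Left side minus right side of 2 g(a) g(b) = g(0) [g(a - b) + g(a + b)]."""
    lhs = 2.0 * g(alpha, A, k) * g(beta, A, k)
    rhs = g(0.0, A, k) * (g(alpha - beta, A, k) + g(alpha + beta, A, k))
    return lhs - rhs


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        # The residual should vanish up to floating-point rounding for every k.
        print(k, residual(0.7, 0.3, A=3.0, k=k))
```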

This is significant progress in constraining the behaviour of $g$, but there are still two unknown parameters $A, k$. To proceed further, it becomes necessary to utilise a second dimension. Namely, we repeat the previous arguments, but with $F'$ now moving at velocity $(0, w, 0)$ instead of $(w, 0, 0)$. The Lorentz transformations are now

$$t' = \frac{t - wy/c^2}{\sqrt{1 - w^2/c^2}}; \quad x' = x; \quad y' = \frac{y - wt}{\sqrt{1 - w^2/c^2}}; \quad z' = z.$$

The pre-disintegration body is moving along the worldline $\{(t, 0, 0, 0): t < 0\}$ in the $F$ reference frame, and is thus moving along the line $\{(t', 0, -wt', 0): t' < 0\}$ in the $F'$ reference frame; in particular, it has velocity $(0, -w, 0)$ in this frame and thus has energy $M f(w)$ in this frame.

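Now consider the first post-disintegration body $A$. It is moving along the worldline $\{(t, vt, 0, 0): t > 0\}$ in the $F$ reference frame, and thus along the line $\{(t', \sqrt{1 - w^2/c^2}\, v t', -wt', 0): t' > 0\}$ in the $F'$ reference frame; in particular, the speed of $A$ in this frame is $\sqrt{v^2 + w^2 - v^2 w^2/c^2}$, and so the energy of this body is $m f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right)$. Similarly for $B$. Equating energies, we are thus led to

$$M f(w) = 2 m f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right).$$

We can eliminate $M, m$ using (2), to obtain a functional equation for $f$:

$$f(v) f(w) = f(0) f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right).$$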
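
To see numerically how the two-dimensional functional equation $f(v) f(w) = f(0) f(\sqrt{v^2 + w^2 - v^2 w^2/c^2})$ is more restrictive than the one-dimensional one, one can test the one-dimensional candidates $f(v) = \cosh(k \, \mathrm{artanh}(v/c))$ against it. The Python sketch below is only an illustration (the function names are made up for this example, and it works in units with $c = 1$ and normalisation $A = 1$); the residual should vanish for $k = 1$, where $f(v) = 1/\sqrt{1 - v^2}$, but not for $k = 2$.

```python
import math


def f(v, k=1.0):
    """One-dimensional candidate f(v) = cosh(k * artanh(v)), in units with c = 1."""
    return math.cosh(k * math.atanh(v))


def transverse_speed(v, w):
    """Speed of a body moving at v along x, seen from a frame moving at w along y (c = 1)."""
    return math.sqrt(v * v + w * w - v * v * w * w)


def transverse_residual(v, w, k=1.0):
    """Left side minus right side of f(v) f(w) = f(0) f(transverse_speed(v, w))."""
    return f(v, k) * f(w, k) - f(0.0, k) * f(transverse_speed(v, w), k)


if __name__ == "__main__":
    for k in (1.0, 2.0):
        print(k, transverse_residual(0.6, 0.3, k))
```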

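This equation should hold for all (physically attainable) velocities $v, w$. To solve this equation, we work with infinitesimal $w$ and perform a Taylor expansion. From the symmetry (1), $f$ should be flat at the origin, and so (assuming sufficient smoothness for $f$) we have

$$f(w) = f(0) + \frac{1}{2} f''(0) w^2 + O(w^3)$$

while from the Taylor approximation (for fixed $v > 0$)

$$\sqrt{v^2 + w^2 - v^2 w^2/c^2} = v + \frac{1 - v^2/c^2}{2v} w^2 + O(w^3)$$

we have

$$f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right) = f(v) + \frac{1 - v^2/c^2}{2v} f'(v) w^2 + O(w^3).$$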
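
As a quick check of the square-root expansion $\sqrt{v^2 + w^2 - v^2 w^2/c^2} \approx v + \frac{1 - v^2/c^2}{2v} w^2$, the following Python sketch (illustrative helper names, default units with $c = 1$) compares the exact speed with its second-order approximation. Under these assumptions the error should shrink roughly like $w^4$ as $w$ decreases, for fixed $v > 0$.

```python
import math


def exact_speed(v, w, c=1.0):
    """Exact speed sqrt(v^2 + w^2 - v^2 w^2 / c^2)."""
    return math.sqrt(v**2 + w**2 - (v * w / c) ** 2)


def second_order_speed(v, w, c=1.0):
    """Second-order Taylor approximation in w, valid for fixed v > 0."""
    return v + (1.0 - (v / c) ** 2) * w**2 / (2.0 * v)


def expansion_error(v, w, c=1.0):
    """Absolute difference between the exact speed and its approximation."""
    return abs(exact_speed(v, w, c) - second_order_speed(v, w, c))


if __name__ == "__main__":
    for w in (1e-1, 1e-2, 1e-3):
        print(w, expansion_error(0.5, w))
```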

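Inserting these expansions and extracting the $w^2$ coefficient, we obtain the differential equation

$$\frac{1}{2} f''(0) f(v) = f(0) \frac{1 - v^2/c^2}{2v} f'(v)$$

which we can rewrite as

$$\frac{f'(v)}{f(v)} = \frac{2\kappa}{c^2} \frac{v}{1 - v^2/c^2}$$

for some constant $\kappa$ (namely $\kappa = c^2 f''(0) / (2 f(0))$). We can integrate this as

$$\log f(v) = -\kappa \log(1 - v^2/c^2) + \log C$$

and thus

$$f(v) = \frac{C}{(1 - v^2/c^2)^{\kappa}}$$

for some parameters $C, \kappa$. In rapidity coordinates $\alpha$, this becomes

$$g(\alpha) = C \cosh^{2\kappa}(\alpha).$$

Comparing this with (4) (e.g. by performing a Taylor expansion to fourth order around $\alpha = 0$) we see that $\kappa = 1/2$, $k = 1$, and $C = A$, thus

$$g(\alpha) = A \cosh \alpha$$

or equivalently

$$f(v) = \frac{A}{\sqrt{1 - v^2/c^2}}.$$

Thus we have

$$E(m, v) = \frac{m A}{\sqrt{1 - |v|^2/c^2}}.$$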
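
The fourth-order comparison of the two forms $C \cosh^{2\kappa}(\alpha)$ and $A \cosh(k\alpha)$ can be made explicit. Using $\log \cosh \alpha = \alpha^2/2 - \alpha^4/12 + O(\alpha^6)$, one finds $\cosh^{2\kappa}(\alpha) = 1 + \kappa \alpha^2 + (\kappa^2/2 - \kappa/6) \alpha^4 + \dots$, while $\cosh(k\alpha) = 1 + k^2 \alpha^2/2 + k^4 \alpha^4/24 + \dots$. The Python sketch below (exact rational arithmetic; the helper names are illustrative) checks that $\kappa = 1/2$, $k = 1$ matches both coefficients. The only other match, $\kappa = k = 0$, is the constant solution, which is incompatible with the Newtonian approximation.

```python
from fractions import Fraction


def cosh_power_coeffs(kappa):
    """Coefficients of alpha^2 and alpha^4 in the expansion of cosh(alpha)**(2*kappa)."""
    kappa = Fraction(kappa)
    return kappa, kappa**2 / 2 - kappa / 6


def cosh_scaled_coeffs(k_squared):
    """Coefficients of alpha^2 and alpha^4 in the expansion of cosh(k*alpha), given k**2."""
    k2 = Fraction(k_squared)
    return k2 / 2, k2**2 / 24


def matches(kappa, k_squared):
    """True when both expansions agree up to fourth order."""
    return cosh_power_coeffs(kappa) == cosh_scaled_coeffs(k_squared)


if __name__ == "__main__":
    print(matches(Fraction(1, 2), 1))  # kappa = 1/2, k = 1
    print(matches(1, 2))               # kappa = 1, k^2 = 2: second-order match only
```

Passing $k^2$ rather than $k$ keeps the arithmetic rational, since matching the $\alpha^2$ coefficients forces $k^2 = 2\kappa$.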

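For infinitesimal velocities $v$, we may Taylor expand

$$\frac{m A}{\sqrt{1 - |v|^2/c^2}} = m A + \frac{A}{2c^2} m |v|^2 + O(|v|^4)$$

and so the kinetic energy of a slowly moving mass is approximately $\frac{A}{2c^2} m |v|^2$. Comparing this with the Newtonian approximation of $\frac{1}{2} m |v|^2$ we conclude that $A = c^2$, and thus

$$E(m, v) = \frac{m c^2}{\sqrt{1 - |v|^2/c^2}}. \qquad (5)$$

In particular, setting $v = 0$ we see that the rest energy of a body of mass $m$ is $E = mc^2$, as required.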
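
For a concrete feel for the final formula $E = mc^2/\sqrt{1 - |v|^2/c^2}$, here is a small Python sketch in SI units (the function names are illustrative). It evaluates the rest energy of a given mass and compares the relativistic kinetic energy $E - mc^2$ with the Newtonian value $\frac{1}{2} m |v|^2$ at a speed small compared to $c$.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)


def total_energy(m, v, c=C):
    """Total energy m c^2 / sqrt(1 - v^2/c^2) in joules, for m in kg and speed v in m/s."""
    return m * c**2 / math.sqrt(1.0 - (v / c) ** 2)


def kinetic_energy(m, v, c=C):
    """Total energy minus rest energy."""
    return total_energy(m, v, c) - total_energy(m, 0.0, c)


if __name__ == "__main__":
    m = 1.0          # one kilogram
    v = 1e-3 * C     # a thousandth of the speed of light
    print("rest energy (J):", total_energy(m, 0.0))
    print("relativistic / Newtonian kinetic energy:", kinetic_energy(m, v) / (0.5 * m * v**2))
```

Note that subtracting the rest energy in floating point loses precision at everyday speeds, because the kinetic energy is then many orders of magnitude smaller than $mc^2$; the example speed is chosen large enough for the comparison to remain meaningful.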

Remark 1 The above derivation did not explicitly use the law of conservation of momentum (other than to observe that the scenario of one mass at rest splitting into two smaller masses moving in equal and opposite directions was compatible with this law). Actually, if one defines the momentum $p$ of a body of mass $m$ and velocity $v$ by the formula $p = \frac{mv}{\sqrt{1 - |v|^2/c^2}}$ and the momentum of a system as the sum of the momenta of its components, one can use (5) and the Lorentz transformations to (after some algebra) express the total momentum of a system as a linear combination of the total energy of that system viewed in a couple of reference frames (or, if one prefers, as the derivatives of the total energy with respect to infinitesimal reference frame changes), and as a consequence one can actually derive the law of conservation of momentum from the law of conservation of energy, together with special relativity. (This can also be done in Galilean relativity, using the classical formulae $E = \frac{1}{2} m |v|^2$ and $p = mv$; we leave this as an exercise to the reader.) Indeed, in special relativity it is natural to unify energy and momentum together as a single quantity known as the four-momentum.
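
The claim that momentum can be recovered from the total energy seen in nearby reference frames can be illustrated numerically. In one spatial dimension, with $c = 1$, the energy of a system seen from a frame moving at velocity $w$ is $E'(w) = \gamma_w (E - w p)$, so that $p = -\frac{dE'}{dw}$ at $w = 0$. The Python sketch below (function names and the sample system are made up for illustration) approximates this derivative by a central difference and compares it with the sum of $mu/\sqrt{1 - u^2}$ over the bodies.

```python
import math


def boosted_velocity(u, w):
    """Velocity addition in one dimension with c = 1."""
    return (u - w) / (1.0 - u * w)


def total_energy(bodies, w=0.0):
    """Total energy of bodies [(mass, velocity), ...] seen from a frame moving at velocity w."""
    return sum(m / math.sqrt(1.0 - boosted_velocity(u, w) ** 2) for m, u in bodies)


def total_momentum(bodies):
    """Sum of m u / sqrt(1 - u^2) over the bodies."""
    return sum(m * u / math.sqrt(1.0 - u * u) for m, u in bodies)


def momentum_from_energy(bodies, h=1e-5):
    """Minus the central-difference derivative of the total energy in w at w = 0."""
    return -(total_energy(bodies, h) - total_energy(bodies, -h)) / (2.0 * h)


if __name__ == "__main__":
    system = [(1.0, 0.6), (2.0, -0.3)]
    print(total_momentum(system), momentum_from_energy(system))
```

Since the total energy in every frame is conserved, so is its derivative with respect to the frame velocity, which is one way to read the derivation of momentum conservation sketched in the remark.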

Remark 2 The above arguments ultimately rely on the fact that the Lorentz group $SO(d,1)$ has an essentially unique linear action on $\mathbb{R}^{d+1}$ when the spatial dimension $d$ is at least two. For $d = 1$, the group becomes abelian, and there is a multiplicity of such actions (parameterised by the different possibilities for the quantity $k$ appearing in (4)), and one could a priori have a number of different laws relating energy and momentum with mass and velocity that are consistent with special relativity and the conservation laws. Indeed, for any choice of $k > 0$, one could postulate the laws $E = \frac{mc^2}{k^2} \cosh(k\alpha)$ and $p = \frac{mc}{k} \sinh(k\alpha)$ for the energy and momentum of a body of mass $m$ moving at rapidity $\alpha$ (i.e. at velocity $v = c \tanh \alpha$). One can verify that such laws are consistent with the laws of conservation of mass and energy, with the postulates of special relativity, and with the Newtonian approximation, as long as one is only in one spatial dimension; one needs to use at least one other dimension to be able to reduce to the $k = 1$ case. Thus we see that higher-dimensional relativity is more rigid than one-dimensional relativity. In the case of Einstein’s original argument, the quantum mechanical properties of photons are used instead to show that $E/p \to c$ in the lightspeed limit $\alpha \to \infty$, which gives the reduction to $k = 1$.
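
The one-dimensional family of laws $E = \frac{mc^2}{k^2} \cosh(k\alpha)$, $p = \frac{mc}{k} \sinh(k\alpha)$ can be explored numerically. The Python sketch below (illustrative function names, default units with $c = 1$) checks two things under these assumed laws: that the kinetic energy $E - mc^2/k^2$ is close to $\frac{1}{2} m v^2$ at small rapidity for any $k$, and that the ratio $E/(pc)$ approaches $1/k$ at large rapidity, so that the photon relation $E = pc$ singles out $k = 1$.

```python
import math


def energy_1d(m, alpha, k, c=1.0):
    """Hypothetical one-dimensional energy law (m c^2 / k^2) cosh(k alpha)."""
    return m * c**2 / k**2 * math.cosh(k * alpha)


def momentum_1d(m, alpha, k, c=1.0):
    """Hypothetical one-dimensional momentum law (m c / k) sinh(k alpha)."""
    return m * c / k * math.sinh(k * alpha)


def newtonian_ratio(k, alpha=1e-3, c=1.0):
    """Kinetic energy divided by (1/2) m v^2 at small rapidity, with v = c tanh(alpha)."""
    v = c * math.tanh(alpha)
    kinetic = energy_1d(1.0, alpha, k, c) - energy_1d(1.0, 0.0, k, c)
    return kinetic / (0.5 * v**2)


def lightspeed_ratio(k, alpha=20.0, c=1.0):
    """E / (p c) at large rapidity; this tends to 1/k as alpha grows."""
    return energy_1d(1.0, alpha, k, c) / (momentum_1d(1.0, alpha, k, c) * c)


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        print(k, newtonian_ratio(k), lightspeed_ratio(k))
```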