First published Sat Jun 1, 2019

How should we conceptualise human well-being over time and across generations? How ought the interests of people in the distant future to be taken into account when we make our own decisions? How much of its output should a nation invest for the future? In which assets should that investment be made? What should be the balance among private, public, and communitarian investments in the overall investment that a generation makes for the future? How much should the world spend in countering global climate change?

In a remarkable paper Frank Ramsey developed a framework in which each of these questions can be studied in a form that is precise and tractable enough to elicit answers (Ramsey, 1928). His approach was to apply the Classical-Utilitarian calculus to identify the best match from among attainable and desirable utility streams over time and across generations. Although very famous today, the paper had no initial impact. Some economists have attributed the lack of interest to the paper’s technical character. In answering the question he posed (“How much of a nation’s output should it save?”), Ramsey had to use the calculus of variations. There is no question but that few economists then knew the required technicalities. But it is difficult to imagine that there were no economists capable of learning the necessary mathematics had they wished to do so. The reason there was little interest in Ramsey’s paper lay elsewhere. In the years following the publication, a period now known as The Great Depression, the central economic problem in industrialised countries was to find ways to increase employment. Factories lay idle, as did people. The unemployment rate in Europe and the USA was in the region of 25%. Policies that were needed then had to do with creating incentives for employers to hire workers. Even though there were controversies among economists on what those policies should be, no one doubted that industrialised societies were facing a problem of the short run. In contrast, Ramsey addressed a question involving the long run; and, so as to have an uncluttered problem to analyse, he took it as given that there is full employment at every date of both capital and labour.

With the emergence of post-colonial nations following the Second World War, long run economic development became prominent in economic studies. By the early 1960s it had become clear that Ramsey’s paper is the natural starting point for studying the welfare economics of the long run, not only for pursuing optimum development in centrally planned economies (Chakravarty, 1969), but also for use in social cost-benefit analysis of public investment in mixed economies (Arrow and Kurz, 1970), the choice of technology in labour-surplus economies (Little and Mirrlees, 1968, 1974), and more recently, the welfare economics of climate change (Cline, 1992; Nordhaus, 1994; Stern, 2007). The number of trails Ramsey laid was remarkable. In academic economics his paper is one of the dozen or so most influential papers of the 20th century.

Classical Utilitarianism takes the good to be the expected value of the sum of utilities over time and across the generations (Sidgwick, 1907). Ramsey’s formulation was built on that moral reasoning. He even used the term “enjoyment” to interpret utility. The article embodies the sort of ethical deliberation Sen and Williams (1982) called “Government House Utilitarianism.” But Ramsey’s article thrives today because Government House needs ethical guidance that isn’t a prop for paid Officials to act in nepotistic, never mind predatory, ways, but is instead impartial over people’s needs and sensitivities. Although Ramsey used the Utilitarian language, a generous reading of his paper says that much would be gained if, instead of “enjoyment,” we were to work with the broader notion of “well-being.” Such a move allows one to pay greater attention to the factors, be they material or otherwise, that make for flourishing lives.

1. Production Possibilities in Ramsey’s Formulation

Ramsey’s goal was practical: “How much of a nation’s output should it save for the future?” The demographic profile over time was taken by him to be given, meaning that future numbers of people were seen as exogenous and predictable. We were therefore to imagine that economic policies have a negligible effect on reproductive behaviour (but see Dasgupta, 1969, for a study of the joint population/saving problem, using Classical Utilitarianism as the guiding principle). Parfit (1984) christened choices involving the same demographic profile, “Same Numbers Choices.”

The ingredients of Ramsey’s theory are individuals’ lifetime well-beings. Government House in his world maximizes the expected sum of the lifetime well-beings of all who are here today and all who will ever be born, subject to resource constraints. The optimum distribution of lifetime well-beings across generations is derived from that maximization exercise. Of course, the passage of time is not the same as the advance of generations. An individual’s lifetime well-being is an aggregate of the flow of well-being she experiences, while intergenerational well-being is an aggregate of the lifetime well-beings of all who appear on the scene. It is doubtful that the two aggregates should have the same functional form. On the other hand, there is little evidence to suggest that we would be way off the mark in assuming they do have the same form. As a matter of practical ethics, it helps enormously to approximate by not distinguishing the functional form of someone’s well-being over time from that of well-being across the generations. Ramsey adopted this short-cut. People were also taken to be identical, so we may as well assume that there is a single individual at each date. The move removes any distinction between time and generations. An alternative interpretation would have us imagine that the economy consists of a single dynasty, where parents in each generation leave bequests for their children (Meade, 1966, adopted this interpretation). Ramsey also assumed, probably because the mathematics is simpler, that time is a continuous variable, not discrete.

Let \(t \ge 0\) denote time. In Ramsey’s model there is no uncertainty (but see Levhari and Srinivasan, 1969, for one of the first of many extensions of the Ramsey model that incorporate uncertainty about future possibilities). The economy is endowed with a single, non-depreciating commodity that can be worked by labour to produce output at each date (Gale, 1967, and Brock, 1973, were among the first of many extensions of the Ramsey model that contain a heterogeneous collection of capital goods). The economy is assumed to be closed to international trade (opening the economy to trade involves only a minor extension to Ramsey’s model). That means some of the output can be invested so as to add to the commodity’s stock while the remainder can be consumed immediately. We call the stock of the commodity that serves to produce output, “capital.” The problem is then to find the optimum allocation of output at each date between consumption and investment.

Ramsey assumed that work is unpleasant. But because including the disutility of labour in our account of his paper would add nothing of substance, we suppose that labour supply is an exogenously given constant (i.e., it is independent of the wages labour can demand). That enables us to suppress the supply of labour in both production and the factors affecting well-being.

If \(K\) is the stock of capital of the economy’s one and only commodity, output is taken to be \(F(K)\), where \(F(0) = 0\) (i.e., output is zero if there is no capital), \(dF(K)/dK \gt 0\) (i.e., the marginal product of capital is positive), and \(d^2 F(K)/dK^2 \le 0\) (i.e., the marginal product of \(K\) does not increase with \(K\)). \(F(K)\) is a flow (production at a moment in time), in contrast to \(K\), which is a stock (quantity of capital, period). Notice also that output depends solely on the stock of capital. No mention is made of possible improvements in the quality of capital or labour. Thus, there is no prospect of technological progress or accumulation of human capital in Ramsey’s model (but see Mirrlees, 1967, for one of the first of many extensions of the Ramsey model that include technological advances in production and human capital formation); nor are there any natural resources in the model (but see Dasgupta and Heal, 1974, for one of the first of many extensions of the Ramsey model that include natural capital in production).

Let \(C(t)\) be consumption at \(t\). It is a flow (units of consumption per moment). Similarly, we write \(K(t)\) for the stock of capital at \(t\). As \(dK(t)/dt\) is the rate of change in the capital stock at \(t\), it is “net investment at \(t\),” which too is a flow. And because the capital stock is assumed not to depreciate, gross investment equals net investment.

In Ramsey’s model anticipated output at each moment equals the sum of intended investment and intended consumption. Intentions are always realized. To put it in technical language, the economy is in equilibrium at each moment, which is another way of saying that at each moment intended saving equals intended investment. (The assumption needs no explanation in a model with a single agent, but has real bite in a world where savers are not the same agents as investors.) Capital is assumed to be always fully deployed, and labour (which is hidden in the production function \(F(K)\)) is taken to be fully employed. Output at \(t\) is \(F(K(t))\). It follows that the economy is driven by the dynamical equation

\[\tag{1} \frac{dK(t)}{dt} = F(K(t)) - C(t) \]

Equation (1) says that if consumption is \(C(t)\), investment is what remains of output. So, Ramsey’s problem can be cast equally as, “How much of a nation’s output should it consume?” If consumption is less than output at \(t\) (i.e., \(C(t) \lt F(K(t))\)), investment is positive (i.e., \(dK(t)/dt \gt 0\)) and the stock of capital increases; but if consumption exceeds output at \(t\), investment is negative, which means capital is eaten into and the stock declines (i.e., \(dK(t)/dt \lt 0\)). We now imagine that Government House is advised by a “socially-concerned citizen,” the person being someone who is trying to determine the right balance between the economy’s consumption and investment at each date. We shall call that person the decision maker, or DM. Ramsey imagined that DM is a Classical-Utilitarian.
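To see equation (1) at work, the following is a minimal numerical sketch, not part of Ramsey’s paper. It assumes a hypothetical production function \(F(K) = K^{\alpha}\) with \(0 \lt \alpha \lt 1\) (which satisfies \(F(0) = 0\), \(F_K \gt 0\) and diminishing marginal product) and a hypothetical rule in which a constant fraction of output is saved. The function name, the parameter values and the simple Euler time-stepping are all illustrative assumptions.

```python
def simulate_capital(k0, saving_rate, horizon, dt=0.01, alpha=0.5):
    """Approximate the path of K(t) under dK/dt = F(K) - C (equation (1)).

    Illustrative assumptions (not from the source):
      * F(K) = K ** alpha
      * C(t) = (1 - saving_rate) * F(K(t)); a negative saving_rate
        means consumption exceeds output, so capital is eaten into.
    Returns the list of capital stocks at t = 0, dt, 2*dt, ..., horizon.
    """
    k = k0
    path = [k]
    steps = int(round(horizon / dt))
    for _ in range(steps):
        output = k ** alpha                       # F(K(t))
        consumption = (1.0 - saving_rate) * output  # C(t)
        # Euler step of equation (1); capital cannot fall below zero.
        k = max(k + (output - consumption) * dt, 0.0)
        path.append(k)
    return path
```

With `saving_rate=0` consumption equals output and the capital stock stays put; with a positive saving rate the stock rises, and with a negative one it declines, which are the three cases distinguished in the text.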

2. The Classical-Utilitarian Calculus

Classical Utilitarianism identifies the good as the expected sum of well-being over time and across generations. Here is Sidgwick (1907: 414) on the matter:

It seems … clear that the time at which a man exists cannot affect the value of his happiness from the universal point of view; and that the interests of posterity must concern a Utilitarian as much as those of his contemporaries, except in so far as the effect of his actions on posterity – and even the existence of human beings to be affected – must necessarily be more uncertain. (Italics added)

To formalize this, we consider an arbitrary date \(t\) at which DM is deliberating. Let \(\tau\) denote dates not earlier than \(t\) (i.e., \(\tau \ge t)\). Ramsey considered a deterministic, infinitely lived world (but see Yaari, 1965, for the first of many extensions of the Ramsey model that incorporate the risk of individual or societal extinction). Well-being is assumed to be a numerical quantity. Let \(U(t)\) be well-being at \(t\), and let \(V(t)\) be an aggregate measure of the flow of well-being across time and generations, as evaluated at time \(t\). Ramsey followed Sidgwick in assuming that

\[\tag{2} V(t) = \int^{\infty}_t[U(\tau)]d\tau \]

\(V(t)\) is intergenerational well-being at \(t\). Because Ramsey’s world is deterministic, \(V(t)\) is also the expected value of \(V(t)\). So Sidgwick’s criterion is the \(V(t)\) in equation (2).

Well-being at any given date is assumed to be a function solely of consumption at that date. We therefore write \(U(t) = U(C(t))\). Ramsey assumed that marginal well-being is positive (i.e., \(dU(C)/dC \gt 0)\) but diminishes with increasing consumption levels (i.e., \(d^2 U(C)/dC^2 \lt 0)\). The latter property implies that \(U(C)\) is a strictly concave function. (Edgeworth, 1885, had routinized the idea that marginal well-being declines with increasing consumption.) Thus equation (2) can be written as

\[\tag{3} V(t) = \int^{\infty}_t [U(C(\tau))]d\tau \]

Classical Utilitarianism, as reflected in equation (3), requires that if \(U\) is a numerical measure of well-being, then so is \(\alpha U+\beta\), where \(\alpha\) is a positive number and \(\beta\) is a number of either sign. Formally, we say that \(U\) is unique up to “positive affine transformations.” We confirm presently that the theory’s recommendations are invariant under such transformations.

2.1 Zero Discounting of Future Well-Beings

In equation (3), future values of \(U\) are not discounted when viewed from the present moment, \(t\). This particular move has provoked more debate among economists and philosophers than any other feature of Ramsey’s theory of optimum saving. The debate has on occasion been shriller than even we economists are used to (see in particular Nordhaus, 2007). At the risk of generalizing wildly, economists have favoured the use of positive rates to discount future well-beings (e.g., Arrow and Kurz, 1970), whereas philosophers have insisted that the well-being of future people should be given the same weight as that of present people (e.g., Parfit, 1984).

What would Classical Utilitarianism with positive discounting of future well-beings look like? Let \(\delta \gt 0\) be the rate at which it is deemed desirable to discount future well-beings (for simplicity we take the discount rate to be constant). Then, in place of equations (2)–(3), intergenerational well-being at \(t\), would read as

\[\tag{4} \begin{align} V(t) &= \int^{\infty}_t [U(\tau)e^{-\delta(\tau -t)}]d\tau \\ &= \int^{\infty}_t [U(C(\tau))e^{-\delta(\tau -t)}]d\tau, t \ge 0 \\ \end{align}\]

In equation (4), \(\delta\) is the “time discount rate” and \(e^{-\delta}\) is the resulting “time discount factor.”
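To get a feel for the weights that equation (4) attaches to future well-being, here is a small sketch. The function name and the figure of 2% a year are illustrative assumptions, not values proposed in the literature discussed here.

```python
import math


def discount_weight(delta, years_ahead):
    """Weight exp(-delta * (tau - t)) placed on well-being `years_ahead`
    years after the date of evaluation, as in equation (4)."""
    return math.exp(-delta * years_ahead)


if __name__ == "__main__":
    # Hypothetical discount rate of 2% a year.
    for years in (10, 50, 100, 200):
        print(years, round(discount_weight(0.02, years), 4))
```

At a rate of 2% a year the weights at 10, 50, 100 and 200 years are roughly 0.82, 0.37, 0.14 and 0.02, whereas with \(\delta = 0\) every date receives a weight of 1.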

\(\delta \gt 0\) implies \(e^{-\delta} \lt 1\). That means \(e^{-\delta(\tau -t)}\) tends to zero exponentially as \(\tau\) tends to infinity. In the latter part of his paper Ramsey (1928: 553–555) did use equation (4) to study the problem of optimum saving, but he did not approve of the formulation. Instead, he wrote (p. 543) that to discount later \(U\)’s in comparison with earlier ones is “… ethically indefensible and arises merely from the weakness of the imagination.” In a book that inaugurated the formal study of economic development, Harrod (1948: 40) followed suit by calling the practice a “… polite expression for rapacity and the conquest of reason by passion.”

Strong words, but to some economists, the Ramsey-Harrod stricture in a deterministic world reads like a Sunday pronouncement. Solow (1974a: 9) expressed this feeling exactly when he wrote, “In solemn conclave assembled, so to speak, we ought to act as if the [discount rate on future well-beings] were zero.”

But the matter cannot be settled without a study of production and consumption possibilities open to an economy. Consider the following tension between two sets of considerations:

(a) Low rates of consumption by generations sufficiently far into the future would not be seen to be a bad thing by the current DM if future well-beings were discounted at a positive rate. So today’s DM would recommend high consumption rates for now and the near future even if that meant generations in the distant future would live in penury. But if such a policy were followed, the demands of a further moral requirement to Classical Utilitarianism that DM may hold, namely, “intergenerational equity,” would not be met. Therefore we should follow Ramsey and not discount future well-beings.

(b) Write \(dF(K)/dK\) as \(F_K\). From equation (1) it is simple to deduce that \(F_K\) is the rate of return on investment. In Ramsey’s economy \(F_K \gt 0\), which means every unit of output that is saved yields more than a unit of future consumption, other things equal. For example, if DM were to reduce consumption at \(t\) by a unit, the additional consumption that would be available in the briefest of periods later – we write that as \(\Delta t\) – without affecting consumption at any future date would be \(1+[dF(K(t))/dK(t)]\Delta t\). The productivity of capital is thus tied to the arrow of time, which creates a bias in favour of future generations. This bias gives bite to the adage, “We can do something for posterity, but what can posterity ever do for us?” The thought inevitably arises that perhaps the bias should be countered in DM’s calculus if some attention were to be given by her to intergenerational equity in realized well-being as a supplement to Classical Utilitarianism. That in turn suggests that DM should abandon Ramsey and discount future well-beings at a positive rate.

The force of each consideration has been demonstrated in the economics literature. It has been shown in the context of a simple model that if production requires produced capital and exhaustible resources, then optimum consumption declines to zero in the long run if future well-beings are discounted at a positive rate (Dasgupta and Heal, 1974), but increases indefinitely if we follow Ramsey in not discounting future well-beings (Solow, 1974b). The exercises tell us that the long-run features of optimum saving policies depend on the relative magnitudes of the rate at which future well-beings are discounted and the long-term productivity of capital assets.

There is a more general point here, which was explored by Koopmans (1960, 1965, 1967, 1972) in a remarkable set of publications on the idea of economic development. In such complex exercises as those involving consumption and investment over a long time horizon, it is foolish to regard any ethical principle (e.g., Classical Utilitarianism) as sacrosanct. One can never know in advance what it may run up against. A more judicious tactic than Ramsey’s would be to play off one set of ethical assumptions against another in not-implausible worlds, see what their implications are for the distribution of well-being across generations, and then appeal to our intuitive senses before arguing over policy. Settling ex ante whether to use a positive rate to discount future well-beings could be a self-defeating move.[1]

3. The Problem of Optimum Saving

Ramsey considered a world with an indefinite future. This could appear to be an odd move, but it has a strong rationale. Suppose DM were to choose a horizon of \(T\) years. As she doesn’t know when our world will end, she will want to specify the resources that should be left behind at \(T\) in case the world doesn’t terminate then. But to find a justification for the amount to leave behind at \(T\), DM will need an assessment of the world beyond \(T\). That would, however, amount to including the world beyond \(T\). And so on.

Denote a consumption stream from the present \((t = 0)\) to infinity as \(\{C(t)\}.\) \(K(0) \gt 0\) circumscribes the economy; it is the quantity of capital that society has inherited from the past. Mathematicians would call \(K(0)\) an “initial condition.” The problem Ramsey set himself was to determine the consumption stream \(\{C(t)\}\) from 0 to infinity that DM would select if she were a Classical Utilitarian.

3.1 Undiscounted Utilitarianism

Call a consumption stream \(\{C(t)\}\) feasible if it satisfies equation (1) with initial condition \(K(0)\). In Ramsey’s deterministic world the Classical Utilitarian formulation of the problem of optimum national saving at date \(t = 0\) is thus:

“From the set of all feasible consumption streams, find that \(\{C(t)\}\) which maximizes

\[ V(0) = \int^{\infty}_0 [U(C(t))]dt.” \]

We will call this optimization problem, Ramsey Mark I.

There is a serious difficulty with Ramsey Mark I: it is not coherent. Infinite sums don’t necessarily converge. For any \(\{C(t)\}\) for which the infinite integral doesn’t converge, \(V(0)\) doesn’t exist. If the integral is non-convergent for every feasible consumption stream \(\{C(t)\}\), the maximization problem is meaningless: one cannot maximize something that appears to be a real-valued function \(V(0)\) when in fact the function doesn’t exist.

The force of this observation can be seen in

Example 1 (attributed to David Gale)

Suppose as an extreme special case of the Ramsey economy, \(F(K) = 0\) for all \(K \ge 0\). Then equation (1) reduces to

\[\tag{5} \frac{dK(t)}{dt} = - C(t) \]

The economy described in equation (5) consists of a non-deteriorating piece of cake, of size \(K(0) \gt 0\) at the initial date. It is obvious that every consumption stream \(\{C(t)\}\) satisfying equation (5) tends to zero in the long run. Formally, \(C(t) \rightarrow 0\) as \(t \rightarrow \infty\).

Because the \(U\)-function is unique up to positive affine transformations, we may without any loss of generality normalize it so that \(U(0) \ne 0\). It is then obvious that for all feasible \(\{C(t)\}\), \(V(0)\) in Ramsey Mark I diverges to minus infinity if \(U(0) \lt 0\), but diverges to plus infinity if \(U(0) \gt 0\). That an optimum policy does not exist in the cake-eating model can be seen if we now recall that \(U(C)\) has been assumed to be strictly concave. The assumption implies that any non-egalitarian distribution of consumption among the generations can be improved upon by a suitable redistribution. The ideal distribution would be equal consumption for all generations. The only consumption stream with the latter property is \(C(t) = 0\) for all \(t\). But that’s the worst possible distribution. QED
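The divergence in Example 1 can be illustrated numerically. The sketch below is only an illustration under assumptions of our own choosing: a hypothetical well-being function \(U(C) = -e^{-C}\) (increasing, strictly concave, with \(U(0) = -1 \lt 0\)) and a hypothetical feasible stream \(C(t) = rK(0)e^{-rt}\), which eats the whole cake over the infinite horizon. It computes the truncated integral \(\int_0^T U(C(t))dt\) for growing horizons \(T\).

```python
import math


def partial_welfare(horizon, cake=1.0, rate=0.1, steps_per_unit=100):
    """Midpoint-rule approximation of the integral of U(C(t)) over [0, horizon].

    Illustrative assumptions (not from the source):
      * U(C) = -exp(-C), so that U(0) = -1 < 0
      * C(t) = cake * rate * exp(-rate * t), which integrates to `cake`
        over an infinite horizon and therefore satisfies equation (5).
    """
    n = int(horizon * steps_per_unit)
    dt = horizon / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        consumption = cake * rate * math.exp(-rate * t)
        total += -math.exp(-consumption) * dt
    return total
```

Because \(C(t)\) tends to zero, the integrand tends to \(U(0) = -1\), so the truncated integral falls roughly in proportion to \(T\) and never settles down to a finite value.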

3.2 Re-normalizing Undiscounted Utilitarianism

The question arises whether there are circumstances in which there is a best consumption stream even though \(V(0)\) does not converge for all consumption streams. Ramsey formulated the question by altering the way the saving problem is posed.

Imagine that well-being is bounded above no matter how large consumption happens to be. Let \(U\) be the numerical measure of well-being that DM chooses to work with. (All positive affine transformations of \(U\) would be equally legitimate measures of well-being.) Let \(B\) be the lowest upper bound of \(U\). Ramsey christened it “Bliss”. Because the rate of return on investment \((F_K)\) in his model is positive, consumption would grow indefinitely and tend to infinity in the long run if saving rates were suitably chosen. That means there are possible paths of economic development in which \(U(C(t))\) tends to \(B\) in the long run. But that implies there are possible paths of economic development in which the short-fall of \(U(C(t))\) from \(B\) tends to zero in the long run. If the short-fall tends to zero fast enough, the undiscounted integral of the difference between \(U(C(t))\) and \(B\) would exist, and DM could seek to maximize the modified integral. So we have Ramsey Mark II, which reads as

“From the set of all feasible consumption streams, find that \(\{C(t)\}\) which maximizes

\[ V(0) = \int^{\infty}_0 [U(C(t))-B]dt.” \]

Notice that Mark II is a transformation of Mark I. The transformation amounts to re-normalizing the optimality criterion. Not only was the move from Mark I to Mark II on Ramsey’s part ingenious, it also displayed his moral integrity. It would have been easy enough for him to ask DM instead to discount future consumption and expand the range of circumstances in which Utilitarianism provides an answer to the problem DM is attempting to solve. He chose not to do that.
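A worked illustration may help to show what the re-normalization achieves. It rests on assumptions that are ours, not Ramsey’s: a hypothetical well-being function \(U(C) = B - 1/C\) for \(C \gt 0\) (increasing, strictly concave, and bounded above by \(B\)), and a hypothetical consumption stream \(C(t) = C_0 e^{gt}\) with \(C_0 \gt 0\) and \(g \gt 0\), whose feasibility depends on \(F\) and is not checked here. Along such a stream,

\[ \int^{\infty}_0 [U(C(t)) - B]dt = -\frac{1}{C_0}\int^{\infty}_0 e^{-gt}dt = -\frac{1}{gC_0}, \]

which is finite, whereas

\[ \int^{T}_0 [U(C(t))]dt = BT - \frac{1 - e^{-gT}}{gC_0} \]

has no finite limit as \(T \rightarrow \infty\) unless \(B = 0\). The Mark II integrand measures the short-fall from Bliss, and it is the speed at which that short-fall vanishes that determines whether the criterion is well defined.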

Ramsey’s intuition in moving from Mark I to Mark II was powerful, but in a paper that initiated the modern literature on the Ramsey problem, Chakravarty (1962) observed that to rely exclusively on the condition Ramsey had identified as being necessary for a consumption stream to be the optimum can lead to absurd results (see below, Sect. 4). In effect Chakravarty observed that infinite integrals, even when cast in the re-normalized form in Ramsey Mark II, don’t necessarily converge to finite values.

3.3 The Overtaking Criterion

What was needed was to de-link the question whether infinite well-being integrals converge from the question whether optimum consumption streams exist. That insight was provided by Koopmans (1965) and von Weizsacker (1965). The latter author’s re-statement of the problem of optimum saving was as follows:

We say that the feasible consumption stream \(\{C^*(t)\}\) is superior to a feasible consumption stream \(\{C(t)\}\) if there exists \(T \gt 0\) such that for all \(t \ge T\),

\[\tag{6} \int^t_0 [U(C^*(s))]ds \ge \int^t_0 [U(C(s))]ds \]

We call \(\{C^*(t)\}\) optimum if it is superior to all other feasible consumption streams.

The condition that is represented in inequality (6) is known as the Overtaking Criterion (OC), for that is what it is. OC avoids asking whether the integrals on either side of inequality (6) converge as \(t \rightarrow \infty\). If they do, OC reduces to Classical Utilitarianism. But OC is able to respond to Ramsey’s saving problem in a wider class of situations. In his work Koopmans (1965) identified a canonical economic model in which the \(U\)-function is bounded above and in which Ramsey Mark II is equivalent to an optimization problem that is posed in terms of OC.
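Inequality (6) compares cumulative well-being at every sufficiently late date rather than a single infinite sum. A rough sketch of that comparison on a finite sample follows. It assumes the two well-being streams have been sampled on a common time grid; the function name and the grid are hypothetical, and a finite sample can only suggest, never establish, that one stream overtakes another over an infinite horizon.

```python
from itertools import accumulate


def overtaking_time(u_star, u_other, dt=1.0):
    """Earliest grid time T from which cumulative well-being along `u_star`
    is at least that along `u_other` for the rest of the sample.

    `u_star` and `u_other` are equal-length lists of well-being flows, one
    per grid interval of length `dt`. Returns None if `u_star` is still
    behind at the end of the sample.
    """
    if len(u_star) != len(u_other):
        raise ValueError("streams must be sampled on the same grid")
    cum_star = list(accumulate(u * dt for u in u_star))
    cum_other = list(accumulate(u * dt for u in u_other))
    last_violation = -1
    for i, (a, b) in enumerate(zip(cum_star, cum_other)):
        if a < b:  # inequality (6) fails at grid time (i + 1) * dt
            last_violation = i
    if last_violation == len(cum_star) - 1:
        return None
    # Cumulative sums at index i cover the interval [0, (i + 1) * dt].
    return (last_violation + 2) * dt
```

For instance, a stream that sacrifices early well-being for later gains, such as `[0, 0, 3, 3, 3]`, catches up with the flat stream `[1, 1, 1, 1, 1]` at grid time 3 and stays ahead for the remainder of the sample.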

What are we to make of the ethics of discounting the well-beings of future generations? Ramsey (1928) began by dismissing it but then studied it at the tail end of his paper. DM could of course justify discounting future well-being if there is a possibility of future extinction. Sidgwick (1907) himself noted that in the passage quoted earlier. If Classical Utilitarianism is taken to commend the expected sum of well-beings, then the “hazard rate” at date \(t\) (i.e., the probability of extinction at date \(t\) conditional on society surviving until \(t\)) would appear in the expression for expected well-being as a discount rate for well-being at \(t\). The question remains whether Classical Utilitarianism would insist on zero-discounting of future utilities in a deterministic world.
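The role of the hazard rate can be made explicit in a simple case. Suppose, purely for illustration, that the hazard rate is a constant \(h \gt 0\) (the symbol \(h\) and the constancy are our assumptions). The probability that society survives until \(\tau\), conditional on its having survived until \(t\), is then \(e^{-h(\tau - t)}\). If well-being after extinction is counted as zero, the expected sum of well-beings is

\[ E[V(t)] = \int^{\infty}_t [U(C(\tau))e^{-h(\tau -t)}]d\tau, \]

which has the same form as equation (4) with \(h\) in place of \(\delta\). The discounting here reflects uncertainty about whether future people will exist, not a lower weight on their well-being should they exist.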

In a remarkable pair of works Koopmans (1960, 1972) exposed internal contradictions in ethical reasoning in a deterministic world in both Ramsey Mark I and Ramsey Mark II. He (and subsequently Diamond, 1965) showed that if relatively weak normative requirements are imposed on the concept of intergenerational well-being in a deterministic world, equal treatment of the \(U\)-function across generations has to be abandoned. We turn to that now.

3.4 Discounted Utilitarianism

It transpires the mathematics is a lot simpler if, instead of assuming time is continuous, time is taken to be discrete. Thus we now assume that \(t = 0,1,2,\ldots\) . Assume also that intergenerational well-being at \(t = 0\) can be measured in terms of a numerical function \(V\). The idea is to require the function, which is defined on infinite well-being streams, to satisfy properties that reflect ethical directives.

Let \(\{U(t)\}\) be an infinite well-being stream, that is, \(\{U(t)\} = (U(0),U(1),\ldots ,U(t),\ldots)\). We say \(V(\{U(t)\})\) is continuous if in an appropriate mathematical sense the values of \(V\) for well-being streams \(\{U(t)\}\) that don’t differ much in the space of \(\{U(t)\}\)s are close to one another. A further condition on the \(V\)-function that is ethically attractive is “monotonicity”. To define the notion let us say a well-being stream is “superior” to another if no generation enjoys less well-being along the former than along the latter and if there is at least one generation that enjoys greater well-being in the former than it does in the latter. We say that \(V\) is monotonic if \(V\) is larger for a well-being stream than it is for another if the former is superior to the latter.

Both properties are attractive. Lexicographic orderings notwithstanding, there are no convincing arguments against continuity. Of course, Rawls (1972) placed priority rules, and the lexicographic orderings they impose on the objects of interest in his conception of justice, at the centre of his theory, but that has proved to be one of his most contentious moves. The richness and depth of his analysis would not be lessened if small tradeoffs were admitted between the objects of justice. And it’s hard to find reasons against monotonicity. Even Rawls, whose work was so pointed toward distributive justice, insisted on monotonicity.

But it can be shown that any \(V\)-function that satisfies continuity and monotonicity must have generation discounting built into it. It would seem the real numbers are not rich enough to accommodate infinite well-being streams in a manner that respects continuity and monotonicity while awarding the well-beings of all generations equal weight. Proof of the proposition is in Diamond (1965), and was attributed by the author to Menahem Yaari. So we now introduce positive well-being discounting in the \(V\)-function and formulate Ramsey Mark III.

Return once again to the formulation where time is continuous. As previously, we say a consumption stream \(\{C(t)\}\) is feasible if it satisfies equation (1) with an initial capital stock of \(K(0)\). Ramsey Mark III (Ramsey, 1928, 553–555) is then:

“From the set of all feasible consumption streams, find that \(\{C(t)\}\) which maximizes

\[ V(0) = \int^{\infty}_0 [U(C(t))e^{-\delta t}]dt, \delta \gt 0.” \]

In Mark III the discount rate \(\delta\) is a positive constant. That means the corresponding discount factor \(e^{-\delta}\) is less than 1. The latter in turn can be shown to mean that in a wide range of economic models \(e^{-\delta t}\) tends to zero at so fast a rate that Mark III has an answer.

Let \(\{C^*(t)\}\) be the solution of Ramsey Mark III. Heuristically it is useful to imagine that there is a DM at each date. The measure of intergenerational well-being for the DM at date \(t\) is the \(V(t)\) of equation (4). Notice that the ethical views of the successive DMs are congruent with one another. There is thus no need for the DMs to draw up an “intergenerational contract”. The DM at every date will want to choose the level of consumption she deems to be optimum, aware that succeeding DMs will choose in accordance with what she had planned for them. In modern game theoretic parlance, Ramsey’s optimum consumption stream \(\{C^*(t)\}\) is a “non-cooperative” (Nash) equilibrium among the DMs.

4. The Ramsey Rule and Its Ramifications

We now construct an informal version of the variational argument Ramsey used for determining \(\{C^*(t)\}\) in Mark III. Loosely speaking, the DMs require the marginal rate of ethically indifferent substitution between consumption at any two brief periods of time to equal the marginal rate at which consumption can be transformed between that same pair of brief periods of time. Their equality (i.e., the right balance from among the “desirables” and the “feasibles”) is a necessary property of an optimum consumption stream.

Ramsey constructed a mathematical expression of the property, but did not look for conditions that, taken together, are both necessary and sufficient. We will use a simple example, which is also in his paper, to show how a sufficient condition can be obtained.

4.1 The Variational Argument

Write \(dU/dC = U_C\) and \(d^2 U/dC^2 = U_{CC}.\) Let \(\{C(t)\}\) be a feasible consumption stream. We first deduce a formal expression for the marginal rate of ethically indifferent substitution between consumption at any two brief periods of time. Suppose the intention is to reduce consumption at some future date \(t\) by a small quantity \(\Delta C(t)\) and raise consumption at a nearby date \(t+\Delta t\) while keeping consumption at all other dates the same as in \(\{C(t)\}\). The loss in well-being that would follow from the move is \(e^{-\delta t}U_{C(t)}\Delta C(t)\). We now seek to determine the percentage increase in consumption that would be required at \(t+\Delta t\) if \(V(0)\) is to remain unchanged; because that’s the marginal rate of ethically indifferent substitution between consumption at \(t\) and consumption at \(t+\Delta t\). Denote that rate by \(\varrho(t)\). Then \(\varrho(t)\) must be the percentage rate at which discounted marginal well-being declines at \(t\). It also follows that \(\varrho(t)\) is the rate the DM at \(t = 0\) would use to discount a unit of consumption at \(t\) so as to bring it to the present (because that’s what is meant by the percentage rate at which discounted marginal well-being declines at \(t\) – for a formal demonstration, see Dasgupta, 2008). Some economists call \(\varrho(t)\) the consumption rate of interest (Little and Mirrlees, 1974), others call it the social rate of discount (Arrow and Kurz, 1970). \(\varrho(t)\) is a fundamental object in social cost-benefit analysis.

Let \(\Delta t\) be vanishingly small. Then, by definition

\[\tag{7} \varrho(t) = - [d(e^{-\delta t}U_{C(t)})/dt]/e^{-\delta t}U_{C(t)} \]

So as to simplify the notation let \(g(C(t))\) denote the percentage rate of growth in \(C(t)\) (i.e., \(g(C(t)) = [dC(t)/dt]/C(t)\), which can be negative), and let \(\sigma(C)\) denote the elasticity of marginal well-being (i.e., \(\sigma(C) = -CU_{CC}/U_C \gt 0\)). Equation (7) then simplifies to

\[\tag{8} \varrho(t) = \delta + \sigma(C(t))g(C(t)) \]
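The step from equation (7) to equation (8) is a routine application of the product rule. As a sketch, using only the definitions of \(g\) and \(\sigma\) given above, and with \(U_C\) and \(U_{CC}\) evaluated at \(C(t)\):

\[\begin{align} \frac{d}{dt}\left(e^{-\delta t}U_C\right) &= -\delta e^{-\delta t}U_C + e^{-\delta t}U_{CC}\frac{dC(t)}{dt} \\ \varrho(t) &= \delta - \frac{U_{CC}}{U_C}\frac{dC(t)}{dt} = \delta + \left(-\frac{C(t)U_{CC}}{U_C}\right)\left(\frac{1}{C(t)}\frac{dC(t)}{dt}\right) = \delta + \sigma(C(t))g(C(t)) \end{align}\]

The second line divides the first by \(e^{-\delta t}U_C\), reverses the sign as equation (7) requires, and multiplies and divides by \(C(t)\).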

Because \(\{C^*(t)\}\) is by assumption the optimum, no feasible deviation from \(\{C^*(t)\}\) can increase \(V(0)\). That means the consumption rate of interest \((\varrho(t))\) must equal the social rate of return on investment \((F_{K(t)})\) at every \(t\). To see why, suppose that in some vanishingly small interval of time \(F_{K(t)} \gt \varrho(t)\). Then \(V(0)\) could be increased by consuming a unit less at \(t\) and enjoying the return of \((1+F_{K(t)})\) soon after. Alternatively, if \(F_{K(t)} \lt \varrho(t)\), \(V(0)\) could be increased by consuming a unit more at \(t\) and reducing consumption soon after by an amount equal to the return \((1+F_{K(t)})\). Either way \(\{C^*(t)\}\) would not be the optimum, so the consumption rate of interest \(\varrho(t)\) must equal the social rate of return \(F_{K(t)}\) along \(\{C^*(t)\}\) at every date. Using equation (8) we have,

\[\tag{9} \delta + \sigma(C(t))g(C(t)) = F_{K(t)} \]

Equation (9) is the Ramsey Rule. It is a necessary condition for optimality in Ramsey Mark III and is arguably the most famous equation in intertemporal welfare economics. The rule is a formal statement of the requirement on \(\{C^*(t)\}\) that the marginal rate of substitution between consumption at two nearby dates (the left-hand-side of eq. 9) equals the marginal rate of transformation between consumption at that same pair of nearby dates (the right-hand-side of eq. 9). It is simple to confirm that equation (9) is invariant under positive affine transformations of the \(U\)-function.
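The invariance claim can be checked in a line. As a sketch, write \(\tilde{U}(C) = aU(C) + b\) with \(a \gt 0\); the symbols \(\tilde{U}\), \(a\), \(b\) and \(\tilde{\sigma}\) are introduced here purely for illustration. Then

\[\begin{align} \tilde{U}_C = aU_C, \quad \tilde{U}_{CC} = aU_{CC}, \quad \text{so} \quad \tilde{\sigma}(C) = -\frac{C\,aU_{CC}}{aU_C} = \sigma(C) \end{align}\]

Since \(\delta\), \(g(C(t))\) and \(F_{K(t)}\) do not depend on the choice of \(U\), each term in equation (9) is unchanged.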

4.2 Incompleteness in Ramsey’s Analysis

Presently we will specify a \(U\)-function for which \(\sigma\) is independent of \(C\). For the moment we merely suppose that \(\sigma\) is constant. In that case the Ramsey Rule reads as

\[\tag{10} \delta + \sigma g(C(t)) = F_{K(t)} \]

In Ramsey Mark III, \(K(0)\) is given as an inheritance from the past. That means \(F_{K(0)}\) is given as an initial condition; it is not a choice for the DM at \(t = 0\). Moreover \(\delta\) and \(\sigma\) are parameters, both reflecting ethical values. The DM can therefore determine \(g(C(0))\) from equation (10). But that’s the optimum percentage rate of growth in consumption at the initial date. The Ramsey Rule gives the DM an equation for determining the initial growth rate of consumption, but it does not say what the initial level of consumption ought to be. Below we show by way of an example that there are infinitely many feasible consumption paths satisfying the Ramsey Rule. It follows that the DM at \(t = 0\) needs a further condition to determine \(C^*(0)\).

Example 2 (the linear economy)

Assume

\[\begin{align} \tag{11a} F(K) &= \mu K, \mu \gt 0 \\ \tag{11b} U(C) &= - C^{-(\sigma -1)}, \sigma \gt 1 \end{align}\]

From equation (11a) it follows that \(F_K = \mu\), which means the rate of return on investment is constant. From equation (11b) it follows that \(\sigma\) is the elasticity of marginal well-being. Notice also that \(U(C) \rightarrow -\infty\) as \(C \rightarrow 0\) and that, under the chosen normalization of the \(U\)-function, \(U(C) \rightarrow 0\) as \(C \rightarrow \infty\). Using equation (11a) in equation (1) yields,

\[\tag{12} \frac{dK(t)}{dt} = \mu K(t) - C(t) \]
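The claim that the parameter \(\sigma\) in equation (11b) is the elasticity of marginal well-being can be verified directly from the definition \(\sigma(C) = -CU_{CC}/U_C\). A short sketch of the calculation:

\[\begin{align} U_C &= (\sigma - 1)C^{-\sigma} \gt 0 \\ U_{CC} &= -\sigma(\sigma - 1)C^{-\sigma - 1} \lt 0 \\ -\frac{CU_{CC}}{U_C} &= \frac{\sigma(\sigma - 1)C^{-\sigma}}{(\sigma - 1)C^{-\sigma}} = \sigma \end{align}\]

The elasticity is therefore the same at every level of consumption, which is why equation (10) applies to this example.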

Write \(m = (\mu -\delta)/\sigma\). Applying equations (11a–b) to equation (10) reduces the Ramsey Rule to

\[\tag{13} \frac{dC(t)}{dt} = [(\mu - \delta)/\sigma]C(t) = mC(t) \]

Equation (13) says that if \(\mu \lt \delta\), \(C(t)\) declines to 0 at an exponential rate. Empirically, the plausible case to consider is \(\mu \gt \delta\), which is what we shall do here. It means that the rate of return on investment \((\mu)\) exceeds the rate at which time is discounted \((\delta)\). And that in turn means \(m \gt 0\). Integrating equation (13) yields

\[\tag{14} C(t) = C(0)e^{mt} \]

Equation (14) says \(C(t)\) grows exponentially at the rate \(m\). We reconfirm a point that was made previously, that although equation (14) reveals the rate of growth of optimum consumption at the initial date (i.e., \(t = 0\)), it doesn’t reveal the initial level of consumption (i.e., \(C(0)\)). That’s the indeterminacy in the Ramsey Rule.

The simplest way to determine the optimum initial consumption, \(C^*(0)\), is to observe from equation (14) that if \(C^*(t)\) grows indefinitely at the rate \(m\), so should \(K(t)\) be required to grow at that same rate. The reason is that if the growth rate of \(K(t)\) were to be less than \(m\), capital would be eaten into, which means the stock would be exhausted in finite time. The economy would then cease to exist (\(V(0)\) would be minus infinity if the future trajectory of the economy were to be thus). If on the other hand the growth rate of \(K(t)\) were to exceed \(m\), there would be over-accumulation of capital, in the sense that consumption would be lower at every date than it need be. The situation would resemble one where the DM throws away a part of the initial capital stock \(K(0)\) and then settles on a saving behaviour that satisfies the Ramsey Rule.
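The indeterminacy, and the way balanced growth resolves it, can be illustrated numerically. The sketch below is not part of Ramsey's analysis: it uses the closed-form solution of equation (12) when consumption follows equation (14), namely \(K(t) = [K(0) - C(0)/(\mu - m)]e^{\mu t} + [C(0)/(\mu - m)]e^{mt}\), which can be checked by differentiation and assumes \(\mu \ne m\). The function names and the illustrative values \(K(0) = 100\), \(\mu = 0.05\), \(\delta = 0\), \(\sigma = 1.5\) are assumptions made for the example.

```python
import math


def ramsey_growth_rate(mu, delta, sigma):
    """m = (mu - delta) / sigma, the consumption growth rate in eq. (13)."""
    return (mu - delta) / sigma


def balanced_initial_consumption(k0, mu, delta, sigma):
    """Initial consumption at which K(t) grows at rate m alongside C(t)."""
    m = ramsey_growth_rate(mu, delta, sigma)
    return (mu - m) * k0


def capital_path(t, k0, c0, mu, delta, sigma):
    """Solve dK/dt = mu*K - C (eq. 12) for C(t) = c0*exp(m*t) (eq. 14).

    Assumes mu != m, which holds whenever sigma > 1 and mu > 0.
    A negative value means capital was exhausted before date t.
    """
    m = ramsey_growth_rate(mu, delta, sigma)
    b = c0 / (mu - m)  # coefficient on the exp(m*t) term
    return (k0 - b) * math.exp(mu * t) + b * math.exp(m * t)


# Illustrative (assumed) parameter values.
K0, MU, DELTA, SIGMA = 100.0, 0.05, 0.0, 1.5
c_star = balanced_initial_consumption(K0, MU, DELTA, SIGMA)

# Every c0 below satisfies the Ramsey Rule; only c_star keeps K growing at m.
for label, c0 in [("too low", 0.9 * c_star),
                  ("balanced", c_star),
                  ("too high", 1.1 * c_star)]:
    k_late = capital_path(200.0, K0, c0, MU, DELTA, SIGMA)
    print(f"C(0) {label}: C(0) = {c0:.4f}, K(200) = {k_late:.1f}")
```

All three paths satisfy equation (13). With initial consumption above the balanced level the formula turns negative at a finite date (capital is exhausted); below it, the \(e^{\mu t}\) term dominates and capital outgrows consumption.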

Exponential growth in our linear economy (eq. 11a) tells us that the saving rate should be constant. Let us define the saving rate, \(s\), as the proportion of output (GDP) that is invested at each instant. Then equation (1) can be re-written as

\[\tag{15} \frac{dK(t)}{dt} = s\mu K(t) \]

Equation (15) says that intended saving equals intended investment. Integrating equation (15) yields

\[\tag{16} K(t) = K(0)e^{s\mu t} \]

But we are insisting that both \(K(t)\) and \(C(t)\) should grow at the same rate. Equations (14) and (16) therefore imply

\[\tag{17} m = \frac{\mu -\delta}{\sigma} = s\mu \]

The saving rate in equation (17) is the optimum. So we write it as \(s^*\). Thus

\[\tag{18} s^* = \frac{m}{\mu} = \frac{\mu -\delta}{\sigma\mu} \lt 1 \]

Equations (16)–(18) tell us that the optimum rate of growth of consumption, \(g^*\), is

\[\tag{19} g^* = \frac{\mu -\delta}{\sigma} \gt 0 \]

Notice also that if \(\delta = 0\), equation (18) reduces to

\[\tag{20} s^* = \frac{1}{\sigma} \]

Equation (20) offers as elegant a simplified answer as there could be to the question with which Ramsey started his paper.

4.3 The Transversality Condition

The linear technology (eq. 11a) and the iso-elastic \(U\)-function (eq. 11b) allowed us to recognise immediately that if a consumption stream satisfying the Ramsey Rule is to be the optimum, both capital and consumption should grow at the same exponential rate, \(m\). Identifying a sufficient condition for optimality in more general models is a lot more difficult. What we need is a condition on the long-run features of a consumption stream satisfying the Ramsey Rule that can ensure it is the optimum. von Weizsacker (1965) showed that the required condition relates to the long-run behaviour of the social value of capital associated with that consumption stream. We now formalize the condition.

Let \(U\) be the unit of account. Consider a consumption stream \(\{C(t)\}\). It follows that \(U_{C(t)}\) is the social worth of a marginal unit of consumption. Write \(P(t)\) for \(U_{C(t)}\). \(P(t)\) is called the (spot) accounting price of consumption. Because \(e^{-\delta t}P(t)\) is the discounted value of \(P(t)\), it is called the present-value accounting price of consumption. If \(\{C(t)\}\) satisfies the Ramsey Rule in Mark III, \(e^{-\delta t}P(t)\) is also the present-value accounting price of a unit of capital stock. von Weizsacker (1965) showed that a sufficient condition for the optimality of \(\{C(t)\}\) is \(e^{-\delta t}P(t)K(t) \rightarrow A\) as \(t \rightarrow \infty\), where \(A\) is a (finite) non-negative number. In words, a necessary and sufficient condition for \(\{C(t)\}\) to be the optimum is (i) that it satisfies the Ramsey Rule, and (ii) that the present-value of the economy’s stock of capital is finite. Condition (ii), which is widely known as the “transversality condition,” eliminates those feasible consumption streams that satisfy the Ramsey Rule but along which there is excessive saving. A simple calculation confirms that in Example 2 the transversality condition is satisfied if the saving rate is \(s^*\) (eq. 18).
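The calculation for Example 2 can be sketched as follows. With the saving rate \(s^*\), equations (14), (16) and (17) give \(C(t) = C(0)e^{mt}\) and \(K(t) = K(0)e^{mt}\), while equation (11b) gives \(P(t) = U_{C(t)} = (\sigma - 1)C(t)^{-\sigma}\). Hence

\[\begin{align} e^{-\delta t}P(t)K(t) &= (\sigma - 1)C(0)^{-\sigma}K(0)\,e^{(m - \delta - \sigma m)t} \\ &= (\sigma - 1)C(0)^{-\sigma}K(0)\,e^{-(\mu - m)t} \end{align}\]

where the second line uses \(\sigma m = \mu - \delta\). Because \(\mu - m = [(\sigma - 1)\mu + \delta]/\sigma \gt 0\) when \(\sigma \gt 1\), the expression tends to 0 as \(t \rightarrow \infty\), so the condition holds with \(A = 0\).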

4.4 Numerical Estimates of the Optimum Rate of Saving

Equation (18) says that \(s^*\) is an increasing function of the return on investment \((\mu)\), a decreasing function of the time rate of discount \((\delta)\), and a decreasing function of the elasticity of marginal well-being \((\sigma)\). Each of these properties is intuitively obvious:

(1) The higher is the rate of return on investment \((\mu)\), the greater is the gain to future generations from a marginal increase in saving by initial generations. That says the optimum rate of saving should be an increasing function of \(\mu\), other things equal.

(2) The larger is the value of the time discount rate \((\delta)\) chosen by DM, the lower is the weight that she awards to the well-being of future generations. That implies higher optimum consumption levels for early generations (Sect. 2.1), which in turn implies that the optimum rate of saving is lower, other things equal.

(3) As the return on investment is positive \((\mu \gt 0)\), the arrow of time displays a bias in favour of future generations (Sect. 2.1). But the larger is the chosen value of \(\sigma\), the more DM displays concerns over equity in consumption across the generations. Therefore, the larger is that concern, the higher is the optimum rate of consumption to be enjoyed by initial generations. So we should expect the optimum rate of saving to be a decreasing function of \(\sigma\), other things equal.
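The three properties can also be read off the partial derivatives of \(s^* = (\mu - \delta)/(\sigma\mu)\) in equation (18). The following is a sketch of that check, assuming \(\mu \gt \delta \ge 0\) and \(\sigma \gt 0\):

\[\begin{align} \frac{\partial s^*}{\partial \mu} &= \frac{\delta}{\sigma\mu^2} \ge 0 \\ \frac{\partial s^*}{\partial \delta} &= -\frac{1}{\sigma\mu} \lt 0 \\ \frac{\partial s^*}{\partial \sigma} &= -\frac{\mu - \delta}{\sigma^2\mu} \lt 0 \end{align}\]

The first derivative is strictly positive only when \(\delta \gt 0\); when \(\delta = 0\) it vanishes, which is consistent with equation (20), where \(s^*\) does not depend on \(\mu\).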

It is instructive to consider stylized figures for the parameters on the right-hand-sides of equations (18) and (19), respectively. Although stylized, they are figures for the pair of ethical parameters \(\sigma\) and \(\delta\) that economists who have written on the economics of climate change have assumed in their work. To be sure, the welfare economics of climate change has demanded more complicated models than the model that is represented in equations (1) and (11a), but as we confirm below, it has not offered any additional theoretical insights. In what follows we take a year to be the unit of time and assume that \(\mu = 0.05\) (i.e., 5% a year). Along the optimum, the consumption rate of interest equals the rate of return on investment (the Ramsey Rule), which means that the optimum consumption rate of interest equals a constant 5% a year.

A figure of 5% a year for \(\mu\) implies a capital-output ratio \((1/\mu)\) of 20 years, which is far higher than the estimates of capital-output ratios from inter-industry studies that economists in various parts of the world have arrived at (Behrman, 2001); a representative figure for \(1/\mu\) in that literature is 3 years. But their estimates have been based on a definition of “capital” that is confined to “produced” capital, such as factories, roads, ports, and buildings. Human capital (education, health, knowledge) is missing from them, as is natural capital (ecosystems, sub-soil resources). Ramsey’s model, as encapsulated in equation (11a), embraces all forms of capital goods. No doubt his formulation requires a heroic (read, impossible!) feat of aggregation, but when all capital goods that enter production are taken into account, we should expect an aggregate capital-output ratio (which we should call the (inclusive) wealth-output ratio) to be a lot higher than 3 years; perhaps even higher than 20 years (Arrow et al., 2012, 2013). Large categories of capital goods are absent from the national economic accounts that inform economists’ understanding of production and consumption possibilities (Dasgupta, 2019). It would thus seem there is still a long way to go before we can reach a good approximation of what we should bequeath to our descendants.

Example 3 (taken from the economics of climate change)

We now turn our attention to the values of the two ethical parameters in equation (11b) that were chosen by three economists in their study of the economics of climate change.

\[\begin{align} \tag*{Cline (1992)} \sigma = 1.5 \quad &\text{and} \quad \delta = 0 \\ \tag*{Nordhaus (1994)} \sigma = 1 \quad &\text{and} \quad \delta = 0.03 \text{ (3% a year)} \\ \tag*{Stern (2007)} \sigma = 1 \quad &\text{and} \quad \delta = 0.001 \text{ (0.1% a year)} \end{align}\]

(NB: \(\sigma = 1\) corresponds to the logarithmic well-being function, that is \(U(C) = \log C\), and can be obtained as a limit of the functional form of \(U(C)\) in equation (11b) as \(\sigma \rightarrow 1\).)

We impose those parameter values to find that the optimum saving rate \(s^*\) (eq. 18) and the optimum rate of growth of consumption (eq. 19) are, in turn:

\[\begin{align} \tag{21a} s^* = 67\% \quad &\text{and} \quad g^* = 3.3\% \text{ a year (Cline)} \\ \tag{21b} s^* = 40\% \quad &\text{and} \quad g^* = 2.0\% \text{ a year (Nordhaus)} \\ \tag{21c} s^* = 98\% \quad &\text{and} \quad g^* = 4.9\% \text{ a year (Stern)} \end{align}\]

4.5 Commentary

A national saving rate of 40% (eq. 21b) is no doubt high by the standards of contemporary western economies, but there are countries that in recent years have achieved 40–45% saving rates (China is a prominent example). A figure of 67% for \(s^*\) (eq. 21a) is higher than the saving rate in any country, but is not beyond belief. The truly outlandish figure is 98% (eq. 21c). It is outlandish especially because the figure is the optimum saving rate no matter how small \(K(0)\) happens to be. Admittedly, the model here (eqs. 11a–b) is phenomenally stylized, but it does bring out sharply the observation of Koopmans (1965), that it is foolish to assume \(\delta = 0\) (or close to 0) without first checking its possible consequences for the distribution of well-being across the generations.
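The figures in equations (21a–c) are easy to reproduce. The sketch below evaluates equations (18) and (19) at \(\mu = 0.05\) for the three parameter pairs of Example 3. The function and variable names are illustrative only, and the formulas are applied at \(\sigma = 1\) on the assumption that they carry over to the logarithmic limit of equation (11b).

```python
def optimum_saving_rate(mu, delta, sigma):
    """s* = (mu - delta) / (sigma * mu), eq. (18)."""
    return (mu - delta) / (sigma * mu)


def optimum_growth_rate(mu, delta, sigma):
    """g* = (mu - delta) / sigma, eq. (19)."""
    return (mu - delta) / sigma


MU = 0.05  # rate of return on investment, 5% a year

# (sigma, delta) pairs as listed in Example 3.
SPECIFICATIONS = {
    "Cline (1992)": (1.5, 0.0),
    "Nordhaus (1994)": (1.0, 0.03),
    "Stern (2007)": (1.0, 0.001),
}

for name, (sigma, delta) in SPECIFICATIONS.items():
    s_star = optimum_saving_rate(MU, delta, sigma)
    g_star = optimum_growth_rate(MU, delta, sigma)
    print(f"{name}: s* = {s_star:.0%}, g* = {g_star:.1%} a year")
```

Varying `delta` in this sketch while holding `sigma = 1` shows how quickly \(s^*\) approaches 100% as \(\delta\) approaches 0, which is the sensitivity the paragraph above draws attention to.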

Equation (19) has shown that the optimum growth rate of consumption is bounded above by \(\mu\), which explains why \(g^*\) is less than 5% a year for each of the three parametric specifications we have considered. The specifications come from three studies in the welfare economics of global climate change, in which the authors worked with models that are a lot more complex than Ramsey’s. And yet their findings are exactly what his formulation would point to (Dasgupta, 2008), namely, that other things equal, the lower is the chosen value of \(\delta\) and/or the larger is the damage to future well-being that is expected to be caused by global climate change, the greater is the investment level DM should recommend to avert climate change or soften the effects of that change on human well-being. The often shrill debate (e.g., Nordhaus, 2007) on the extent to which global investment should be directed at reducing the deleterious effects of climate change was spurred by differences in model specification among climate-change economists.

The linear technology (eq. 11a) and the iso-elastic \(U\)-function (eq. 11b), when taken together, have offered deep insights even though we have restricted the discussion to pen-and-paper calculations here. The functional forms are not believable; nevertheless, Ramsey made use of them. His paper showed that unbelievably simplified models, provided their construction is backed by strong intuition, can illuminate questions that are seemingly impossible to frame, let alone to answer quantitatively. That has been Ramsey’s enduring gift to theoretical economics.