A great announcement last week, as Dave Donaldson, an economic historian and trade economist, has won the 2017 John Bates Clark medal! This is an absolutely fantastic prize: it is hard to think of any young economist whose work is as serious as Donaldson’s. What I mean by that is that in nearly all of Donaldson’s papers, there is a very specific and important question, a deep collection of data, and a rigorous application of theory to help identify the precise parameters we are most concerned with. It is the modern economic method at its absolute best, and frankly is a style of research available to very few researchers, as the specific combination of theory knowledge and empirical agility required to employ this technique is very rare.

A canonical example of Donaldson’s method is his most famous paper, written back when he was a graduate student: “The Railroads of the Raj”. The World Bank today spends more on infrastructure than on health, education, and social services combined. Understanding the link between infrastructure and economic outcomes is not easy, and indeed has been a problem that has been at the center of economic debates since Fogel’s famous accounting on the railroad. Further, it is not obvious either theoretically or empirically that infrastructure is good for a region. In the Indian context, no less a sage than the proponent of traditional village life Mahatma Gandhi felt the British railroads, rather than help village welfare, “promote[d] evil”, and we have many trade models where falling trade costs plus increasing returns to scale can decrease output and increase income volatility.

Donaldson looks at the setting of British India, where 67,000 kilometers of rail were built, largely for military purposes. India during the British Raj is particularly compelling as a setting due to its heterogeneous nature. Certain seaports – think modern Calcutta – were built up by the British as entrepots. Many internal regions nominally controlled by the British were left to rot via, at best, benign neglect. Other internal regions were quasi-independent, with wildly varying standards of governance. The most important point, though, is that much of the interior was desperately poor and in a de facto state of autarky: without proper roads or rail until the late 1800s, goods were transported over rough dirt paths, leading to tiny local “marketing regions” similar to what Skinner found in his great studies of China. British India is also useful since data on goods shipped, on local weather conditions, and on agricultural prices were rigorously collected by the colonial authorities. Nearly all that local economic data is in dusty tomes in regional offices across the modern subcontinent, but it is at least in principle available.

Let’s think about how many competent empirical microeconomists would go about investigating the effects of the British rail system. It would be a lot of grunt work, but many economists would spend the time collecting data from those dusty old colonial offices. They would then worry that railroads are endogenous to economic opportunity, so would hunt for reasonable instruments or placebos, such as railroads that were planned yet unbuilt, or railroad segments that skipped certain areas because of temporary random events. They would make some assumptions on how to map agricultural output into welfare, probably just restricting the dependent variable in their regressions to some aggregate measure of agricultural output normalized by price. All that would be left to do is run some regressions and claim that the arrival of the railroad on average raised agricultural income by X percent. And look, this wouldn’t be a bad paper. The setting is important, the data effort heroic, the causal factors plausibly exogenous: a paper of this form would have a good shot at a top journal.

When I say that Donaldson does “serious” work, what I mean is that he didn’t stop with those regressions. Not even close! Consider what we really want to know. It’s not “What is the average effect of a new railroad on incomes?” but rather, “How much did the railroad reduce shipping costs, in each region?”, “Why did railroads increase local incomes?”, “Are there alternative cheaper policies that could have generated the same income benefit?” and so on. That is, there are precise questions, often involving counterfactuals, which we would like to answer, and these questions and counterfactuals necessarily involve some sort of model mapping the observed data into hypotheticals.

Donaldson leverages both reduced-form, well-identified evidence and the broader model we suggested was necessary, and does so in a beautifully organized paper. First, he writes down an Eaton-Kortum style model of trade (Happy 200th Birthday to the theory of comparative advantage!) where districts get productivity draws across goods then trade subject to shipping costs. Consider this intuition: if a new rail line connects Gujarat to Bihar, then the existence of this line will change Gujarat’s trade patterns with every other state, causing those other states to change their own trade patterns, causing a whole sequence of shifts in relative prices that depend on initial differences in trade patterns, the relative size of states, and so on. What Donaldson notes is that if you care about welfare in Gujarat, all of those changes only affect Gujaratis if they affect what Gujaratis end up consuming, or equivalently if they affect the real income Gujaratis earn from their production. Intuitively, if pre-railroad Gujarat’s local consumption was 90% locally produced, and after the railroad was 60% locally produced, then declining trade costs allowed the magic of comparative advantage to generate additional specialization and hence additional Ricardian rents. This is what is sometimes called a sufficient statistics approach: the model suggests that the entire effect of declining trade costs on welfare can be summarized by knowing agricultural productivity for each crop in each area, the local consumption share which is imported, and a few elasticity parameters. Note that the sufficient statistic is a result, not an assumption: the Eaton-Kortum model permits taste for variety, for instance, so we are not assuming away any of that. Now of course the model can be wrong, but that’s something we can actually investigate directly.
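To make the Gujarat example concrete, here is a minimal sketch of the sufficient-statistic logic in a one-sector simplification of an Eaton-Kortum style model. Holding local productivity fixed, the proportional change in real income there depends only on the change in the share of spending that falls on locally produced goods and on a trade elasticity theta. The function name and the value theta = 5 are my own illustrative assumptions, not numbers from the paper, and the paper's multi-crop version weights each crop separately.

```python
def welfare_gain(home_share_before, home_share_after, theta):
    """Proportional change in real income implied by a change in the home share.

    One-sector simplification with local productivity held fixed:
        W_after / W_before = (home_share_after / home_share_before) ** (-1 / theta)
    theta is the trade elasticity (an assumed value here, not an estimate).
    """
    return (home_share_after / home_share_before) ** (-1.0 / theta) - 1.0


# The example from the text: local consumption goes from 90% to 60% locally produced.
# theta = 5.0 is a hypothetical value chosen only for illustration.
gain = welfare_gain(0.90, 0.60, theta=5.0)
print(f"implied real income gain: {gain:.1%}")
```

The point of the sketch is how little it needs: no bilateral trade flows and no prices in other regions, just the before-and-after home share and one elasticity.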

So here’s what we’ll do: first, simply regress real agricultural production in a region on time and region dummies plus a dummy for whether rail has arrived in that region. This regression suggests a rail line increases incomes by 16%, whereas placebo regressions for rail lines that were proposed but canceled see no increase at all. 16% is no joke, as real incomes in India over the period only rose 22% in total! All well and good. But what drives that 16%? Is it really Ricardian trade? To answer that question, we need to estimate the parameters in that sufficient statistics approach to the trade model – in particular, we need the relative agricultural productivity of each crop in each region, elasticities of trade flows to trade costs (and hence the trade costs themselves), and the share of local consumption which is locally produced (the “trade share”). We’ll then note that in the model, real income in a region is entirely determined by an appropriately weighted combination of local agricultural productivity and changes in the weighted trade share, hence if you regress real income minus the weighted local agricultural productivity shock on a dummy for the arrival of a railroad and the trade share, you should find a zero coefficient on the rail dummy if, in fact, the Ricardian model is capturing why railroads affect local incomes. And even more importantly, if we find that zero, then we understand that efficient infrastructure benefits a region through the sufficient statistic of the trade share, and we can compare the cost-benefit ratio of the railroad to other hypothetical infrastructure projects on the basis of a few well-known elasticities.
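The logic of that two-step test can be sketched on simulated data. Everything below is synthetic: the panel dimensions, the arrival dates, the elasticity theta = 4, and the size of the rail effect on the home share are all invented for illustration, and the productivity term is dropped for simplicity. The data are built so that rail affects income only through the trade share, which is the case in which the rail coefficient should vanish once the share is included.

```python
import numpy as np

rng = np.random.default_rng(0)
n_districts, n_years = 40, 30
theta = 4.0  # hypothetical trade elasticity, not an estimate from the paper

# Balanced district-by-year panel with staggered (made-up) rail arrival dates.
district = np.repeat(np.arange(n_districts), n_years)
year = np.tile(np.arange(n_years), n_districts)
arrival = rng.integers(5, 25, size=n_districts)
rail = (year >= arrival[district]).astype(float)
n_obs = district.size

# By construction, rail lowers the log home share, and income depends on rail
# ONLY through that share (plus district effects, year effects, and noise).
log_home_share = -0.6 * rail + rng.normal(0.0, 0.1, n_obs)
log_income = (
    rng.normal(0.0, 1.0, n_districts)[district]
    + rng.normal(0.0, 1.0, n_years)[year]
    - log_home_share / theta
    + rng.normal(0.0, 0.02, n_obs)
)


def fe_ols(y, regressors):
    """OLS of y on the given regressors plus district and year dummies."""
    district_dummies = np.eye(n_districts)[district]
    year_dummies = np.eye(n_years)[year][:, 1:]  # drop one year to avoid collinearity
    X = np.column_stack([regressors, district_dummies, year_dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[: regressors.shape[1]]


# Step 1: reduced form, with the rail dummy as the only regressor of interest.
rail_effect = fe_ols(log_income, rail[:, None])[0]

# Step 2: add the sufficient statistic; the rail coefficient should be near zero.
rail_resid, share_coef = fe_ols(log_income, np.column_stack([rail, log_home_share]))

print(f"step 1 rail coefficient: {rail_effect:.3f}")
print(f"step 2 rail coefficient: {rail_resid:.3f}, share coefficient: {share_coef:.3f}")
```

If the simulated income process instead gave rail a direct effect on income, the step 2 rail coefficient would stay away from zero, which is exactly what makes the test informative.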

So that’s the basic plot. All that remains is to estimate the model parameters, a nontrivial task. First, to get trade costs, one could simply use published freight rates for boats, overland travel, and rail, but this wouldn’t be terribly compelling; bandits, and spoilage, and all the rest of Samuelson’s famous “icebergs” like linguistic differences raise trade costs as well. Donaldson instead looks at the differences in origin and destination prices for goods produced in only one place – particular types of salt – before and after the arrival of a railroad. He then uses a combination of graph theory and statistical inference to estimate the decline in trade costs between all region pairs. Given massive heterogeneity in trade costs by distance – crossing the Western Ghats is very different from shipping a boat down the Ganges! – this technique is far superior to simply assuming trade costs linear in distance for rail, road, or boat.
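The graph-theory half of that step can be sketched as a lowest-cost route problem: treat each transport link as an edge whose cost is its length times a mode-specific cost per kilometer, then find the cheapest path between two regions before and after a rail link is added. The network, the place names, and the relative mode costs below are all hypothetical. In the actual paper, as I understand it, the relative costs are what gets estimated from the salt price gaps, whereas here they are simply assumed.

```python
import heapq

# Hypothetical per-kilometer costs relative to rail (assumed, not estimated).
REL_COST = {"rail": 1.0, "river": 3.0, "road": 4.5}


def lowest_cost(edges, source, target):
    """Dijkstra's algorithm over undirected edges given as (a, b, km, mode)."""
    graph = {}
    for a, b, km, mode in edges:
        weight = km * REL_COST[mode]
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return float("inf")  # target unreachable


# A made-up network: a salt source, a waypoint, and an inland market.
before = [
    ("salt_source", "waypoint", 100, "road"),
    ("waypoint", "market", 200, "road"),
]
after = before + [("salt_source", "market", 350, "rail")]

cost_before = lowest_cost(before, "salt_source", "market")
cost_after = lowest_cost(after, "salt_source", "market")
print(cost_before, cost_after)
```

Note that the rail route here is longer in kilometers than the road route yet far cheaper, which is why assuming trade costs are linear in distance would miss most of the action.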

Second, he checks whether lowered trade costs actually increased trade volume, and at what elasticity, using local rainfall as a proxy for local productivity shocks. The use of rainfall data is wild: for each district, he gathers rainfall deviations for the sowing to harvest times individually for each crop. This identifies the agricultural productivity distribution parameters by region, and therefore, in the Eaton-Kortum type model, lets us calculate the elasticity of trade volume to trade costs. Salt shipments plus crop-by-region specific rain shocks give us all of the model parameters which aren’t otherwise available in the British data. Throwing these parameters into the model regression, we do in fact find that once agricultural productivity shocks and the weighted trade share are accounted for, the effect of railroads on local incomes is not much different from zero. The model works, and note that real income changes based on the timing of the railroad were at no point used to estimate any of the model parameters! That is, if you told me that Bihar had positive rain shocks which increased output on their crops by 10% in the last ten years, and that the share of local production which is eaten locally went from 60 to 80%, I could tell you with quite high confidence the change in local real incomes without even needing to know when the railroad arrived – this is the sense in which those parameters are a “sufficient statistic” for the full general equilibrium trade effects induced by the railroad.

Now this doesn’t mean the model has no further use: indeed, that the model appears to work gives us confidence to take it more seriously when looking at counterfactuals like, what if Britain had spent money developing more effective seaports instead? Or building a railroad network to maximize local economic output rather than on the basis of military transit? Would a noncolonial government with half the resources, but whose incentives were aligned with improving the domestic economy, have been able to build a transport network that improved incomes more even given their limited resources? These are first order questions about economic history which Donaldson can in principle answer, but which are fundamentally unavailable to economists who do not push theory and data as far as he was willing to push them.

The Railroads of the Raj paper is canonical, but far from Donaldson’s only great work. He applies a similar Eaton-Kortum approach to investigate how rail affected the variability of incomes in India, and hence the death rate. Up to 35 million people perished in famines in India in the second half of the 19th century, as the railroad was being built, and these famines appeared to end (1943 being an exception) afterwards. Theory is ambiguous about whether openness increases or decreases the variance of your welfare. On the one hand, in an open economy, the price of potatoes is determined by the world market and hence the price you pay for potatoes won’t swing wildly up and down depending on the rain in a given year in your region. On the other hand, if you grow potatoes and there is a bad harvest, the price of potatoes won’t go up and hence your real income can be very low during a drought. Empirically, less variance in prices in the market after the railroad arrives tends to be more important for real consumption, and hence for mortality, than the lower prices you can get for your own farm goods when there is a drought. And as in the Railroads of the Raj paper, sufficient statistics from a trade model can fully explain the changes in mortality: the railroad decreased the effect of bad weather on mortality completely through Ricardian trade.

Leaving India, Donaldson and Richard Hornbeck took up Fogel’s intuition that the importance of the railroad to the US depends on comparing trade that is worthwhile when the railroad exists with trade that is worthwhile when only alternatives like better canals or roads exist. That is, if it costs $9 to ship a wagonful of corn by canal, and $8 to do the same by rail, then even if all corn is shipped by rail once the railroad is built, we oughtn’t ascribe all of that trade to the rail. Fogel assumed relationships between land prices and the value of the transportation network. Hornbeck and Donaldson alternatively estimate that relationship, again deriving a sufficient statistic for the value of market access. The intuition is that adding a rail link from St. Louis to Kansas City will also affect the relative prices, and hence agricultural production, in every other region of the country, and these spatial spillovers can be quite important. Adding the rail line to Kansas City affects market access costs in Kansas City as well as relative prices, but clever application of theory can still permit a Fogel-style estimate of the value of rail to be made.
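The corn example can be written as a one-line social-saving calculation. Here $Q$ is a hypothetical number of wagonloads shipped, and holding $Q$ fixed is a simplifying assumption of mine that ignores any new trade the cheaper route makes worthwhile:

```latex
\text{social saving}
  = \left(c_{\text{canal}} - c_{\text{rail}}\right) \times Q
  = (\$9 - \$8) \times Q
  = \$1 \times Q
```

So the railroad is credited with one dollar per wagonload, the cost difference relative to the next-best alternative, rather than with the full value of everything that moves by rail.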

Moving beyond railroads, Donaldson’s trade work has also been seminal. With Costinot and Komunjer, he showed how to rigorously estimate the empirical importance of Ricardian trade for overall gains from trade. Spoiler: it isn’t that important, even if you adjust for how trade affects market power, a result seen in a lot of modern empirical trade research which suggests that aspects like variety differences are more important than Ricardian productivity differences for gains from international trade. There are some benefits to Ricardian trade across countries being relatively unimportant: Costinot, Donaldson and Smith show that changes to what crops are grown in each region can massively limit the welfare harms of climate change, whereas allowing trade patterns to change barely matters. The intuition is that there is enough heterogeneity in what can be grown in each country when climate changes to make international trade relatively unimportant for mitigating these climate shifts. Donaldson has also rigorously studied in a paper with Atkin the importance of internal rather than international trade costs, and has shown in a paper with Costinot that economic integration has been nearly as important as productivity improvements in increasing the value created by American agriculture over the past century.

Donaldson’s CV is a testament to how difficult this style of work is. He spent eight years at LSE before getting his PhD, and published only one paper in a peer reviewed journal in the 13 years following the start of his graduate work. “Railroads of the Raj” has been forthcoming at the AER for literally half a decade, despite the fact that this work is the core of what got Donaldson a junior position at MIT and a tenured position at Stanford. Is it any wonder that so few young economists want to pursue a style of research that is so challenging and so difficult to publish? Let us hope that Donaldson’s award encourages more of us to fully exploit both the incredible data we all now have access to and the beautiful body of theory that induces deep insights from that data.