Automated Linking of Historical Data

Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson, James J. Feigenbaum, Santiago Pérez

NBER Working Paper No. 25825

Issued in May 2019, Revised in June 2020

NBER Program(s): Development of the American Economy, Technical Working Papers

The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
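The tradeoff between the false positive rate and the (true) match rate can be made concrete with a small sketch. The Python code below is illustrative only and is not the authors' code, which is provided as Stata commands. The function name `linkage_rates`, the record identifiers, and the toy links are all hypothetical. The definitions are also assumptions: the match rate is taken as the share of base records that receive a link, the true match rate as the share of base records linked correctly, and the false positive rate as the share of links that disagree with a reference set of links.

```python
def linkage_rates(n_records, predicted, truth):
    """Summarize a set of links against reference links.

    n_records: number of records in the base sample.
    predicted: dict mapping base-record id -> linked target id.
    truth:     dict mapping base-record id -> correct target id.
    All three definitions below are assumptions made for illustration.
    """
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    n_links = len(predicted)
    # A link counts as correct only if it agrees with the reference link.
    n_correct = sum(1 for rec, tgt in predicted.items() if truth.get(rec) == tgt)
    return {
        "match_rate": n_links / n_records,
        "true_match_rate": n_correct / n_records,
        "false_positive_rate": (n_links - n_correct) / n_links if n_links else 0.0,
    }


# Hypothetical reference links for 8 of 10 base records.
truth = {i: f"t{i}" for i in range(1, 9)}

# A conservative method: few links, all of them correct.
conservative = {i: f"t{i}" for i in range(1, 5)}

# A more liberal method: more links, two of them wrong.
liberal = {i: f"t{i}" for i in range(1, 7)}
liberal[7] = "t99"
liberal[8] = "t98"

conservative_rates = linkage_rates(10, conservative, truth)
liberal_rates = linkage_rates(10, liberal, truth)
print(conservative_rates)
print(liberal_rates)
```

In this toy example the liberal method links more records than the conservative one, and also links more records correctly, but a larger share of its links are wrong. That is the kind of tradeoff a frontier between the two rates would trace out.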

Document Object Identifier (DOI): 10.3386/w25825