The first step, just building a database, was no trivial task. The 20,000 records each contained 212 individual variables. “Imagine you put all that in Excel spreadsheet. It’s going to be a mind boggling mess,” says Stern. And because the team was working with health records, the database also had to comply with HIPAA.

Stern’s then-graduate student Natalie Lira, now a professor of Latino/Latina studies at the University of Illinois Urbana-Champaign, headed up that effort. She ended up collaborating with non-historians from the University of Michigan’s School of Public Health to use REDCap, a data capture tool commonly used in clinical trials. Over three years, a team of undergraduate and graduate students entered the data from the microfilm into a searchable database.

Determining who is Hispanic turned out to be complicated too. The forms did not have a line for ethnicity; instead, they asked for “nativity,” or where someone was born. A man of Mexican descent could be born in California, so nativity would not correlate with what we currently think of as Hispanic. When Nicole Novak, an epidemiologist who also worked on the project, went to look for census data on Hispanics living in California, she encountered more confusion. Mexicans, for example, were considered their own race in 1930 and white in 1940 and after, though states in the Southwest had their own category for “white person of Spanish surname.” Looking at historic records, says Novak, “has shed a lot of light on how constructed a lot of categories we use in public health are.” The team ended up using Spanish surnames as a proxy for Hispanic ethnicity, despite the imperfect correlation. (Filipinos, for example, also have Spanish surnames.) Eventually, they found that patients with Spanish surnames were indeed two-and-a-half times as likely to be sterilized as those without.

The “nativity” classification posed another question. Should the team use the original, outdated terminology on the forms or attempt to update it to modern language? “Do you use ‘dementia praecox,’ which is roughly equivalent to schizophrenia today? Do you use the word ‘Negro?’” says Stern. In the end, they kept the original terms. As a historian, Stern is very much aware of the pitfalls of interpreting the past through a modern lens. “One can be seduced by big data,” she says. “You have to proceed with caution.” To use this database is to interpret variables set a century ago.

These issues are on Stern’s mind now because the team is working to make the database available to other researchers. Doing so also requires a delicate balancing of privacy. Many of the records are now old enough that they are publicly available in California’s state archives in Sacramento. (The microfilm in the state archives is actually the duplicate set Stern made because, remember, the originals were lost.) But is that the same thing as putting them online for anyone to search? Especially if hundreds of these patients, as the recent paper suggests, are still alive?