Vaccination programs for children have prevented more than 100 million cases of serious contagious disease in the United States since 1924, according to a new study published in The New England Journal of Medicine.

The research, led by scientists at the University of Pittsburgh’s graduate school of public health, analyzed public health reports going back to the 19th century. The reports covered 56 diseases, but the article in the journal focused on seven: polio, measles, rubella, mumps, hepatitis A, diphtheria and pertussis, or whooping cough.

Researchers analyzed disease reports before and after the times when vaccines became commercially available. Put simply, the estimates for prevented cases came from the falloff in disease reports after vaccines were licensed and widely available. The researchers projected the number of cases that would have occurred had the pre-vaccination patterns continued as the nation’s population increased.
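The arithmetic behind such an estimate can be sketched in a few lines. This is only an illustration of the general idea described above, not the researchers' actual method: it assumes the pre-vaccine pattern can be summarized as a single median incidence rate, and the function name, record layout and all numbers are hypothetical.

```python
from statistics import median


def estimate_prevented_cases(records, license_year):
    """Rough estimate of cases prevented after a vaccine is licensed.

    records: list of (year, reported_cases, population) tuples.
    Assumption (ours, not the study's): the pre-vaccine incidence rate
    would have stayed at its median level had no vaccine been introduced.
    """
    # Incidence rate (cases per person) for each pre-vaccine year.
    pre_rates = [cases / pop for year, cases, pop in records if year < license_year]
    if not pre_rates:
        raise ValueError("need at least one pre-vaccine year")
    baseline_rate = median(pre_rates)

    prevented = 0.0
    for year, cases, pop in records:
        if year >= license_year:
            # Cases expected had the old pattern continued as population grew.
            expected = baseline_rate * pop
            # Never count a negative difference as "prevented" cases.
            prevented += max(expected - cases, 0.0)
    return prevented


# Made-up numbers for illustration only; they are not from the study.
records = [
    (1960, 1000, 1_000_000),
    (1961, 1200, 1_000_000),
    (1962, 1100, 1_100_000),
    (1963, 200, 1_200_000),  # hypothetical first year after licensing
    (1964, 100, 1_300_000),
]
print(round(estimate_prevented_cases(records, license_year=1963)))
```

Because the baseline is a rate rather than a raw count, the expected number of cases rises with the population, which is the scaling step the paragraph above describes.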

The journal article is one example of the kind of analysis that can be done when enormous data sets are built and mined. The project, which started in 2009, required assembling 88 million reports of individual cases of disease, many of them from the weekly morbidity reports in the library of the Centers for Disease Control and Prevention. Then the reports had to be converted to digital formats.

Most of the data entry — 200 million keystrokes — was done by Digital Divide Data, a social enterprise that provides jobs and technology training to young people in Cambodia, Laos and Kenya.

Still, data entry was just a start. The information was first put into spreadsheets for making tables, then sorted and standardized so it could be searched, manipulated and queried on the project’s website.

“Collecting all this data is one thing, but making the data computable is where the big payoff should be,” said Dr. Irene Eckstrand, a program director and science officer for the N.I.H.’s Models of Infectious Disease Agent Study.

The University of Pittsburgh researchers also looked at death rates, but decided against including an estimate in the journal article, largely because death certificate data became more reliable and consistent only in the 1960s, the researchers said.

But Dr. Donald S. Burke, the dean of Pittsburgh’s graduate school of public health and an author of the medical journal article, said that a reasonable projection of prevented deaths based on known mortality rates in the disease categories would be three million to four million.

The scientists said their research should help inform the debate on the risks and benefits of vaccinating American children.

Pointing to the research results, Dr. Burke said, “If you’re anti-vaccine, that’s the price you pay.”

The medical journal article notes the recent resurgence of some diseases as some parents have resisted vaccinating their children. For example, the worst whooping cough epidemic since 1959 occurred last year, with more than 38,000 reported cases nationwide.

The disease data is on the project’s website, available for use by other researchers, students, the news media and members of the public who may be curious about the outbreak and spread of a particular disease. Much of the data is searchable by disease, year and location. The project was funded by the National Institutes of Health and the Bill and Melinda Gates Foundation.

“I’m very excited to see what people will find in this data, what patterns and insights are there waiting to be discovered,” said Dr. Willem G. van Panhuis, an epidemiologist at Pittsburgh and lead author of the journal article.

The project’s name itself is a nod to the notion that data is a powerful tool for scientific discovery. It is called Project Tycho, after the 16th-century Danish nobleman Tycho Brahe, whose careful, detailed astronomical observations were the foundation on which Johannes Kepler made the creative leap to devise his laws of planetary motion.

The open-access model for the project at Pittsburgh is increasingly the pattern with government data. The United States government has opened up thousands of data sets to the public.

Just how these assets will be exploited commercially is still in the experimental stage, other than a few well-known applications like using government weather data for forecasting services and insurance products.

But the potential seems to be considerable. Last month, the McKinsey Global Institute, the research arm of the consulting firm, projected that the total economic benefit to companies and consumers of open data could reach $3 trillion worldwide.