Anyone who’s spent time in politics knows the power of television to push messages and shape minds. But measuring its impact often requires information that is hard to find outside certain political circles.

The Internet Archive has launched a new project in Philadelphia that tries to address that problem with its institutional strength: gathering and archiving lots of stuff. The Archive is recording every minute of television news in Philly, as well as political ads aired on major broadcast stations. A mere 24 hours after broadcast, it will be possible to rewatch TV content online. In addition, the Archive will crawl content from across the web (news blogs, campaign websites and more) for its Philadelphia digital media landscape collection.

This effort comes in advance of several contested congressional elections in the Philadelphia region this November. Roger Macdonald, director of the Television Archive at the Internet Archive, selected the market for the first geographically based archiving project for this reason. The goal: to provide data for journalists and researchers interested in tracking the media landscape and understanding how political messages — and dollars — move through the system. Using text from closed captioning as well as metadata organized by volunteer viewers, the Philadelphia archive will be searchable by region, station, and date, as well as by campaign issue or ad sponsor.

In the past, Internet Archive data has been used for a variety of research purposes, including measuring how people use gestures, mapping placename mentions in the mainstream media, analyzing sentiment, and tracking word use. Reporters at FiveThirtyEight have used the archive to shore up their reporting.

“At its heart, it’s a library,” says Macdonald. “As a library, it’s an open invitation to come and utilize our resources, collaborate with us to build up these resources for your own institutional benefit, and to elaborate on the information in the library. We’ll try to help people utilize and interact with the data. But we don’t create product. We won’t be saying: This is what you should do with this.”

So what will be done with it? Macdonald cited a paper called “Mapping the Trayvon Martin Media Controversy” by researchers at the MIT Center for Civic Media (who used Media Cloud) as an example of the kind of research that could be done using the new tool. And indeed, some researchers are excitedly awaiting the opportunity to take a look at the Philadelphia data.

Danilo Yanich is an associate professor at the University of Delaware interested in how political ad buys influence and inform news coverage in local television (and, ultimately, policy). In his past work, he and his students have watched and coded over 30,000 hours of local television. His most extensive work thus far has been in Honolulu, where Yanich and his team recorded 100 news stories and 600 political ads during the 2012 general election. Not one of those news stories, Yanich says, addressed issues and claims made in the political advertisements. Now, with access to the Internet Archive’s data out of Philadelphia, the amount of information Yanich and his team have access to has doubled.

“The questions are: What are the issues presented in political ads? Are those issues covered in local political news stories? And if they are covered, are they addressed in a critical fashion in which there is an evaluation of the claims that are made?” says Yanich.

But because he’s an academic, Yanich’s findings won’t be published until long after election day in Pennsylvania. “One of the great challenges has always been that people look in retrospect and get great insight, but the voters miss the benefit of journalism, to help them make more informed choices,” says Macdonald. “I met with several reporters from the Inquirer a month and a half ago who expressed an enormous amount of interest. I learned from them that they thought it would be of great value in the campaign context.”

Also interested in helping the data reach newsrooms is the Sunlight Foundation, specifically Kathy Kiely, via the Political Ad Sleuth project. Ad Sleuth started as a crowdsourced operation that organized volunteers to visit TV stations, where they would copy files that show which special interest groups are buying political ads. “Under federal law, these groups are required to file a form that indicates who their top executives are, or who their board of directors are, which is all a good political reporter needs to start figuring out who’s behind these groups,” Kiely says.

Starting today, all broadcast stations are required to file that information digitally, meaning Ad Sleuth will have a lot more information in its database. “You can enter the name of a committee in Political Ad Sleuth, and it will tell you every single place that a committee has bought ads. You can sort by state, you can sort by TV market, you can sort by date. It really helps reporters provide context, understand who’s advertising in the market,” says Kiely.

But the one thing the Ad Sleuth files don’t show is what’s actually in an ad — you know who bought it, but not what it says. By combining that data with the Philadelphia recordings, however, it will be possible to see all of that information in one place. “You’ll be able to take this soft little ad about puppy dogs and snails and kitty cat tails and connect it to the people who want to do fracking,” says Kiely. “That is the beauty of this.”

For now, though, there’s no direct digital connection between the two, and Kiely says she hopes reporters will “act as a crosswalk” between the Internet Archive and Ad Sleuth. “There are a million stories in the database that people who know things I don’t know will be able to find,” she says. “We want reporters to know about this tool and to use it.”

Non-journalist volunteers will also be needed to make this project come together. Important metadata like the political ad buy files exist as PDFs, which a person must manually convert into searchable data. Volunteers are also needed to watch and manually tag the broadcast data, separating news segments from ads. Macdonald hopes these volunteers will help train an algorithm that can do this work automatically (ideally, such a program would also be able to differentiate between news story topics), but he says it’s unlikely it would be operational before the end of the year.

If all goes well in Philadelphia, the next step for the Internet Archive is to record and crawl media markets across the country for the 2016 election. “We want to move not just to big media markets, but some of the smaller markets — those where there’s a lot of ethnic and cultural diversity,” Macdonald says. “Those communities are overlooked in many ways, and we think that bringing our library of resources to bear on their news may help bring some of their issues to the attention of the rest of the nation.”

The Internet Archive’s project in Philadelphia will continue to expand and incorporate more community partners as the election nears. The organization recently received a $15,000 grant from the Philadelphia Foundation. It remains to be seen what kind of stories will emerge from the data gathered in Philadelphia, but the Internet Archive’s potential for social impact can only grow as its stores of information expand.