In 2000, the Library of Congress started a pilot web archiving project focused on the presidential election. After the Sept. 11 terrorist attacks in 2001, the pilot project expanded and eventually became a permanent fixture of our national archives. Five full-time staff members orchestrate an open-source web crawler called Heritrix to capture the Internet’s content for future generations.

“Part of it is the selection question: What do we want to archive?” says Abbie Grotke, a digital media project specialist on the Library of Congress’s web archiving team. “We can’t easily identify what is the ‘U.S. web.’ We can’t just say we want to get everything that’s ‘.com’ or ‘.gov.’ So we do have to do this selective process.”

So what does the Library of Congress think is worth saving? Here are the portions of today’s web your grandchildren will be able to access through the Library of Congress:

1. Twitter Feeds: All of Them

The Library of Congress announced in April that it would begin archiving Twitter feeds. Some Twitter feeds had already been archived as part of special projects; for instance, some tweets regarding the nomination of Supreme Court Justice Sonia Sotomayor were included in the collection about Supreme Court changes. But now Twitter plans to donate its entire archive of public content.

That means your tweets, my tweets, and Britney Spears’s tweets will all become part of the archives. What is not yet clear is exactly how all of these tweets will be used.

“The point is not to provide a Twitter interface at the library that you can go in and use like they do on the current website,” Grotke says. “There’s talk of more of a researcher, data-mining-type access to it. We’re still trying to figure out what that is exactly, but people probably won’t be able to go in and look for you specifically.”

2. National Election Candidates’ Internet Presences

The Library’s web archive started with a project that documented an election, and much of its work continues to revolve around this topic. The archive collects about 2,500 snapshots of websites during every election cycle.

“A lot of what we do, particularly with the elections, goes away rather quickly,” Grotke says. “If the candidate loses the election, their website disappears.”

The archives include presidential, congressional, and even overseas elections. The Library’s foreign operations offices document elections in their own regions. Researchers of the future, for instance, will be able to see the web that surrounded the 2009 general elections in India and Indonesia.

3. Facebook Pages: A Selective Few

The web crawler often follows links from the websites of candidates or members of Congress to their public Facebook Pages. While Facebook has made no Twitter-like deal to donate archives to the Library, pages on the social media platform inevitably come up while documenting major events.

Thus far, the Library has left it up to the author of the Page—not Facebook—to give permission to archive relevant pages.

“The position that we’ve taken so far is that the content we’re archiving is actually owned by the site owner who put it up there,” Grotke says. “We’ve been asking permission of the original site owner.”

So unless you’re a national election candidate who has given permission, you probably don’t have to worry about your grandchild stumbling across an embarrassing Facebook photo while doing archival research for his or her college thesis.

4. Notable Historical Events

The Library has also been archiving Congressional websites since 2002. The web archive team has collected websites regarding Supreme Court changes, the Sept. 11 attacks, the 2005 papal transition, Hurricane Katrina, the Iraq war, and the crisis in Darfur. A full list of current projects is available here.

5. News Sites That Give Permission

Unlike libraries in some other countries, the Library of Congress has no legal mandate to preserve the web. Therefore, the web archive team can’t collect everything it would like to without asking permission. Because news sites and blogs earn money from their content, the Library needs to get consent before it includes their pages in the archives.

Grotke says that few news organizations that the web archive team contacts for permission ever respond, which means that not much of the content in the web archives comes from news sites.

Image courtesy of iStockphoto, LawrenceSawyer