About NewsDiffs

Why NewsDiffs Exists

In the age of rapid reporting and digital news, there is rarely a single "final" version of an article.

NewsDiffs watches different versions of highly-placed articles on online news sites, starting with nytimes.com.

For better or worse, readers can now view "the making of the sausage" that was historically tucked discreetly out of view in dead-tree editions. Some of those changes provoke criticism.

NewsDiffs was born of the Knight Mozilla MIT hackathon on June 17, 2012.

What Types of Changes?

Updates to articles on major news web sites happen all the time.

Often the changes are minor edits that tighten the prose (such as "most" to "many").

Sometimes the changes are the insertion or the deletion of a section.

Sometimes the story changes as a result of rapidly breaking news, such as the death of Rodney King. The story grows and deepens over time as more information comes in. To the right is an example of a story that evolved about the health of former Egyptian president Hosni Mubarak, first when it was reported that he was "clinically dead" and then later that he had suffered a stroke.

Another interesting example would have been the killing of Osama Bin Laden on May 1, 2011, which broke at 10:40 p.m. with a sparse report from Helene Cooper:

WASHINGTON — Osama bin Laden has been killed, a United States official said. President Obama is expected to make an announcement on Sunday night, almost 10 years after the Sept. 11 attacks on the World Trade Center and the Pentagon.

Also interesting are the language changes that reflect subtle differences in connotation. For example, whether an election was "democratic" vs. "competitive."

In some cases, we can see how a story changes substantially as more reporting comes in, as in a story that helped inspire this project: the article about the arrests of Occupy Wall Street protestors on the Brooklyn Bridge on October 1, 2011. Two versions, twenty minutes apart, had substantially different first paragraphs. The criticism the article received was perhaps unfair, but it's hard to determine since the earlier version is no longer publicly available.

Why the name NewsDiffs?

diff is a popular tool in computer programming that outputs the differences between two files. It is typically used to show the changes between one version of a file and a former version of the same file. This idea of version control is well known within software engineering, and should be used in journalism as journalism moves toward constantly evolving versions of news stories. We have had many sessions at many newsy foo and bar camps on "Github for news." Well, this time, we literally put the news into git.
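To make the idea concrete, here is a minimal sketch of a line-by-line diff using Python's standard difflib module. The two "versions" are invented sentences built around the "most" to "many" edit mentioned above, and the function name diff_versions is ours for illustration. This is not how NewsDiffs itself renders changes.

```python
import difflib

# Two made-up versions of the same short article (illustrative only).
earlier = [
    "Most protesters were arrested on the bridge.",
    "Police said the march blocked traffic.",
]
later = [
    "Many protesters were arrested on the bridge.",
    "Police said the march blocked traffic.",
]


def diff_versions(old, new):
    """Return unified-diff lines describing how `old` became `new`."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="earlier", tofile="later", lineterm=""
        )
    )


if __name__ == "__main__":
    # Removed lines start with "-", added lines with "+",
    # and unchanged context lines with a single space.
    for line in diff_versions(earlier, later):
        print(line)
```

A line-based diff like this marks the whole first sentence as changed. Showing that only one word moved takes a finer-grained comparison, which is what the front end described further down is for.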

How NewsDiffs Works

NewsDiffs regularly looks at the stories that are linked to (or have been linked to) from the homepage of major online news publications, starting with nytimes.com and cnn.com. It parses them and stores them in a git repository.

The records start June 17, 2012.

Not all articles are stored. Only those with changes are displayed. NewsDiffs focuses mostly on ones that are linked from the homepage.
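The storage step described above can be sketched roughly as follows. This is a simplified illustration, not the actual NewsDiffs code: the helper names (url_to_path, git, store_version), the file layout and the commit identity are all assumptions made for the example, and the fetching and parsing steps are left out. It assumes the git command-line tool is installed and that the repository directory has already been set up with "git init".

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def url_to_path(url: str) -> str:
    """Map an article URL to a relative file path (hypothetical layout)."""
    parsed = urlparse(url)
    return f"{parsed.netloc}/{parsed.path.strip('/') or 'index'}"


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def store_version(repo: Path, url: str, text: str) -> bool:
    """Save `text` for `url`; commit only if it differs from the last version.

    Returns True when a new version was committed, False when nothing changed.
    """
    target = repo / url_to_path(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    rel = str(target.relative_to(repo))
    git(repo, "add", "--", rel)
    # `git status --porcelain` prints nothing for a file with no staged change.
    if not git(repo, "status", "--porcelain", "--", rel).strip():
        return False
    # The identity is passed inline so the sketch does not rely on global config.
    git(
        repo,
        "-c", "user.name=newsdiffs-sketch",
        "-c", "user.email=sketch@example.invalid",
        "commit", "-m", f"Update {url}",
    )
    return True
```

With a layout like this, an article that has "changes" is simply a file with more than one commit in its history, and "git log" on that file lists its versions.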

What Tools Did You Use?

The NewsDiffs source code is available on GitHub.

The front end used to view the differences is from the open-source Diff Match Patch library.

The website is built on Django.

The prettiness is courtesy of Twitter Bootstrap, which has been saving developers from themselves the world over.

Who created NewsDiffs?

NewsDiffs is the product of a weekend of work from the Knight Mozilla MIT hackathon, by Eric Price, Jennifer 8. Lee and Greg Price.

Greg, who works at Tddium, has a master's in theoretical computer science from MIT and a bachelor's in mathematics from Harvard. (He also led the YouTomb project, which tracked videos removed from YouTube.) Eric is currently in his fourth year of a PhD in theoretical computer science at MIT. Jenny was a reporter at The New York Times for nine years, wonders what it's like to be a product manager and has been tortured by missing semicolons.