New Research Shows How Copyright Law Is Keeping Useful Info Off Wikipedia

from the too-bad dept

But what Nagaraj found was that the availability of public domain material dramatically improved the article's images. Before the digitization, players from between '44 and '64 had an average of .183 pictures on their articles. The '64 to '84 group had about .158 pictures. But after digitization, those numbers dramatically changed: there were 1.15 pictures on each of the older group's articles -- but only .667 in the new group. More recent players, covered by privately-owned parts of Baseball Digest, had half as many images on their pages as did old-timers.

And the effects of this -- of just having an image on the page -- cascaded to other metrics. "Out-of-copyright" players' pages saw a significant boost in traffic. Articles from the pre-'64 group that were already in the top 10 percent saw their hits increase more than 70 percent. Articles from that group in the least-popular ten percent saw traffic to their articles increase by 25 percent. Those pages were more frequently edited across the board, too. And this makes sense: Google rewards updated content, and it rewards images. The out-of-copyright players provided more of both.

The Atlantic has an interesting article about some forthcoming research from MIT PhD student Abhishek Nagaraj (though, oddly, the article never introduces him, never mentions his first name, and just refers to him throughout by his last name only). It's the latest in an increasingly long line of evidence showing how copyright is stifling content and keeping it from reaching the public in useful ways. Nagaraj found a particularly useful natural experiment in the archives of Baseball Digest, "the oldest and longest-running journal of matters baseball-related," which has been published continuously since 1942. For various reasons (sounds like they didn't renew...) the issues from 1942 until 1964 are in the public domain. Everything after that... not so much. Google's book scanning project scanned nearly every issue from July 1945 until 2008.

Nagaraj realized that Wikipedians were using this as good source material for Wikipedia pages -- especially on the profiles of older baseball players. He noted that there was little stopping the text from being rewritten, but the real issue was around images. People could use the scanned images to illustrate the profiles, but clearly they could only use the public domain ones without permission.

And, yes, the article notes that he put in place various controls to correct for unrelated differences. Basically, the only observable difference in why the pages have more images is the public domain status of some of those works vs. others. Some might argue that this is no big deal, but he found a second bit of useful data as well: the pages of out-of-copyright players also drew more traffic and were edited more often.

I'm reminded, yet again, of that chart of the now infamous gap in books under copyright that you can't find any more -- even though older books in the public domain are widely available. Once again, we're seeing not only the massive value of the public domain, but how much useful content is being locked away by excessively strict (and excessively long) copyright law.

Filed Under: baseball digest, copyright, information, learning, public domain, wikipedia