Sometimes you write a piece of software and it gets used for purposes you didn’t quite imagine at the time. Sometimes you write a piece of software and it unexpectedly rearranges your life. I’d like to tell you a quick story about a Twitter bot named @CongressEdits. It tweets when someone edits Wikipedia anonymously from the United States Congress. In this post I’ll give you some background on how the bot came to be, what it has been used for so far, and how it works. @CongressEdits taught me how the world of archives intersects with the world of politics and journalism. To explain how that happened, I first need to give a bit of background.

The funny thing about @CongressEdits is that it wasn’t my idea at all. Back in July of 2014 I happened to see this tweet go by in my stream:

This Twitter bot will show whenever someone edits Wikipedia from within the British Parliament. It was set up by @tomscott using @ifttt. — Parliament WikiEdits (@parliamentedits) July 8, 2014

Tom Scott’s insight was that Wikipedia publishes a contributions page for every IP address that has edited it, and that this page could easily be plugged into Twitter. For example, you can see what edits 194.60.38.198 has made here. This page is also available as an Atom feed, so it can be consumed by a feed reader or other software like IfThisThenThat (IFTTT). IFTTT lets you easily funnel data from one service (Facebook, Flickr, Instagram, Twitter, Gmail, etc.) to another. Tom created an IFTTT recipe that watched the contributions feeds for two IP addresses he knew were proxy servers for the UK Parliament, and tweeted from the @parliamentedits account whenever new edits were found. How did he know the IP addresses? Well, from a FOIA request, naturally.
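To give a sense of how little machinery this takes, here is a sketch of building those URLs for a given IP address. The Special:Contributions path is standard MediaWiki, and the Atom feed uses the feedcontributions API action; both URL patterns are assumptions based on English Wikipedia's layout, not code from Tom's original recipe:

```javascript
// Build the public contributions page URL for an IP address, and the
// Atom feed URL that a tool like IFTTT could poll for new edits.
// Note: these URL patterns assume English Wikipedia's MediaWiki layout.
function contributionsUrl(ip) {
  return 'https://en.wikipedia.org/wiki/Special:Contributions/' +
         encodeURIComponent(ip);
}

function contributionsFeedUrl(ip) {
  return 'https://en.wikipedia.org/w/api.php?action=feedcontributions' +
         '&feedformat=atom&user=' + encodeURIComponent(ip);
}

console.log(contributionsUrl('194.60.38.198'));
console.log(contributionsFeedUrl('194.60.38.198'));
```

Point a feed reader (or an IFTTT feed trigger) at the second URL and you have, in essence, Tom's original @parliamentedits setup for a single address.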

Wikipedia

According to Alexa, wikipedia.org is the sixth most popular destination on the Web. Wikipedia is, of course, the encyclopedia anyone can edit, so long as you can stomach wikitext and revert wars. Wikipedia is also a platform for citizen journalism, where events are documented as they happen. For example, the article about Germanwings Flight 9525, which crashed on March 24, 2015, was edited 2,006 times by 313 authors in 3 days.

What is perhaps less commonly known is that Wikipedia is a haven for a vast ecosystem of bots. These bots perform all sorts of maintenance tasks: anti-vandalism, spell checking, and category assignment, as well as helping editors with repetitive operations. Some estimate that as many as half of the edits to Wikipedia are made by bots. There’s a policy for getting your bot approved to make edits, and there are software libraries that make writing one easier. Wikipedia bots are themselves the subject of study by researchers like Stuart Geiger, since in many ways this is terra incognita for information systems. What does it mean for humans and automated agents to interact in this way? What does it mean to think of Wikipedia bots in the context of computational journalism? Does it even make sense?

While these questions are certainly of interest, to understand the story of @CongressEdits you really only need to know two things about Wikipedia: it keeps a version history of all the edits to a particular article, and it allows you to edit without logging in. Typically editors log in, and any edits they make are associated with their user account. But to lower the barrier for making contributions, you can also edit articles without logging in, so-called anonymous or (more precisely) unregistered editing. When you edit this way there is no user account to tie the edit to, so Wikipedia ties the edit to the IP address of the computer that performed it. If you go to Google and ask “what is my IP address” you should see a box at the top with your IP address in it.
This is the IP address that Google thinks you are at. Given the way networks are set up at places of work, hotels, etc., it’s possible that this IP address identifies a proxy server that filters content for many people on your network. So the IP address seen by Wikipedia may belong to your organization, not your specific workstation. Spammers and other vandals often edit without logging in, so Wikipedia uses these IP addresses to identify pages that have been vandalized, and will sometimes temporarily block edits from an offending address. It’s ironic that unregistered edits are often referred to as “anonymous,” since the IP address says a great deal about where the user is editing Wikipedia from. IP addresses add a physical dimension to an internet that we tend to think of as disembodied space.

CongressEdits

So back in July of 2014, I saw Tom’s tweet and thought it could be interesting to do the same thing for the US Congress. But I didn’t know what the IP addresses were. After a quick search I found a Wikipedia article about edits to Wikipedia from the US Congress. A group of Wikipedians had already been tracking edits from Congress, but in a more manual way. I tweeted the IP addresses from the article to some experienced civic hackers I followed on Twitter, to see if they could verify them:

@joshdata @derekwillis @konklone do you happen to know of any more ip ranges for house & senate office buildings? https://t.co/gAyKOdj1eK — Ed Summers (@edsu) July 9, 2014

Joshua Tauberer responded with a pointer to the GovTrack source code on GitHub, where he had a similar set of ranges. GovTrack is a government transparency site that aggregates information from government websites to provide easy access to the US legislative record. The good news was that Josh’s list matched the ranges in Wikipedia, and added a few more. The bad news was that the ranges included hundreds of thousands of individual IP addresses. I didn’t know which addresses in those ranges were proxy servers, or whether there were proxy servers at all. It just wasn’t feasible to watch hundreds of thousands of Atom feeds.

Fortunately, I had previously worked on a very simple application, Wikistream, that visualizes the current edits to Wikipedia in all major languages. To do this I needed to tap into the edit stream for all the language-specific Wikipedias, which sounds difficult, but is in fact quite easy. I had learned a few years earlier that the MediaWiki instance behind each language-specific Wikipedia logs into an Internet Relay Chat (IRC) channel and announces all edits there. This stream is used by some of the previously mentioned anti-spam and anti-vandalism bots to keep abreast of what is changing on Wikipedia.
Wikistream is a program that simply logs into those IRC channels and displays the edits as a stream on a web page. While creating Wikistream I also created a little Node library called wikichanges that bundles up the channel watching and parsing code for reuse. Here’s a short Node program that uses the wikichanges library to print out the title of each change to all Wikipedias as they happen:

```javascript
var wikichanges = require('wikichanges');

var changes = new wikichanges.WikiChanges();
changes.listen(function(change) {
  console.log(change.page);
});
```

The Wikimedia Foundation now also hosts its own stream service, which provides WebSocket, XHR, and JSONP polling interfaces to the stream of edits as they happen. This means you can write some static HTML and JavaScript that connects to the stream without having to bother with the IRC chatrooms or run a server of any kind. Here’s an example of a static HTML page that will display a list of edits to the English Wikipedia:

```html
<!doctype html>
<html>
  <head>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/0.9.1/socket.io.js"></script>
    <script src="https://code.jquery.com/jquery-1.11.2.min.js"></script>
    <script>
      var socket = io.connect("http://stream.wikimedia.org/rc");
      socket.on("connect", function() {
        socket.emit("subscribe", "en.wikipedia.org");
      });
      socket.on("change", function(change) {
        $("ul").prepend("<li>" + change.title + "</li>");
      });
    </script>
  </head>
  <body>
    <h1>English Wikipedia Edits !!!</h1>
    <ul></ul>
  </body>
</html>
```

You can copy and paste this into a text file and open it with your browser. Each change object passed to the change callback carries quite a bit of additional information about the edit.
For example, here’s the JSON for an edit to the 2015 military intervention in Yemen article in the English Wikipedia:

```json
{
  "bot": false,
  "comment": "",
  "id": 727311673,
  "length": {
    "new": 64728,
    "old": 64728
  },
  "minor": false,
  "namespace": 0,
  "revision": {
    "new": 655651590,
    "old": 655651542
  },
  "server_name": "en.wikipedia.org",
  "server_script_path": "/w",
  "server_url": "http://en.wikipedia.org",
  "timestamp": 1428568783,
  "title": "2015 military intervention in Yemen",
  "type": "edit",
  "user": "80.184.65.164",
  "wiki": "enwiki"
}
```

From this information it’s possible to construct a URL for the diff, or to talk back to either the MediaWiki API or Wikimedia’s shiny new REST API for more information about the article that changed. When I realized that there were hundreds of thousands of IP addresses to monitor for the US Congress, it occurred to me that it would be much easier to watch the changes as they come in and check whether each IP address fell within one of the ranges, rather than polling hundreds of thousands of Atom feeds. After a couple hours’ work, I had a short program that tweeted edits that came from the US Congress. I put the code on GitHub and thought a handful of my friends would follow it. Little did I know…
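Those two steps, building a diff URL from a change object and testing an IP address against a range, are small enough to sketch. The diff URL follows MediaWiki's index.php conventions; the range check is a simplified, IPv4-only stand-in for what anon actually does (the real project also handles IPv6), and the /16 range in the comments is purely illustrative:

```javascript
// A sketch of the two pieces a bot like anon needs, under the
// assumptions above: MediaWiki-style diff URLs and IPv4 CIDR ranges.

// Build a diff URL from a change object like the JSON shown earlier.
function diffUrl(change) {
  return change.server_url + change.server_script_path +
         '/index.php?diff=' + change.revision.new +
         '&oldid=' + change.revision.old;
}

// Convert a dotted-quad IPv4 address to a 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce(function(n, octet) {
    return n * 256 + parseInt(octet, 10);
  }, 0);
}

// True if ip falls inside a CIDR range like '143.231.0.0/16'.
function ipInRange(ip, cidr) {
  var parts = cidr.split('/');
  var bits = parseInt(parts[1], 10);
  // Build a mask with the top `bits` bits set; >>> 0 keeps it unsigned.
  var mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(parts[0]) & mask) >>> 0);
}
```

With these in hand, the main loop of such a bot is just the wikichanges listener from earlier plus an if statement: when change.user parses as an IP address inside one of the configured ranges, tweet the diff URL.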

anon

We’ve all heard about the promise of open-source software. I’m a believer, even though it has been rare for something I’ve put on GitHub to get more than an occasional pull request or bug fix. The initial code for CongressEdits was 37 lines of CoffeeScript. Once it was up on GitHub I quickly got requests to make it configurable for other Twitter accounts, to customize the text of the tweet, to provide IPv6 support, and (of course) to allow it to listen to other IP address ranges. Since it was such a small program it was easy to accommodate these requests. I renamed the project to anon, since it was now for more than CongressEdits, and then things got interesting. A merry band of sixty or so Twitter bots, administered by almost as many people, sprouted up, such as:

@gccaedits: Government of Canada

@ItaGovEdits: Italian Parliament

@rugovedits: Russian government

@euroedit: European Parliament

@bundesedits: German Bundestag, federal ministries, and other federal agencies

@valleyedits: Google, Facebook, Apple, Twitter, and Wikimedia

@natoedits: North Atlantic Treaty Organization

Jari Bakken, a civic hacker in Norway, quickly put together a historical view of the edits for these bots using Google BigQuery and Wikipedia dumps. The Gitter chatroom for anon proved to be a great way to communicate with other people who were interested in running the bot or contributing to the project.

A few days after I put anon on GitHub, Tom Scott wrote to me saying that @parliamentedits hadn’t tweeted any changes yet, and he suspected that the two proxy servers in his IFTTT recipe were no longer being used by Parliament. He and Jonty Wareing were able to determine that, as with the US Congress, there was a large range of addresses that needed to be monitored. Jonty started up his own anon bot to monitor these Parliament IP ranges, and the original IFTTT recipe was retired. Tom Scott sent a second FOIA request to obtain the IP ranges, but this time it was denied.

I was shocked at how rapidly these bots popped up. I was equally surprised by how many people followed @CongressEdits: in 48 hours it jumped from 0 to 3,000 followers, and then rapidly grew another order of magnitude to 30,000 followers.

The Apparatus

An important thing to note in these stories is that the bots let us know that edits came from a particular place (VGTRK, US Congress, NYPD), but without further traditional investigative journalism we don’t really know who made the edits, or what their motivations were. Once @CongressEdits acquired 30,000 followers and individuals inside Congress became aware of its existence, some of the edits seem to have been made knowing that they would be broadcast: the observer effect kicked in. While it’s difficult (perhaps impossible) to spoof an IP address associated with a Wikipedia edit, someone could go to the effort if the political stakes were high enough. Unsurprisingly, this technology is not a transparency panacea. The same political landscape is replicated in, and implicated by, these bots. They can be manipulated by actors for a variety of reasons once it’s clear how they operate.

I hope this article has helped to lay bare the apparatus behind CongressEdits and other anon-style bots. Now that you know how simple it is to access the stream of edits on Wikipedia, I hope you have ideas for similar bots that could perform services of social or perhaps artistic value. Soon after I created CongressEdits, my friend Dan Whaley suggested it would be interesting to observe all edits to articles related to the US Congress, and so @congresseditors came into existence. The volume knob on @congresseditors is set quite high, since there are so many edits (especially after an election), but it can be an interesting stream to dip into. Or consider Hatnote’s Listen to Wikipedia project, which taps into the Wikipedia edit stream to do just that: listen to Wikipedia. The mundane details of the edit stream can be reimagined, repurposed, and transformed. I hope this article has sparked some ideas of your own. If you do put together a bot, I encourage you to put the code up on GitHub for others to see. You never know what might happen.