Update: See this post (with screencasts!)

Mike Caulfield’s Digital Polarization Initiative (DigiPo) is a template for a course that will lead students through exercises to analyze and fact-check news stories. The pedagogical approach Mike describes here is evolving; in parallel I’ve been evolving a toolkit to help students research and organize the raw materials of the analyses they’ll be asked to produce. Annotation is a key component of the toolkit. I’ve been working to integrate it into the fact-checking workflow in ways that complement the use of other tools.

We’re not done yet but I’m pleased with the results so far. This post is an interim report to summarize what we’ve learned so far about building an annotation-powered toolkit for fact checkers.

Here’s an example of a DigiPo claim to be investigated:

EPA Plans to Allow Unlimited Dumping of Fracking Wastewater in the Gulf of Mexico (see Occupy)

I start with no a priori knowledge of EPA rules governing release of fracking wastewater, and only a passing acquaintance with the cited source, occupy.com. So the first order of business is to marshal some evidence. Hypothesis is ideal for this purpose. It creates links that encapsulate both the URL of a page containing found evidence, and the evidence itself — that is, a quote selected in the page.

There’s a dedicated page for each DigiPo investigation. It’s a wiki, so you can manually include Hypothesis links as you create them. But fact-checking is tedious work, and students will benefit from any automation that helps them focus on the analysis.

The first step was to include Hypothesis as a widget that displays annotations matching the wiki id of the page. Here’s a standalone Hypothesis view that gathers all the evidence I’ve tagged with digipo:analysis:gulf_of_frackwater. From there it was an easy next step to tweak the wiki template so it embeds that view directly in the page:

That’s really helpful, but it still requires students to acquire and use the correct tag in order to populate the widget. We can do better than that, and I’ll show how later, but here’s the next thing that happened: the timeline.

While working through a different fact-checking exercise, I found myself arranging a subset of the tagged annotations in chronological order. Again that’s a thing you can do manually; again it’s tedious; again we can automate with a bit of tag discipline and some tooling.

If you do much online research, you’ll know that it’s often hard to find the publication date of a web page. It might or might not be encoded in the URL. It might or might not appear somewhere in the text of the page. If it does there’s no predictable location or format. You can, however, ask Google to report the date on which it first indexed a page, and that turns out to be a pretty good proxy for the publication date.

So I made another bookmarklet to encapsulate that query. If you were to activate it on one of my posts it would lead you to this page:

I wrote the post on Oct 30, Google indexed it on Oct 31, that’s close enough for our purposes.

I made another bookmarklet to capture that date and add it, as a Hypothesis annotation, to the target page.

With these tools in hand, we can expand the widget to include:

Timeline . Annotations on the target page with a googledate tag, in chronological order.

Related Annotations. Annotations on the target page with a tag matching the id of the wiki page.

You can see a Related Annotations view above, here’s a Timeline:

So far, so good, but as Mike rightly pointed out, this motley assortment of bookmarklets spelled trouble. We wouldn’t want students to have to install them, and in any case bookmarklets are increasingly unlikely to work. So I transplanted them into a Chrome extension. It presents the growing set of tools in our fact-checking toolkit as right-click options on Chrome’s context menu:

It also affords a nice way to stash your Hypothesis credentials, so the tools can save annotations on your behalf:

(The DigiPo extension is Chrome-only for now, as is the Hypothesis extension, but WebExtensions should soon enable broader coverage.)

With the bookmarklets now wrapped in an extension we returned to the problem of simplifying the use of tags corresponding to wiki investigation pages. Hypothesis tags are freeform. Ideally you’d be able to configure the tag editor to present controlled lists of tags in various contexts, but that isn’t yet a feature of Hypothesis.

We can, though, use the Digipo extension to add a controlled-tagging feature to the fact-checking toolkit. The Tag this Page tool does that:

You activate the tool from a page that has evidence related to a DigiPo investigation. It reads the DigiPo page that lists investigations, captures the wiki ids of those pages. and presents them in a picklist. When you choose the investigation to which the current page applies, the current page is annotated with the investigation’s wiki id and will then show up in the Related Annotations bucket on the investigation page.

While I was doing all this I committed an ironic faux pas on Facebook and shared this article. Crazy, right? I’m literally in the middle of building tools to help people evaluate stuff like this, and yet I share without checking. Why did I not take the few seconds required to vet the source, bipartisanreport.com?

When I made myself do that I realized that what should have taken a few seconds took longer. There’s a particular Google advanced query syntax you need in this situation. You are looking for the character string “bipartisanreport.com” but you want to exclude the majority of self-referential pages. You only want to know what other sites say about this one. The query goes like this:

bipartisanreport.com -site:bipartisanreport.com

Just knowing the recipe isn’t enough. Using it needs to be second nature and, even for me, it clearly wasn’t. So now there’s Google this Site:

Which produces this:

It’s ridiculously simple and powerful. I can see at a glance that bipartisanreport.com shows up on a couple of lists of questionable sites. What does the web think about the sites that host those lists? I can repeat Google this Site to zoom in on them.

Another tool in the kit, Save Facebook Share Count, supports the sort of analysis that Mike did in a post entitled Despite Zuckerberg’s Protests, Fake News Does Better on Facebook Than Real News. Here’s Data to Prove It.

How, for example, has this questionable claim propagated on Facebook? There’s a breadcrumb trail in the annotation layer. On Dec 26 I used Save Publication Date to assign the tag googledate:2016-08-31, and on the same day I used Save Facebook Share Count to record the number of shares reported by the Facebook API. On Dec 30 I again used Save Facebook Share Count. Now we can see that the article is past its sell-by date on Facebook and never was highly influential.

Finally there’s Summarize Quotes, which arose from an experiment of Mike’s to fact-check a single article exhaustively. Here’s the article he picked, along with the annotation layer he created:

Some of the annotations contain Hypothesis direct links to related annotations. If you open this annotation in the Politico article, for example, you can follow Hypothesis links to related annotations on pages at USA Today and Science.

These transitive annotations are potent but it gets to be a lot of clicking around. So the most experimental of the tools in the kit, Summarize Quotes, produces a page like this:

This approach doesn’t feel quite right yet, but I suspect there’s something there. Using these tools you can gather a lot of evidence pretty quickly and easily. It then needs to be summarized effectively so students can reason about the evidence and produce quality analysis. The toolkit embodies a few ways to do that summarization, I’m sure more will emerge.