Guess what? Automated news doesn't quite work.



by Gabe Rivera, Wednesday, December 3, 2008 3:38PM ET

Any competent developer who tries to automate the selection of news headlines will inevitably discover that this approach always comes up a bit short. Automation does indeed bring a lot to the table -- humans can't possibly discover and organize news as fast as computers can. But too often the lack of real intelligence leads to really unintelligent results. Only an algorithm would feature news about Anna Nicole Smith's hospitalization after she's already been declared dead, as our automated celeb news site WeSmirch did last year.

Instantly obsolete news isn't the only hazard. A fundamental component of any news-organizing program is the determination of whether two stories are related. Deciding is often rather easy: if two stories hyperlink to each other or both use the words Apple, Psystar, and DMCA repeatedly, they're probably related. Unfortunately, the clues are sometimes far too subtle for the most advanced algorithms to notice. This leads to bad "related" grouping, and even the failure to surface breaking news in the first place. Even giant, technically-accomplished corporations have had trouble breaking news using algorithms.
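
To make the "easy" case concrete, here is a minimal Python sketch of that kind of relatedness check. It is not Techmeme's actual code: the Story class, the repeated_terms and probably_related functions, the key_terms set, and the thresholds are all hypothetical names and values invented for illustration. It also assumes a link in either direction counts as the two stories linking each other.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical story record: its URL, body text, and outbound links."""
    url: str
    text: str
    links: set = field(default_factory=set)


def repeated_terms(text, min_count=2):
    """Return the lowercased words that appear at least min_count times."""
    counts = {}
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        counts[key] = counts.get(key, 0) + 1
    return {word for word, n in counts.items() if n >= min_count}


def probably_related(a, b, key_terms, min_shared=3):
    """Guess relatedness from the two easy clues: links and repeated words.

    key_terms is an assumed, externally supplied set of distinctive
    lowercased terms (e.g. {"apple", "psystar", "dmca"}), so that common
    words like "the" do not count as evidence.
    """
    # Clue 1: either story hyperlinks to the other.
    if a.url in b.links or b.url in a.links:
        return True
    # Clue 2: both stories repeat enough of the same distinctive terms.
    shared = repeated_terms(a.text) & repeated_terms(b.text) & key_terms
    return len(shared) >= min_shared
```

A check like this only captures the obvious clues. Two stories about the same event that share no links and use different vocabulary would slip through, which is exactly the kind of miss described above.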

It's time for a more "edited" Techmeme

In 2005 it was clear to me that an ideal news aggregation site would need to combine automation with direct, hands-on editing. In a rare departure from my usual reticence, I even stated in comments to a blog post "I'm planning extensions to my system to enable a hybrid man+memeorandum." This "planning" turned out to be rather long term, since we made no major headway on this idea until 2008.

Early on, when our system was less technically refined, the clearest path toward improvement involved simply iterating algorithmic development. Later, as the automation reached a certain degree of maturity, we recognized that direct editing could now improve news results by leaps and bounds. Though our roadmap contains a number of novel future algorithmic enhancements, introducing editing now appears to be a no-brainer.

So what exactly will change?

Humans have always edited Techmeme of course, just implicitly. For instance, when a blogger links to a story, the headline might move higher on Techmeme. What's different now is that an additional human editor will carry out changes explicitly to directly improve the mix of headlines on Techmeme. Though the implicit edits conveyed via algorithm outnumber the explicit edits perhaps by 1000 to 1 or more, the impact of the human editor is nonetheless pronounced. What will that effect be?

The news will just get faster and more interesting. Obsolete stories will be eliminated sooner while breaking stories will be expedited. Related grouping will improve. Most of this will happen only on Techmeme, though other sites (like memeorandum and WeSmirch) will increasingly benefit from the direct human touch as well.

Meet Techmeme's new scapegoat

Last month we hired Megan McCarthy to help with a variety of editorial tasks. Chief among them was taking up this new editing role. We haven't settled on a job title yet, but perhaps "news maestro" is a fitting moniker, given her new role in conducting the symphony of voices that flow through Techmeme each day. Her name may sound familiar to you: Megan has worked at institutions ranging from Wired.com to The Rose and Crown. She mentioned some other place too which I can't recall at the moment. Appropriately, Megan is quite familiar with the workings of tech news on the web.

Writers and publicists unhappy with the headlines on Techmeme are encouraged to transfer the bulk of their resentment to Megan. I'm pleased to report she's looking forward to this. Though Omer Horvitz and I will share some of these editorial tasks, Megan will focus on this much more than us.

Doesn't this make Techmeme even more unfair and biased?

If that question makes any sense to you, you're probably a frustrated blogger. Otherwise, feel free to skip to the next section! I'd like to note here that Techmeme isn't fair because life isn't fair, and Techmeme will always be biased because humans have built Techmeme. And because news judgement, by definition, is bias. For background, please see this post from last year in which I state "Techmeme is biased".

Ultimately, Techmeme will succeed based on whether it interests a significant readership. While fairness and balance probably affect this interest, I need to stress that bloggers will never agree on what's fair. Why not? To generalize and perhaps exaggerate somewhat, many bloggers feel that in the fairest scenario, Techmeme prominently features all of their posts. So it's hard to be fair.



Image by tartx

There's something happening here

I should note that the experience of introducing direct editing has been a revelation even for us, despite the fact that we planned it. Interacting directly with an automated news engine makes it clear that the human+algorithm combo can curate news far more effectively than the individual human or algorithmic parts. It really feels like the age of the news cyborg has arrived. Our goal is to apply this new capability to producing the clearest and most useful tech news overview available.

New contact info

We always want to know how we can do a better job, and are now better staffed for listening. Please send complaints or news suggestions to this new email address: editorial at site domain. Though we'll realistically reply to almost nothing sent there, we'll read it all, and appreciate your thoughts!
