Big Data Learns To Write

Automated writing platforms cull streaming data from multiple sources and churn out thousands of articles per second.




Computers can write, and surprisingly well, too. Software from startups like Automated Insights and Narrative Science generates written reports in plain English for targeted markets, including fantasy football, real estate, personal fitness, journalism, and essentially any storytelling niche where algorithms can quickly transform real-time data into readable text.

"When we're producing narratives around a particular data set, we'll produce thousands per second," said Adam Smith, Automated Insights' VP of sales and marketing, in a phone interview with InformationWeek. "We published over 300 million stories for our clients last year, and we'll publish well over a billion this year. We're tailoring a story in a personalized way to an individual user, or about an individual topic."

Automated Insights' clients include Yahoo Fantasy Sports, which uses the company's Wordsmith platform to generate personalized reports for its users, a process too time- and labor-intensive for human writers.

With the Yahoo Fantasy Football recaps, for instance, "we're doing probably 1,500 to 2,000 [stories] per second, and millions over a one- or two-hour period," Smith told us.


Insights from big data streams are often presented in dashboard form, complete with charts, graphs, and other visually oriented infographics -- an approach that requires end-users to "interpret" the data, says Smith.

But with automated written reports, "all you have to do is read. It's like sitting down with a data scientist and having them walk you through the key aspects."

Of course, this algorithmic approach to writing works best with data-driven topics.

"We do a lot of work with big data, BI, and analytics. And part of that is, how can we mine data in real time, make it actionable, spot the insights, pull together the insights that are most important, and tell a story about it?"

Software-generated copy is well suited to formulaic topics, too, such as summarizing a baseball game or other sporting event.

"A software platform like ours can look back to the 1800s and analyze every single performance that's ever happened," says Smith. And while a few human sportswriters may possess a near-encyclopedic knowledge of historical baseball stats and scores, none can match the automated system's prodigious output.

How does the robot writer "watch" a game and ingest the data it needs to construct a story? Joe Procopio, Automated Insights' VP of product engineering, explained in a recent blog post:

At the professional sports level (think MLB, NFL, NBA), data is collected not just at the game and player level, but at the play and performance level. We now know how fast each pitch is thrown and where, how many times and in which direction a quarterback goes long, and even whether or not a game-deciding call was blown, thanks to replay.

In all pro sports and even most college and some recreational, there are now all kinds of sensors and cameras tracking the game, sometimes at the individual level, all of which can support qualitative analysis on quantitative facts. For example, when we tell you a hitter is off his swing, we're not playing a hunch, we can see it.
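To make the idea concrete, here is a minimal Python sketch of how structured game data could be turned into a sentence or two of recap. It is an illustration only, not Automated Insights' Wordsmith platform or its method: the GameResult record, its field names, the score-margin thresholds, and the team and player names are all hypothetical, and a production system would draw on far richer play-level data.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical box-score record; the field names are illustrative."""
    home_team: str
    away_team: str
    home_runs: int
    away_runs: int
    top_hitter: str
    top_hitter_hits: int
    top_hitter_at_bats: int


def describe_margin(margin: int) -> str:
    """Pick a verb based on how lopsided the final score was (assumed thresholds)."""
    if margin >= 5:
        return "routed"
    if margin >= 2:
        return "beat"
    return "edged"


def write_recap(game: GameResult) -> str:
    """Turn one structured game record into a short plain-English recap."""
    if game.home_runs == game.away_runs:
        raise ValueError("this sketch assumes the game has a winner")
    if game.home_runs > game.away_runs:
        winner, loser = game.home_team, game.away_team
        win_score, lose_score = game.home_runs, game.away_runs
    else:
        winner, loser = game.away_team, game.home_team
        win_score, lose_score = game.away_runs, game.home_runs
    verb = describe_margin(win_score - lose_score)
    return (
        f"The {winner} {verb} the {loser} {win_score}-{lose_score}. "
        f"{game.top_hitter} went {game.top_hitter_hits}-for-"
        f"{game.top_hitter_at_bats} at the plate."
    )


if __name__ == "__main__":
    # Made-up teams and numbers, purely for demonstration.
    game = GameResult("Hawks", "Owls", 7, 2, "J. Rivera", 3, 4)
    print(write_recap(game))
```

The word choice comes from the numbers: a five-run margin yields "routed," while a one-run game yields "edged." Running the same function over a feed of game records would produce one recap per game, which hints at how a template-driven approach can scale to very large volumes.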

Robot writers are penning more than sports recaps, too. Quakebot, an algorithm created by Los Angeles Times journalist and programmer Ken Schwencke, garnered plenty of attention in March when it developed, wrote, and published a story about a Southern California earthquake in less than three minutes.

No ink-stained wretch could do that.

So should flesh-and-blood journalists be worried? Will the algorithm put them out of work?

On the contrary, Automated Insights claims. By doing data-churning grunt work, robot writers free human journalists to interview people, provide deeper insights, and essentially tell stories that algorithms can't.

Well, not yet, anyway.


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.
