Robot Journalist: The Monkey in the Machine

According to the Infinite Monkey Theorem, a monkey hitting keys at random on a keyboard for an infinite amount of time will almost surely produce any text imaginable, from the complete works of William Shakespeare to all the content in the U.S. Library of Congress.

These days the proverbial monkey comes in the shape of so-called narrative technology, which uses algorithms to turn raw data into actual narrative text. And as dark and disturbing as narrative technology may sound, especially to those of us who make a good part of our living behind a keyboard, it is very much in use. And newsrooms are in the thick of it.

But can the writing be any good? And is there a danger that human reporters will one day be replaced by robo-newshounds?

Narrative Science, a company that uses complex artificial intelligence algorithms to extract and organize facts and insights from large amounts of data and transform them into stories, could have the answers. Narrative Science claims the content produced by its technology is as good as or better than that of your best analyst.

Narrative Science was born out of a Northwestern University project called StatsMonkey – software developed by computer science and journalism students to generate baseball recaps. (According to its website, Narrative Science will produce between 1.5 and 2 million Little League recaps before the year is out, something no other publication has the bench strength to deliver on its own.)
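To make the idea of a machine-written recap concrete, here is a minimal sketch of template-based text generation in Python. It is not how StatsMonkey or Narrative Science actually works; their methods are not described here. The GameResult fields, the team and player names, and the verb thresholds are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical box-score summary; every field name is an assumption."""
    home_team: str
    away_team: str
    home_runs: int
    away_runs: int
    top_player: str
    top_player_hits: int


def describe_margin(margin: int) -> str:
    """Pick a verb based on how lopsided the final score was."""
    if margin >= 5:
        return "routed"
    if margin >= 2:
        return "beat"
    return "edged"


def write_recap(game: GameResult) -> str:
    """Turn one structured game result into a one-sentence recap."""
    if game.home_runs == game.away_runs:
        return (f"{game.home_team} and {game.away_team} "
                f"tied {game.home_runs}-{game.away_runs}.")
    if game.home_runs > game.away_runs:
        winner, loser = game.home_team, game.away_team
        high, low = game.home_runs, game.away_runs
    else:
        winner, loser = game.away_team, game.home_team
        high, low = game.away_runs, game.home_runs
    verb = describe_margin(high - low)
    return (f"{winner} {verb} {loser} {high}-{low}, "
            f"led by {game.top_player} with {game.top_player_hits} hits.")


# Invented example data, not a real game.
game = GameResult("Maple Street Owls", "Riverside Rockets", 7, 1, "Sam Lee", 3)
print(write_recap(game))
```

Even this toy version hints at why the approach scales: once the template exists, writing the two-millionth recap costs no more than writing the first.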

So it should come as no surprise that the company is now in the business of generating news content under special arrangements with Forbes.com and other news outlets. Stories are created in multiple formats, including long-form stories, headlines, tweets and industry reports with graphical visualizations. Multiple versions of the same story are created to customize content to the specific needs of each media outlet’s audience.

Much more on Narrative Science can be found in this excellent story by Wired Magazine.

The Washington Post, the stalwart old beacon of gumshoe investigative reporting, last year launched Truth Teller, a news application that tries to detect false claims made by politicians in speeches, television ads and interviews in close to real time. The application is powered by an algorithm that matches claims with a database of facts. Call it fact-checking 2.0, but it’s an interesting technology rooted in the journalistic pursuit of keeping politicians honest.
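For a rough feel of what "matching claims with a database of facts" could mean, here is a toy sketch in Python. It is an assumption-laden illustration, not Truth Teller's actual algorithm: the FACTS table, its figures, the check_claim function and the 5 percent tolerance are all invented, and real claims are far messier than a topic phrase followed by a number.

```python
import re

# Invented "fact database": topic phrase -> reference figure.
# These are placeholder values, not real statistics.
FACTS = {
    "population of springfield": 30000.0,
    "library budget": 2.0,  # in millions, by assumption
}


def check_claim(claim: str, tolerance: float = 0.05):
    """Return (topic, verdict) for the first fact whose topic appears in the claim."""
    text = claim.lower().replace(",", "")
    for topic, value in FACTS.items():
        if topic not in text:
            continue
        # Take the first number mentioned in the claim.
        match = re.search(r"\d+(?:\.\d+)?", text)
        if match is None:
            return topic, "no number found"
        claimed = float(match.group())
        # Accept the claim if it is within the tolerance of the reference figure.
        is_close = abs(claimed - value) <= tolerance * abs(value)
        verdict = "consistent" if is_close else "inconsistent"
        return topic, verdict
    return None, "no matching fact"


print(check_claim("The population of Springfield has reached 50,000."))
print(check_claim("The library budget is 2.05 million dollars."))
```

The hard part in practice is everything this sketch skips: transcribing speech, spotting which sentences are checkable claims, and deciding which fact a loosely worded claim refers to.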

Another newspaper that has been busy seeing how far it can take the technology is the Los Angeles Times. Ben Welsh, the Times’ database producer, recently told Journalism.co.uk that algorithms can be used to “ask and answer the common questions that a reporter would ask when looking at that same dataset.”

According to Welsh, the algorithms automatically write posts that look like they were written by a human. They also instantly create elements and sidebars like maps, blog posts, and headlines, and they automatically post the content into the paper’s blogging platform.

None of this may come as news to anyone already in the news business. News organizations like Thomson Reuters and Bloomberg have for years been crunching the numbers that were once the bread and butter of business reporters and analysts. And as we all know, online news sites also rely heavily on aggregated content, SEO algorithms and page ranking tools that determine the placement of stories.

The fear that technology takes the human element out of storytelling is real enough, to say nothing of the fact that the shrinking newsroom is a tell-tale sign that journalists have lots to worry about. But a good many media industry experts and academics agree that you can never replace humans with machines when it comes to tried-and-true journalism. After all, there are cues like nuance, tone of voice and body language that only humans can detect and relate to. Stringing words together is one thing; crafting a compelling, emotionally resonant story is something else altogether.

Instead, what they envision is a world where journalists collaborate seamlessly with technology to translate and explain large sets of data that may otherwise be buried in spreadsheets, charts and – the foe of many scribes – numbers.

But content generation and fact-checking are one thing. The way we consume news is also changing. Much was made of Nick D’Aloisio, the 17-year-old Brit who recently sold his news reader app, Summly, to Yahoo! for a rumoured $30 million. The core technology behind Summly automatically summarizes news articles, distilling news content into bite-sized summaries. Just this week, Google announced it too was jumping into the news summarization business with the purchase of Seattle-based startup Wavii.
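Automatic summarization can sound like magic, so here is a minimal sketch of one classic approach, frequency-based extractive summarization, in Python. This is a swapped-in textbook technique, not a description of how Summly or Wavii work, which is not covered here. The stopword list, the summarize function and the sample text are assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "for", "on", "was", "with", "as", "by"}


def content_words(text: str) -> list:
    """Lowercase the text and keep only words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average word frequency, so long sentences get no automatic edge.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order of the chosen sentences.
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


# Made-up sample article.
article = ("The council approved the budget. The budget funds new buses. "
           "Residents enjoyed sunny weather.")
print(summarize(article, max_sentences=1))
```

Note that this approach only selects existing sentences; it never writes new ones, which is why the results can feel choppy on longer articles.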

As more of the world is transformed into data, analytical tools that can capture, search, share, summarize and visualize information – and then glean meaning from all that data – are becoming indispensable.

Finally, check out Times Haiku, which pulls “serendipitous poetry” from The New York Times. It’s one of the most charming forms of newsroom computer-assisted content generation we found, and it proves that content summarization can also be fun – no monkeys required.