Building a “quality engine” for journalism

Girish Gupta
Monday Note | Sept. 13, 2020


If you were to put a group of journalists together to discuss what makes for quality in their field, they’d argue for hours. Yes, they’d agree on the planks: original and on-the-ground reporting, investigations based on documents rather than anonymous sources, and smart analysis based on facts rather than opinion. But they’d disagree on the importance of good writing, necessary attributions and countless other nuances. As a reporter working all over the world with everyone from Reuters and the New Yorker to the Daily Mail and Vice, I saw vast gaps in how editors and journalists saw quality. My own views on it were shaped by a non-journalistic background in math, physics and programming.

Unending discussions of the media industry’s revenue declines often fail to address that much of the industry, and only in part because of that declining revenue, is churning out garbage: shallow and sloppy articles designed to be nothing more than a vehicle for advertising revenue. Readers often end up with unsolicited opinion and “banalysis” rather than analysis, and certainly not first drafts of history. But it’s not all bad. News outlets big and small are producing some excellent work despite the industry’s deep issues. The trouble is how to find it, and encourage people towards it, at scale.

At Deepnews, we are building an algorithm to differentiate between high- and low-quality journalism based on nothing but the text of an article. That means that the metrics of quality journalism not only need to be agreed upon, they also need to be put into the precise language of code. It’s a tough task and our model will never be bulletproof (how could an algorithm tell if a journalist made up a quote, for example?), but what has astonished me in recent months is that it’s working at all.

Computing is undoubtedly capable of amazing feats, even when it comes to the complexity of human language. GPT-3, a machine-learning language generator, was recently used to write a Guardian opinion piece. The piece did require some manual work, according to Guardian editors, and the end product used nice language but, ultimately, made no real sense: precisely the sort of journalism we at Deepnews want to root out. Writing is only one facet of machine learning’s abilities. Another is predicting what users want; it is already being used to customize your Google search results, your YouTube and Netflix video choices and, of course, your Facebook and Twitter feeds. It works remarkably well but, of course, quality is not those companies’ top objective.

Currently, Deepnews’ primary product is a set of semi-automated newsletters on various topics, chosen by the algorithm and then by a human editor, the equivalent of the Guardian editor chopping up GPT-3’s output to publish a somewhat passable opinion piece. But, ultimately, this fails to highlight our algorithm’s good work. It is cloaked by an editorial process, so it just looks like any other newsletter that collates online posts. For that reason, we’re now building an interface through which users can see the scores of tens of thousands of articles every day, in real time and without editorial intervention.

But the frontend is the easy bit. What about the algorithm that chooses the stories users see? How, exactly, would this news-scoring algorithm work? What parameters would go into it? How would you train the algorithm?

There are essentially two ways to get a computer to score something like a news article. We could program it to search through articles for phrases like “according to documents” or “according to anonymous sources” and nudge an arbitrary score up or down accordingly. We could also nudge that score based on a count of the adjectives, quotes, characters, companies or countries mentioned. We could count the number of “experts” quoted, even programmatically look up their expertise and, again, nudge the score slightly. It would be a deterministic and artless form of review, but it could yield somewhat useful first-order results.
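As a rough illustration of that rule-based approach, here is a minimal sketch in Python. The phrases, weights and thresholds are invented for the example; they are not Deepnews’ actual rules.

```python
import re

# Hypothetical phrase weights -- invented for this example, not Deepnews' actual rules.
PHRASE_WEIGHTS = {
    "according to documents": 2.0,
    "according to anonymous sources": -1.0,
}

# Rough proxy for a substantial direct quote (20+ characters between double quotes).
QUOTE_PATTERN = re.compile(r'"[^"]{20,}"')

def rule_based_score(text: str) -> float:
    """Nudge an arbitrary baseline score up or down based on surface features."""
    score = 0.0
    lowered = text.lower()
    for phrase, weight in PHRASE_WEIGHTS.items():
        score += weight * lowered.count(phrase)
    # Reward quoted material, lightly penalise very short pieces.
    score += 0.5 * len(QUOTE_PATTERN.findall(text))
    if len(text.split()) < 300:
        score -= 1.0
    return score
```

Every rule here is hand-written and every weight arbitrary, which is exactly why this style of review stays artless: the machine only ever checks for what we thought to ask.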

But our task, given the complexity of language, reporting and, of course, the many different types of journalism, is really suited to machine learning. Machine learning looks at the problem the other way around. If fed scores for a set of articles, a machine learning algorithm works backwards to determine the parameters that led to those scores, and can then score new articles using what it learned. The algorithm would learn precisely what made for good quality journalism, given, of course, the opinions of those labeling the original articles.
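That supervised framing is simple to express in code. The sketch below uses an off-the-shelf scikit-learn pipeline (TF-IDF features feeding a ridge regression) purely to illustrate the idea of learning a scoring function from labeled articles; it is not the Deepnews model, and the feature settings are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

def train_quality_model(articles, scores):
    """articles: list of raw article texts; scores: human-assigned quality labels."""
    model = make_pipeline(
        TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
        Ridge(alpha=1.0),
    )
    # The model works backwards from the labeled scores to weights on text features.
    model.fit(articles, scores)
    return model

# Scoring a new, unseen article is then a single call:
# model.predict(["Full text of an unseen article..."])
```

The point is the shape of the problem: humans supply the judgments, and the algorithm infers which properties of the text predict them.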

In an ideal world, we would read, analyse and score tens of thousands of news articles and feed them into the model. But this would take many years. At the other end of the scale, we could simply label all Pulitzer Prize-winning articles as good and all Breitbart articles as bad, but this would heavily bias our algorithm along political, and other, dimensions, as well as perpetuate existing, imperfect ideas about what good journalism is. (Those inside the industry are well aware that Pulitzers are often more about politics than prowess.) We want readers to find good work by news outlets and journalists who don’t focus on marketing: global news agencies, small local outlets or an obscure professor who writes a blog post.

Initially, Deepnews went for a middle ground. Articles were broadly clustered by publisher (which goes against the idealistic arguments above) and then journalism students were trained to analyse articles using metrics the team broadly agreed upon. They were asked to prioritise original, in-depth and well-reported articles. That proved to be a good enough start and, in the end, we produced a training set of tens of thousands of articles.

Once we had some initial training data, we fed it into a mathematical and programmatic abstraction known as a neural network, loosely modeled on the human brain: data passes through millions of neurons, each tuning itself to an element of the text and working out how relevant that element is to the final score. Our initial model came up with fairly solid results, good on some genres and bad on others, but certainly better than random.
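For readers who want to picture what such a network looks like in code, here is a minimal sketch using Keras: it embeds each word of a tokenized, padded article, pools those embeddings and regresses to a single quality score. The vocabulary size, dimensions and layers are placeholders, not the architecture we actually use.

```python
import tensorflow as tf

def build_model(vocab_size: int = 50_000, embed_dim: int = 128) -> tf.keras.Model:
    """Embed tokens, pool them, and regress to a single quality score."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim),   # each word becomes a vector
        tf.keras.layers.GlobalAveragePooling1D(),           # average over the article
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),                            # the article's quality score
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# The model expects articles already tokenized and padded into integer sequences.
```

Training is then a matter of feeding in the labeled articles and letting the network adjust its weights until its scores track the human ones.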

Fine-tuning both the training data and the architecture of the neural network is where the magic happens. We still need to work out a better training set and then, once we have that, think about the type of network: a straightforward feedforward one, a convolutional one, a recurrent one, or a combination of these and other methods? That is what we are toying with as we evolve the model, and what I’ll be writing about in forthcoming posts.
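To make that trade-off concrete, the two blocks below sketch what the convolutional and recurrent options look like: the first is good at spotting local phrase patterns, the second at tracking longer-range structure across a piece. Either could be slotted between the embedding layer and the dense scoring head of the earlier sketch; the layer sizes are again illustrative, not our settled choices.

```python
import tensorflow as tf

# A convolutional block: slides a window over the text to pick up local phrase patterns.
conv_block = tf.keras.Sequential([
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
])

# A recurrent block: reads the sequence in order (and in reverse) to capture
# longer-range structure across an article.
recurrent_block = tf.keras.Sequential([
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
])
```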

Ultimately, all the news you read is chosen by an algorithm, be it a fuzzy one inside an editor’s brain if you pick up a certain newspaper, or a more mathematical one whose objectives may or may not align with yours. Using our new beta interface to compare our raw and unedited output to that of social media outlets and even news aggregators shows a startling difference: journalism surfaced by Deepnews is journalism I actually want to read!

We’ll be slowly opening up the interface to users in the coming weeks so you can decide for yourself.