Data, Journalism and the Problem of Narrativity

The following is a written version of a presentation I gave at SxSW on a panel entitled “Maps of Time: Data as Narrative” and at the POLIS Journalism Conference: Reporting The World (#polis12) on a panel entitled “Disruptive Data: How can we use data journalism to investigate more deeply and reveal information that the authorities want kept secret?”. Apologies for the lateness of this post and the lack of posting in general; I contracted what appeared to be a potent mixture of Ebola, SARS and H1N1 whilst at SxSW. I would like to thank the panel organisers and participants at SxSW (Alex Graul, Jenn Thom, Burt Herman and Drew Harry) and POLIS (Charlie Beckett, Iain Overton and Kristinn Hrafnsson).

We are driven to focus on what makes sense to us, not what is true

Let me start this post with a story. A story about Steve:

Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and timid soul, he has a need for order and structure, and a passion for detail.

From what you know about Steve, is he more likely to be a librarian or a farmer? When you retrieve information, you work along the narrative line, and most of you will conclude that Steve would be happiest as a librarian. It makes sense from what we know of Steve, and we’ve built that sense from our understanding of how people are. Now ask a computer the same question. It would look up the data and retrieve the statistical facts, namely that in the United States there are more than 20 male farmers for every male librarian.

The fact is we don’t know which is more correct, the mind or the machine. One answer is more sensible, the other more likely. Either way, we don’t have enough information. The machine’s calculation would change with time, place and economic context; the narrative sense could change if we knew the when, where and why.
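The machine’s side of this can be made concrete with Bayes’ rule in odds form: posterior odds = prior odds × likelihood ratio. The sketch below is a minimal illustration, assuming the 20-to-1 base rate quoted above and a purely hypothetical likelihood ratio of 4, i.e. that Steve’s description is four times more likely to fit a librarian than a farmer. That ratio is invented for the example, not measured.

```python
def posterior_librarian(prior_farmers_per_librarian: float, likelihood_ratio: float) -> float:
    """Probability that Steve is a librarian, via Bayes' rule in odds form.

    prior_farmers_per_librarian: the base rate, e.g. 20 farmers per librarian.
    likelihood_ratio: how much more likely the description is for a librarian
        than for a farmer (a hypothetical value, not a measured one).
    """
    prior_odds = 1 / prior_farmers_per_librarian  # librarian : farmer
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds to a probability


# Hypothetical: the description fits a librarian 4x better than a farmer.
p = posterior_librarian(prior_farmers_per_librarian=20, likelihood_ratio=4)
print(f"P(librarian | description) = {p:.3f}")  # prints 0.167
```

Under these assumptions the posterior odds are 1 to 5 against librarian, so even a description that strongly “sounds like” a librarian leaves farmer the more likely answer. That is the gap between the sensible and the likely.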

Because we now live in an information-rich digital age, we no longer start by asking questions of our data. First, we need to decide on the narrative we want to tell and let that guide our path of enquiry. In this example, are we looking for a bird’s eye view of the state of the jobs market or a worm’s eye view of Steve’s journey through tough economic times? In that sense, do we need a journalist or a team of programmers working on a smarter algorithm? What is most powerful in the Age of Big Data?

We store narratives; we don’t crunch numbers. Computers cannot store narratives; they need raw data. It is impossible for our brains to take in anything in raw form. We understand narrativity, not probability. Computers understand probability. Humans always start with narratives, computers with data. So in the Age of Digital Journalism there will always be an antagonism between narrative context and informational context.

The Problem of Narrativity

To understand the problem that arises from storing, retrieving and narrativising information on a digital platform, let us take a simple example. Please read the following:

Big romances a valley near a sheep. Big hopes the downright across the broad deed. The published religion views big. The contained container produces data. The transformed worker nests. An assistance coughs? The subsidiary shot suggests the gentle deed. Each all stamp orbits. Knowledge waits above narrative. Every conductor laughs into an ago gene.

 

This is a computer-generated paragraph based on the input words ‘big’, ‘data’, ‘narrative’ and ‘knowledge’. As you can see, it makes little sense. In trying to make sense of it, what happened inside your head? Now read this:

The more orderly, less random, patterned, and narratized a series of words, the easier it is to store that series in one’s mind. The more orderly, less random, patterned, and narratized a series of words, the easier it is to store that series in one’s mind. The more orderly, less random, patterned, and narratized a series of words, the easier it is to store that series in one’s mind.

Information is costly to manipulate and retrieve. By finding the pattern, the logic of the series, you no longer need to memorize it all; you just store the pattern. And, as we can see here, a pattern is obviously more compact than raw information. We have a hunger for rules because we need to reduce the dimension of matters so they can get into our heads. A novel, a story, a myth or a tale: all have the same function. They spare us from the complexity of the world and help build an idea in our minds. That is what true narratives do. They don’t just paint pictures; they build structures in our minds upon which logic is built.
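The claim that a pattern is more compact than raw information can be checked with a general-purpose compressor, which stores a repeat as a short reference back to text it has already seen. A minimal sketch using Python’s standard `zlib` module, assuming compressed size as a rough stand-in for “storage cost”. The shuffled text is our own illustrative construction, not the computer-generated paragraph above.

```python
import random
import zlib

# The patterned exercise above: one sentence repeated three times.
sentence = ("The more orderly, less random, patterned, and narratized a series of words, "
            "the easier it is to store that series in one's mind. ")
patterned = sentence * 3

# Illustrative stand-in for "raw" text (an assumption, not the generated paragraph
# above): exactly the same characters, shuffled so that no pattern survives.
rng = random.Random(42)
chars = list(patterned)
rng.shuffle(chars)
jumbled = "".join(chars)


def compressed_size(text: str) -> int:
    """Bytes after zlib (DEFLATE) compression, a rough proxy for storage cost."""
    return len(zlib.compress(text.encode("utf-8"), 9))


print("raw bytes:        ", len(patterned.encode("utf-8")))
print("patterned, zipped:", compressed_size(patterned))
print("jumbled, zipped:  ", compressed_size(jumbled))
```

Both strings contain the same letters in the same quantities, yet the patterned one compresses far better: like the reader, the compressor saves space only when it finds the pattern and stores it once.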

Data is just bytes of information that can be used to form a novel, a story, a myth or a tale. Data is raw; it is complex. Our minds store the structure the narrative has built because we find it so costly to store and retrieve raw data. Machines are the reverse: they find the ideation part costly, but storing and retrieving data is easy for them. It’s what they were built to do.

To make this point even clearer, compare two sentences: “The king died and the queen died” and “The king died and then the queen died of grief”. A computer finds the first less costly to store because it contains fewer characters. We find the second easier, because the linear nature of causality builds a narrative out of time: the ‘then’ supplies consequence and the ‘grief’ supplies reason, forming a more solid narrative.
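A trivial check of the machine’s side of that comparison, assuming raw byte length as the measure of storage cost (a simplification; real systems compress and index text):

```python
plain = "The king died and the queen died"
story = "The king died and then the queen died of grief"

# A machine's naive storage cost: the number of UTF-8 bytes in each sentence.
for sentence in (plain, story):
    print(len(sentence.encode("utf-8")), sentence)
# prints:
# 32 The king died and the queen died
# 46 The king died and then the queen died of grief
```

The 14 extra bytes are ‘then ’ and ‘ of grief’, exactly the words that carry consequence and reason for a human reader.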

Data and The Problem of Narrativity

Here is a very poor visualisation of Formula 1 deaths over time made from data on Wikipedia:

Formula One Fatal Accidents

It’s a timeline, but it tells you very little. It visualises the data but does not encapsulate the sport. It tells you nothing of the spills and thrills, of the changing rules and regulations, of the evolution of the mechanics. Clearly it doesn’t tell a story. What do you store in your mind when you look at it? It’s like the computer-generated paragraph. Anyone who wants to see the narrative should just watch Senna. Ayrton Senna is, in fact, the last casualty on the chart. The second-to-last is Roland Ratzenberger, who died the day before. There is no documentary about his life, and very few people have even heard of him.

Just as Farrah Fawcett’s death was overshadowed by Michael Jackson’s on the same day, and Mother Teresa’s by Princess Diana’s less than a week earlier, narrativity is fixed in time and place. People believe data is more objective than narrative. In the visual, every casualty is equal to the next; they all occupy the same space. But still, no one will remember Roland Ratzenberger from this. The problem lies not in the nature of events but in the way we perceive them, especially their importance and the space they occupy in the social psyche. So, as a journalist working with data, your choice is between making the narrative the victim and making the data the victim. One contends with the other, and finding the right equilibrium is what separates good data journalism from bad.

Big Data and The Problem of Narrativity

Watch this Guardian interactive, “Rupert Murdoch: How Twitter tracked the MPs’ questions – and the pie”.

This used Big Data in the form of tweets. The problem with Big Data and narrativity is that not every observation is treated equally. Because the measurement is, in some ways, visible while the observation is taking place, the observation itself has an effect. Social data allows us to crowdsource sense-making and narratives from digital input within a very short period of time, much shorter than we have ever been used to. In other words, one single observation can have a disproportionate impact on the aggregate. In this interactive we can see how one event, because of its unique and unexpected nature, causes a narrative shift. The story immediately goes from tragedy to comedy. Murdoch goes from villain to victim. We can see what share of the volume of tweets referred to the pie incident, whereas a journalist covering the story would not have devoted an equal share of her word count to it.

This phenomenon arises because social quantities are informational, not physical: you cannot touch them. They don’t belong to a universe of rules but to one based on probabilities. They play dice. The problem of narrativity in the age of Big Data can be summarised thus: while narrativity comes from an ingrained biological need to reduce dimensionality, robots would be prone to the same process of reduction. Information wants to be reduced. But the processes of reduction, for the machine and for the mind, are fundamentally different. To analyse, we must first reduce. Reduction is a form of translation; it requires a medium (mind or machine). But the message is the medium.
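The difference between physical and informational quantities, and why one observation can dominate the aggregate, can be illustrated with a small simulation. A minimal sketch under stated assumptions: a thin-tailed Gaussian stands in for a physical quantity such as height, and a heavy-tailed Pareto distribution stands in for an informational one such as attention per tweet. Both distributional choices and all parameters are illustrative assumptions, not measurements of Twitter data.

```python
import random


def max_share(samples: list[float]) -> float:
    """Fraction of the total contributed by the single largest observation."""
    return max(samples) / sum(samples)


rng = random.Random(7)
n = 1_000

# Physical quantity: heights in cm, thin-tailed (hypothetical mean 170, sd 10).
heights = [rng.gauss(170, 10) for _ in range(n)]

# Informational quantity: e.g. retweets per tweet, modelled here (an assumption)
# as heavy-tailed Pareto with shape alpha = 1.1.
retweets = [rng.paretovariate(1.1) for _ in range(n)]

print(f"largest height's share of the total:  {max_share(heights):.4f}")
print(f"largest retweet count's share:        {max_share(retweets):.4f}")
```

In the Gaussian sample no single person moves the total noticeably; in the Pareto sample one observation can account for a large slice of it, which is what a single pie can do to a stream of tweets.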

So what happens when you have a network of minds feeding a network of machines all simultaneously in the process of information reduction, translation, medium, message?

The Equation

Before big data, one storyteller had to put together narrative and causality in order to make sense. And it scaled. The equation was thus:

Narrative + Causality = Sense?

Now we have networks forming a collective mind to sift through information in real time and build these narrative-causal relationships. A machine will not make sense of terabytes of data, but people manage their connections, their networks and their followings in order to architect an infrastructure that builds narrative sense (reread the two extracts in The Problem of Narrativity section above). To make a narrative out of big data and its networks we need to integrate the data as it forms over time, but we have the bigger responsibility of finding the truth. Remember Steve? We are driven to focus on what makes sense to us, not what is true. The equation, the problem we now have to solve, becomes:

 

Each side of this equation now carries a very different weighting in the reduce, translate, mediate cycle. The left-hand side now requires a machine and the right-hand side a mind. It takes both a programmer and a storyteller to decide whether the equality holds true.

The Problem

  • A thousand tweets cannot prove you right, but one tweet can prove you wrong – understanding Big Data requires the reduction process of a machine, whereas figuring out which tweet proves the narrative wrong requires the reduction/deduction of the mind
  • We see only the events, never the rules, but we need to guess how they work
  • Like causality, narrativity has a chronological dimension and leads to the perception of the flow of time. Causality makes time flow in a single direction, and so does narrativity
  • Our tendency to perceive and impose narrativity and causality is a symptom of a single disease: dimension reduction
  • Big data allows us to be wrong with infinite precision
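The last point can be made concrete with the standard error of a proportion, which shrinks as 1/√n whether or not the sample is representative. A minimal sketch with hypothetical numbers: suppose the true share of a population holding some view is 50%, but among the people who tweet about it the share is 55%. More tweets only narrow the confidence interval around the wrong answer.

```python
import math


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a sample proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


true_share = 0.50    # hypothetical: the real share of the population holding a view
biased_share = 0.55  # hypothetical: the share among people who tweet about it

for n in (100, 10_000, 10_000_000):
    low, high = proportion_ci(biased_share, n)
    contains = low <= true_share <= high
    print(f"n={n:>10,}: 95% CI [{low:.4f}, {high:.4f}], contains truth: {contains}")
```

With 100 tweets the interval still contains the true 50%; with ten million it is less than a tenth of a percentage point wide and excludes the truth entirely. Sampling error vanishes with scale; selection bias does not.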

 

The Solution

  • You need a story to displace a story. Stories are far more potent than ideas. If you live within a narrative discipline, which societies do, your best tool is a narrative. Ideas come and go, stories stay
  • We can use our ability to convince with a story that conveys the right message – what storytellers seem to do
  • We need to create narrative paths that people can follow, react to and pass on – I’ve used several different narrative forms in this piece: a story, an exercise, a visual, an interactive and an equation. Whichever resonates most with you, that is your narrative.
  • The meta-narrative is the narrative

 

In Summary

The problem of narrativity lies not with the journalist but with the public. If we, the public, control the information networks, we cannot hold journalists or computers accountable. We want to be told stories, and there is nothing wrong with that, except that we should check more thoroughly whether a story introduces consequential distortions of reality.

It’s not about the creation of the narrative but the birth of the meta-narrative

 
