One of the first things you will notice when you begin work at a big, busy, bustling news organisation is process: the many layers, overlaps and intricate weaves. The printing press is an elegant machine made of nuts, bolts and levers. Its processes, boiled down, are quite simple. But they are also rigid, and each system depends on the next. In other words, change is difficult and evolution stunted.
What we have in newsrooms today are people as processes and software as systems. We work on the web, through the web and so of the web. The newsroom is now an evolving organism. In getting a story out there is no longer a clear-cut path through the newsroom hierarchy. For those of you unfortunate enough never to have worked in a big news organisation, I can tell you that whilst a lot of the content produced in newsrooms looks similar in its output, the people and processes behind its creation vary vastly from one organisation to the next.
When you work with data, code and people, there is no set order, process or system. And that is necessary to the endeavour! For any story to be uncovered from data there will always be a data cleaner: someone needs to be assigned data janitor. In fact, data archeologist would be a more suitable (and desirable) term. Data has layers, and with the peeling away of each layer, the data needs to be sifted, cleaned, sorted and analyzed, and hypotheses formed. The process is iterative, and at each iteration the researchers, journalists, programmers, editors and so on all have to work in tandem.
Most importantly, their working process needs to be closer to recursion than iteration. What do I mean by this? At the first level of exploration the base case needs to be set: what, at surface level, needs cohering and cleaning; what rough outline of the data picture we can obtain; what story we think it is telling; and how we can best tell it. When breaking through the strata, we can't afford to do the same thing again. So we write as many scripts as possible.
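The recursion-over-iteration idea can be sketched in code. This is a minimal illustration, not anything from an actual newsroom pipeline: each "level" of exploration takes the previous level's output as its input, so no layer of cleaning is ever redone from scratch. The cleaning steps and sample strings here are hypothetical.

```python
# Recursive exploration: peel away one layer at a time, always working on
# the previous level's output rather than restarting from the raw data.

def explore(records, steps):
    """Apply each cleaning/analysis step, in order, to the prior output."""
    if not steps:                      # base case: no layers left to peel
        return records
    step, *rest = steps
    return explore([step(r) for r in records], rest)

# Hypothetical layers: normalise whitespace, then standardise case.
strip_spaces = lambda s: " ".join(s.split())
lower_case = lambda s: s.lower()

raw = ["  Mixed   CASE ", "data  JANITOR"]
print(explore(raw, [strip_spaces, lower_case]))
# → ['mixed case', 'data janitor']
```

The base case corresponds to the first surface-level pass described above; each recursive call is a deeper stratum that builds on what the last one produced.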
At the moment I am the data janitor. That's a good thing, as the role is heavier on the journalism than it is on the code. I'm using Google Refine, from which I can export the data or just reuse the JSON. I'm also using the query language SQL and the statistical analysis package R. With these, I can write a script and rerun it at each level of iteration. As we dig deeper and learn more, we can ask more complex questions and paint more creative pictures, and I can build up my scripts. We are not re-doing anything, and we are never repeating a process: each time we are building upon the previous output.
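As a sketch of what "write the script once, rerun it at each level" looks like in practice (not the author's actual SQL or R scripts; this uses Python's built-in SQLite as a stand-in, and the table and column names are invented for illustration): the query lives in one place, and each deeper cleaning pass simply feeds its output back through the same script.

```python
# Rerunnable, scripted querying: reload the latest cleaned layer into an
# in-memory database and rerun the same saved query, unchanged, every time.
import sqlite3

def rebuild_and_query(rows):
    """Load the current cleaned rows and rerun the same aggregate query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE grants (recipient TEXT, amount REAL)")
    con.executemany("INSERT INTO grants VALUES (?, ?)", rows)
    query = """SELECT recipient, SUM(amount) AS total
               FROM grants GROUP BY recipient ORDER BY total DESC"""
    return con.execute(query).fetchall()

# A hypothetical cleaned layer; a deeper pass would just call this again.
cleaned = [("acme", 100.0), ("acme", 50.0), ("bcorp", 120.0)]
print(rebuild_and_query(cleaned))
# → [('acme', 150.0), ('bcorp', 120.0)]
```

Because nothing about the query depends on which iteration of cleaning produced the rows, the script never needs rewriting: only its input changes as the layers are peeled away.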
This is the beauty of freeing yourself from the shackles of software. You only need to do something once, and that allows you to loop through the process many times. In that sense, teamwork is most important: having all hands on deck at each level. You therefore need a multi-skilled team with a lot of crossover. Right now, I'm looking to be an entry-level data archeologist, one of the first in the chain. For that you need journalistic acumen, some faceting with Refine, knowledge of a query language, and some stats with R.
This level is really important in creating a data-driven story. Making each level recursive rather than iterative could be the key to a data-driven newsroom.