Firstly I’m going nowhere as I am having operations on my feet. The Internet, however, is my only freedom. Painkillers are my cage.
…how many data journalists are working today? How many will be needed? What are the primary tools they rely upon now? What will they need in 2013? Who are the leaders or primary drivers in the area? What are the most notable projects? What organizations are embracing data journalism, and why?
I spent last year as a Knight-Mozilla OpenNews Fellow at The Guardian, working within the Interactive Team, in the hope of getting where a journalist in the age of big data needs to go. It was not just about learning to code, which I started a year previously at ScraperWiki, but about understanding the digital story cycle, the editorial decisions, and the internal politics. Most importantly, my journey was and still is the road less travelled by. I had to completely retrain myself, with no syllabus, no certificate and no notion of what would be needed to don the robes of the data journalist. So here are my two cents.
How many data journalists are working today?
That depends on your definition of a data journalist. After the dawning of the era of the citizen journalist we no longer have a clear definition of journalist! This is a moot point but I would say it is more of a legal point. Rather than egotistically heralding the bastions of truthfulness, impartiality and objectivity, I would say a journalist is someone who can argue a case for a public interest defence in court. A data journalist is therefore anyone who can argue a case for public interest if he/she is brought to court for producing content for the web.
Typically a journalist could argue a case for public interest upon publication, and publication traditionally took the form of print, radio and TV. And traditionally, the process toward publication began with collecting documents and documenting the process which transformed these documents into their resultant mediated format. Now, documents are data. We live in a digitized world. Therefore all journalists are becoming data journalists. It’s just journalism adapting to new media going out and new media coming in.
To answer the question (whilst not really answering), the number of data journalists today is the number of journalists working in an organization which will survive the digital transition and find a stable business model, and who will not be fired before these things happen. The number of data-driven investigative journalists will be one or two per survivor successful enough to afford specialist desks. I make this distinction because all documents are data and all journalism is built atop documents. However, journalism in its simplest form involves mediating already processed data, whereas data specialists will be processing and analyzing raw data in order to get to the stories within.
How many will be needed?
That depends on who survives the digital cull. Journalists will need to produce for the web first. Once news organizations settle on a content management system I believe very few computer skills will be needed. Training to use the CMS would be more than adequate. I believe that the big stories, the special features and editorials are going to need that extra pizzazz. As such, small investigations coupled with video, graphic and/or interactive features will become desirable, especially amongst those editors who want to boast at conferences.
To create a long-form digital piece you of course need a multi-skilled team. For these teams to work efficiently, each member needs to be familiar with all the processes and to have a vast range of skills. So a more specialized data journalist with coding skills will be desirable, but only at a select few organizations. But I believe supply will be very low, as few journalists are interested in this route and even fewer institutions teach computer-assisted reporting to an adequate standard for the job requirements.
What are the primary tools they rely upon now?
Currently there is no standard and tools vary according to institutions. The very basics are spreadsheets and Fusion Tables. Web standards are HTML and CSS. The more data-intensive (and ambitious) interactive desks work with R. This is a legacy of training, as statisticians and data scientists coming out of university should be R trained. Also R is free. Assume everything they use will need to be free. One object-oriented programming language, Ruby or Python, is used. Some say either is fine but others such as ProPublica state they are Ruby houses. If a team dictates which one will be used as standard, this is most probably a good sign. It means they are sharing code and plan on reusing it.
Who are the leaders or primary drivers in the area? What are the most notable projects? What organizations are embracing data journalism, and why?
These are all sort of the same question. The primary drivers are the organizations embracing data journalism and they are creating the most notable projects. The Guardian, The New York Times, The Boston Globe, Texas Tribune, BBC, ProPublica, etc. The majority of the big players in the US are in the game and playing mostly to swap and/or steal people between them rather than fostering their own personnel or attempting a different approach. The UK has a couple of major players and some crouching at the starting line. The US has a much longer history of CAR (computer-assisted reporting) and a more go-get-’em attitude that has allowed Aron Pilhofer and Scott Klein to adopt their own team management and structure.
Even though “data journalism” is on the rise (as a term anyway), getting to where Aron and Scott are is extremely difficult, being successful at it even more so. Aron and Scott were given the opportunity to form a team when news organizations were going through a frantic burst of evolution, straining to cling on to life. ProPublica is one of the more elegant species to arise from this turbulent time. Now, those who feel they survived the crunch are battening down the hatches and hoping to stave off the winter by being conservative. Because managers and directors have now heard the terms “big data”, “interactives” and “data journalism”, they feel someone has figured it out and all they need to do is steal them or, if that fails, find a cheaper copy.
Sadly, most news organizations are embracing data journalism because The New York Times has. Managers feel they should be spouting buzzwords at meetings and having coffee with Google reps. They need to be seen to be embracing the future, even if they have no idea what that entails. Many send editors to conferences but few send journalists to training courses. Most want to find out what free tools they can use, who they can partner with and who is the next Twitter. At the core of digital journalism development is incentive. It still doesn’t make money but it can make careers. It has to stop being about the newsroom ‘ooooh’ factor and be more about getting the right story.
What will they need in 2013?
Experience. They will need people attempting these projects on a small scale, therefore they need non-developers to have some skills and a thirst to learn new ones. They need more people and more models in order to experiment, as “data journalism” is not a solid thing as yet. They need the internal structure (and a stable one) in order to allow the team the time it takes to mature. They need the money to pay developers, good ones. They need the internal structure of their institution to understand what it is they are doing and to adopt the same fluidity.
They need to keep experimenting.