<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Miner UK</title>
	<atom:link href="http://datamineruk.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://datamineruk.com</link>
	<description>Finding stories in data. The pursuit of facts in plain sight.</description>
	<lastBuildDate>Tue, 21 May 2013 12:27:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Preaching DDJ In The Netherlands</title>
		<link>http://datamineruk.com/2013/05/21/preaching-ddj-in-the-netherlands/</link>
		<comments>http://datamineruk.com/2013/05/21/preaching-ddj-in-the-netherlands/#comments</comments>
		<pubDate>Tue, 21 May 2013 12:27:11 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[Talks]]></category>
		<category><![CDATA[data driven journalism]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[ddj]]></category>
		<category><![CDATA[Netherlands]]></category>
		<category><![CDATA[Utrecht]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1702</guid>
		<description><![CDATA[Towards the end of April I had the privilege of being invited to Utrecht University to teach a one-day workshop in Advanced Data Journalism (i.e. scary code).  Here is a quick video they made to give people a quick idea of what ddj is about: Teaching journalists to code is never easy and impossible ... <a href="http://datamineruk.com/2013/05/21/preaching-ddj-in-the-netherlands/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p>Towards the end of April I had the privilege of being invited to <a href="http://www.cursussen.hu.nl/TotaalAanbod/Centra/Centrum%20voor%20Communicatie%20en%20Journalistiek.aspx" target="_blank">Utrecht University</a> to teach a one-day workshop in Advanced Data Journalism (i.e. scary code).  Here is a quick video they made to give people a quick idea of what ddj is about:</p>
<p><iframe src="http://www.youtube.com/embed/3gnMZfH45Ow?rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></iframe></p>
<p>Teaching journalists to code is never easy and impossible in one day, but I wanted the participants to get a flavour of what is possible and where the journalistic pitfalls are. Being a new sphere for innovation in media, data journalism comes with its challenges. Luckily I only had one person run away after the &#8220;coding bit&#8221;. Others took up the baton and started scraping the very next day. One such person is <a href="http://www.forreporters.com/author/arno-kersten/" target="_blank">Arno Kersten</a> who works for the <a href="http://www.vvoj.nl" target="_blank">Dutch-Flemish Association of Investigative Journalists</a> and publishes the ddj website <a href="http://www.forreporters.com/" target="_blank">Medialab</a>.</p>
<p>He interviewed me after the workshop, adding me to a prestigious list of data journalism folk such as <a href="https://twitter.com/paulbradshaw" target="_blank">Paul Bradshaw</a>, <a href="https://twitter.com/mcgeoff" target="_blank">Geoff McGhee</a> and <a href="https://twitter.com/lehrennyt" target="_blank">Andrew Lehren</a>. Find me and many more <a href="http://www.forreporters.com/datahelden/" target="_blank">here</a> or watch the video:</p>
<p><iframe src="http://www.youtube.com/embed/__Fo_zoA6Ow?rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></iframe></p>
<p>I would also like to thank the <a href="http://www.meetup.com/DataScienceNL/events/113148292/" target="_blank">Data Science Meetup group in Utrecht</a> for the bottle of wine and <a href="https://twitter.com/gooffie" target="_blank">Goof van Winkel</a> for organising the workshop.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2013/05/21/preaching-ddj-in-the-netherlands/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Get Busy Building Before You Get Busy Funding</title>
		<link>http://datamineruk.com/2013/03/26/busy-building-busy-funding/</link>
		<comments>http://datamineruk.com/2013/03/26/busy-building-busy-funding/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 14:48:25 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Talks]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1696</guid>
		<description><![CDATA[I spoke at the latest HacksHackers London meetup along with three of my other OpenNews Fellows. Also speaking were John Bracken from the Knight Foundation and Bobbie Johnson who successfully started Matter on Kickstarter. Chronic blogger Martin Belam has a good round up (see links) where he mentions my &#8216;rallying cry&#8217;: &#8220;You have no excuse ... <a href="http://datamineruk.com/2013/03/26/busy-building-busy-funding/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://meetuplondon.hackshackers.com/"><img class="alignleft" alt="" src="http://photos4.meetupstatic.com/photos/event/a/f/7/e/global_18404926.jpeg" width="150" height="100" /></a>I spoke at the latest <a href="http://meetuplondon.hackshackers.com/events/109213812/" target="_blank">HacksHackers London meetup</a> along with three of my other <a href="http://martinbelam.com/2013/hacks-hackers-knight-mozilla-fellows/" target="_blank">OpenNews Fellow</a>s. Also speaking were <a href="http://martinbelam.com/2013/hacks-hackers-john-s-bracken/" target="_blank">John Bracken</a> from the Knight Foundation and <a href="http://martinbelam.com/2013/hacks-hackers-bobbie-johnson/" target="_blank">Bobbie Johnson</a> who successfully started <a href="https://www.readmatter.com/" target="_blank">Matter</a> on <a href="http://www.kickstarter.com/" target="_blank">Kickstarter</a>. Chronic blogger Martin Belam has a good round up (see links) where he mentions my &#8216;rallying cry&#8217;:</p>
<blockquote><p>&#8220;You have no excuse not to be making things now. You don’t need funding. You don’t need to be put in the newsroom. Start with the basics, expand it, and make something fun as a prototype.&#8221;</p></blockquote>
<p>I was always told by developers that you will only learn through making. Which is why I believe so many developers are not formally trained in computer science. My fellows are not. I would liken it to the gap between &#8216;studying&#8217; journalism in college and what you learn by being part of a newsroom. The former is where you get the basics, the latter where you build your craft.</p>
<p>I have applied to the <a href="http://www.knightfoundation.org/prototype/" target="_blank">Knight Foundation Prototype Fund</a> to build a <a title="The Big Picture" href="http://datamineruk.com/2011/08/04/the-big-picture/" target="_blank">platform</a> and will go to Kickstarter if that fails. And if that fails I will just build in my spare time. Entrepreneurism and start-up life is about, as my fellow fellow Stijn described his journey, a series of failures. I am not afraid to fail because I will just learn and build it myself. Even if I succeed with Knight I plan on having a Kickstarter project this year. I have also applied for funding for a big data driven investigative story. I am not looking to build a body of work but a body of knowledge. You gain knowledge from trying and failing.</p>
<p>When I started an internship at CNN, all those many years ago, the first thing Richard Quest asked all of us fresh-faced interns to do was pitch a story. Here I learnt there was an art to pitching and a successful pitcher became a successful journalist. The beauty of digital journalism is that you don&#8217;t need to be in a bustling newsroom to pitch. You can pitch to your audience. You can pitch for funding. You can build your newsroom online and gauge the success.</p>
<p>In that sense, you have no excuse not to be pitching now. Not to be having ideas. You don&#8217;t need to be in a newsroom. Build a platform. I have recently bootstrapped my WordPress theme. I installed this <a href="http://www.johnparris.com/alienship/" target="_blank">free theme</a> and modified the code base. <strong>Tinker with products, tinker with code and tinker with the stories you can produce.</strong></p>
<p>I&#8217;ll let you know how my funding and project go. In the meantime I will be on a panel at the <a href="http://blogs.lse.ac.uk/polis/2012/12/11/polis-journalism-conference-april-5th-2013/" target="_blank">Polis Journalism Conference</a> speaking about trust in data journalism on 5th April. Sign up <a href="http://www.eventbrite.co.uk/event/5212005248" target="_blank">here</a>, it&#8217;s free!</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2013/03/26/busy-building-busy-funding/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>To Get Into News You Have To Look Outside It</title>
		<link>http://datamineruk.com/2013/03/12/to-get-into-news-you-have-to-look-outside-it/</link>
		<comments>http://datamineruk.com/2013/03/12/to-get-into-news-you-have-to-look-outside-it/#comments</comments>
		<pubDate>Tue, 12 Mar 2013 15:57:22 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[Amanda Cox]]></category>
		<category><![CDATA[animation]]></category>
		<category><![CDATA[linear regression]]></category>
		<category><![CDATA[predicting algorithms]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[robots]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1668</guid>
		<description><![CDATA[Don&#8217;t use it for what it&#8217;s made for, understand its functionality and use that I said this over a year ago when I was first using social media tools to gather, filter and verify news. I&#8217;m sure it is one of the reasons The Guardian chose me to be its 2012 Knight-Mozilla OpenNews Fellow. And ... <a href="http://datamineruk.com/2013/03/12/to-get-into-news-you-have-to-look-outside-it/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<blockquote><p>Don&#8217;t use it for what it&#8217;s made for, understand its functionality and use that</p></blockquote>
<p>I said this over a year ago when I was first using social media tools to gather, filter and verify news. I&#8217;m sure it is one of the reasons <a href="http://www.guardian.co.uk/profile/nicola-hughes" target="_blank">The Guardian</a> chose me to be its 2012 <a href="http://www.knightfoundation.org/" target="_blank">Knight</a>-<a href="http://www.mozilla.org/foundation/" target="_blank">Mozilla</a> <a href="http://www.mozillaopennews.org/" target="_blank">OpenNews</a> Fellow. And I stand by those words just as strongly today.</p>
<p>Gone are the days when a hugely popular blog and thousands of Twitter followers would pave your way into the newsroom. They now allow you to be part of the brand, at the peripheral. You could blog as part of a news organisation, working freelance and communicating every now and again with the editor of that section. Your battle to shape the news, pitch stories and drive the agenda is made more difficult by the &#8220;new media&#8221; silo you find yourself in.</p>
<p>Blogging is a new breed of the medium. It is the same species. It functions pretty much the same. The key to extending the news sphere is cross breeding. Find a species bred for an entirely different purpose, understand its functionality and use that. For instance, I came across this graphic by <a href="https://twitter.com/nytgraphics" target="_blank">The New York Times&#8217; graphics</a> editor and resident statistician,<a href="http://www.r-bloggers.com/amanda-cox-on-how-the-new-york-times-graphics-department-uses-r/" target="_blank"> Amanda Cox</a>:</p>
<p><a href="http://graphics8.nytimes.com/images/2008/04/16/us/0416-nat-subOBAMA.jpg"><img class="aligncenter" alt="" src="http://graphics8.nytimes.com/images/2008/04/16/us/0416-nat-subOBAMA.jpg" width="650" height="1032" /></a></p>
<p>As you can tell from the topic, this was published in 2008, long before data journalism became such a trend. I love this because something like this cannot be fathomed by a journalist. Amanda Cox sees data but she also sees a tool for analysis and can shed light on what factors affected whether Obama or Clinton won a state. I like this as well because I have learnt to do it! (from a free online <a href="https://class.coursera.org/dataanalysis-001/class/index" target="_blank">data analysis course run by Johns Hopkins University</a>). Using <a href="http://www.r-project.org/" target="_blank">R</a>:</p>
<pre>setwd("directory_your_data_is_in") # you need to set your working directory to where your data file is located</pre>
<pre>my_data &lt;- read.table("name_of_data_file", header = TRUE, sep = ",", quote = "\"", dec = ".") # these values suit a csv file whose first row holds the column names, whose entries are surrounded by " if they contain ",", and which uses "." as the decimal point; change them to match your file. Or you can just use read.csv()</pre>
<pre>install.packages("tree") # installing the tree package so you can access its functions
library(tree) # loading the functions into R</pre>
<pre># To predict, say which Candidate (column name) wins from all the other columns in the dataset
predict_tree &lt;- tree(Candidate ~ ., my_data)</pre>
<pre># To print out the prediction tree
plot(predict_tree)
text(predict_tree)</pre>
<pre># I think the data in the NYT example needed a lot of manipulation, turning numerical 1's and 0's into factors for example. Also finding the percentage splits on things like education involves looking at that subset as its own branch</pre>
<p>Rather than being creative with the medium why don&#8217;t we start thinking outside of the box in terms of the message? Analytical journalism is something Twitter, Facebook and Google can&#8217;t compete with. They can be faster, they can be more diverse, they can be more averting. But your friends and followers are not going to take the time and effort to analyse raw facts. That is where the value of new journalism lies.</p>
<p>Linear regression could be used, for example, to see which more strongly predicts levels of school truancy, social or educational factors. But thinking outside the box needn&#8217;t be so demanding as a university course in data analysis. Instead of a static graphic or even an exploratory interactive, why not make an animation? With <a href="http://www.html5rocks.com/en/" target="_blank">HTML5</a> video, 4G networks and <a href="http://popcornjs.org/" target="_blank">Popcorn.js</a> to tie video to web elements upon us already, the web is fast becoming a playing field for broadcast journalism. Here is a brilliant example of building a data narrative through animation:</p>
<p><iframe src="http://www.youtube.com/embed/QPKKQnijnsM?rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></iframe></p>
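<p>To make the linear regression idea above a little more concrete, here is a minimal Python sketch using only the standard library. Every number in it is invented for illustration, and the column names (<code>pct_free_meals</code> as a social factor, <code>avg_class_size</code> as an educational one) are assumptions, not real truancy data. It fits one straight line per factor and compares how much of the variation in truancy each line explains.</p>

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented figures for six hypothetical schools.
truancy_rate = [2.1, 3.4, 4.0, 5.2, 6.1, 7.3]
pct_free_meals = [5, 12, 15, 22, 27, 34]      # assumed social factor
avg_class_size = [24, 31, 22, 28, 30, 25]     # assumed educational factor

print("social factor R^2:", round(r_squared(pct_free_meals, truancy_rate), 3))
print("educational factor R^2:", round(r_squared(avg_class_size, truancy_rate), 3))
```

<p>The factor with the higher R² explains more of the variation on its own. A fuller analysis would put both factors into one multiple regression, which is easier in R or a statistics library than by hand.</p>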
<p>For those who play with code, why not <a href="http://blogs.journalism.co.uk/2013/03/08/podcast-robot-reporting-a-look-at-the-la-times-data-desk/" target="_blank">build robots to help you report</a>? You can build a web scraper to monitor a data source and tweet/email you if certain conditions are met. The possibilities are endless.</p>
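<p>A monitoring robot of this kind can be sketched in a few lines of Python. This is only an outline under assumptions: the data source is taken to be a JSON list of records, the field names are hypothetical, and the notification is a plain <code>print</code> that you would swap for an email or tweet function of your own.</p>

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document (network access required)."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def find_alerts(records, field, threshold):
    """Return the records whose numeric field is above the threshold."""
    return [r for r in records if float(r.get(field, 0)) > threshold]


def format_alert(record, field):
    """Build a short message suitable for an email subject or a tweet."""
    return "ALERT: {} reports {} = {}".format(
        record.get("name", "unknown"), field, record[field]
    )


def check_source(url, field, threshold, fetch=fetch_json, notify=print):
    """Fetch the source once and notify for every record over the threshold."""
    alerts = find_alerts(fetch(url), field, threshold)
    for record in alerts:
        notify(format_alert(record, field))
    return alerts
```

<p>Run <code>check_source</code> on a schedule (cron, for instance) and the robot checks the source for you. Passing <code>fetch</code> and <code>notify</code> as arguments keeps the condition logic testable without touching the network.</p>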
<p>If you are interested in news and journalism look outside the reporting sphere, gain analytical skills and ask yourself &#8220;how does it function?&#8221; and use that.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2013/03/12/to-get-into-news-you-have-to-look-outside-it/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DataMinerUK: What I Do And How</title>
		<link>http://datamineruk.com/2013/03/07/datamineruk-what-i-do-and-how/</link>
		<comments>http://datamineruk.com/2013/03/07/datamineruk-what-i-do-and-how/#comments</comments>
		<pubDate>Thu, 07 Mar 2013 15:24:10 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[My Data Journey]]></category>
		<category><![CDATA[Infinite Interns]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1640</guid>
		<description><![CDATA[What I Do Data journalism has sprawled into an all-encompassing term meaning new digital journalism excluding social media/community. In other words, news generated from a newsroom which doesn&#8217;t solely consist of text print. However, there is a wide variety of fields interwoven and interrelated into the term data journalism. So when asked about getting into data ... <a href="http://datamineruk.com/2013/03/07/datamineruk-what-i-do-and-how/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p><strong>What I Do</strong></p>
<p>Data journalism has sprawled into an all-encompassing term meaning new digital journalism excluding social media/community. In other words, news generated from a newsroom which doesn&#8217;t solely consist of text print. However, there is a wide variety of fields interwoven and interrelated into the term data journalism.</p>
<p>So when asked about getting into data journalism, one has to clarify which branch.  The first split from the trunk has to be the broader areas of investigative data journalism and data visualisation. As newsrooms moved from print to digital, few of them had the chance or expertise to experiment and so the whole tree fell under the data journalism canopy. Thus they were made up of full-stack developers (most were designers and developers with an editorial leader, such as the New York Times team run by Aron Pilhofer).</p>
<p>Now <a href="http://seansposito.com/2013/03/06/has-data-journalism-reached-a-tipping-point/" target="_blank">data journalism is reaching its tipping point</a>. A data journalist can no longer dabble with Excel, Google Fusion Tables, HTML and JavaScript and expect to be in the game. I was not lucky enough to be able to attend this year&#8217;s National Institute for Computer Assisted Reporting (NICAR) conference, <a href="https://twitter.com/search?q=%23NICAR13&amp;src=hash" target="_blank">#NICAR13</a>, but following the Twitter feed has revealed this shift.</p>
<blockquote class="twitter-tweet"><p>Seeing resumes of people who can do a little of everything, but not having specialty or small set of specialities worries @<a href="https://twitter.com/pilhofer">pilhofer</a> <a href="https://twitter.com/search/%23NICAR13">#NICAR13</a></p>
<p>— Greg Linch (@greglinch) <a href="https://twitter.com/greglinch/status/308240942215864320">March 3, 2013</a></p></blockquote>
<p>I specialise in the backend of data journalism: investigations. I work to be the primary source of a story, having found it in data. As such my skills lean less towards design and JavaScript and more towards scraping, databases and statistics.</p>
<p><strong><script charset="utf-8" type="text/javascript" src="//platform.twitter.com/widgets.js" async=""></script>What I Need To Do It</strong></p>
<p>There is no clear way to get into data journalism. Firstly you must decide what area of this wide field you are interested in. You must find where your passion and curiosity lie. You can never get to a level of training where you feel you can satisfactorily do your job. That&#8217;s the difference between journalism and data journalism. The web evolves in 3 month cycles and so you are always playing catch up, whether that&#8217;s retrieving data or making the latest unique interactive.</p>
<p>The most important thing you need is drive. Because it is hard. I have learnt an object oriented programming language primarily for scraping, Python, a database language MySQL (ElasticSearch if the data is very large) and an analysis software language, R. I want to get at a story, I want to find an interesting pattern, a tantalising trail.  For that, I need to be able to get, clean, sort and analyse data in order to understand it.</p>
<blockquote class="twitter-tweet"><p>The problem of &#8220;what skills do you need to know?&#8221; depends on what you want to do, says @<a href="https://twitter.com/markhorvit">markhorvit</a>. Understanding data is big part. <a href="https://twitter.com/search/%23NICAR13">#NICAR13</a></p>
<p>— Greg Linch (@greglinch) <a href="https://twitter.com/greglinch/status/308240548945334272">March 3, 2013</a></p></blockquote>
<p>But it&#8217;s not just the skills that are an important part of the journey. That&#8217;s just what you need to get ready. It&#8217;s what you learn along the way that is your greatest asset.</p>
<p><strong>How I Do It</strong></p>
<p>I work in a virtual world. Literally. The only software I have installed on my machine are <a href="https://www.virtualbox.org/" target="_blank">VirtualBox</a> and <a href="http://www.vagrantup.com/" target="_blank">Vagrant</a>. I create a virtual machine inside my machine. I have blueprints for many virtual machines. Each machine has a different function i.e. a different piece of software installed. So to perform a function such as fetching the data or cleaning it or analysing it, I have a brand new environment which can be recreated on any computer.</p>
<p>I call these environments &#8220;<a href="https://github.com/DataMinerUK/infinite-interns" target="_blank">Infinite Interns</a>&#8221;. In order to help journalists see the possibilities of what I do, I tell them to think about what they could accomplish if they had an infinite number of interns. Because that&#8217;s what code is.  Here are a couple of slides about my <a href="https://github.com/DataMinerUK/infinite-interns" target="_blank">Infinite Interns</a> system:</p>
<p><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/16988684" height="356" width="427" allowfullscreen="" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe></p>
<div style="margin-bottom: 5px;"><strong> <a title="Infinite interns" href="http://www.slideshare.net/DataMinerUK/infinite-interns" target="_blank">Infinite interns</a> </strong> from <strong><a href="http://www.slideshare.net/DataMinerUK" target="_blank">DataMinerUK</a></strong></div>
<p>To use this system I have another repository called <a href="https://github.com/DataMinerUK/skel" target="_blank">Skel</a>. This is the skeleton layout for all my data driven projects. Every time I start a new project I download Skel. This also has folders for the various components of my investigation, the source data, the transformed data, the analysis, etc. With each intern I bring to life I transform the data and put the results on my own machine. After that I kill it. A benefit of having virtual interns is that no labour laws apply!</p>
<p>All my processes are coded, as are my environments, so I can write a script that will automatically run the investigation from start to finish in one command. In that way it is completely transparent and reproducible.</p>
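<p>As an illustration of the &#8220;one command&#8221; idea, here is a minimal Python sketch. It is not the actual Skel or Infinite Interns code: the folder names and the three toy steps are assumptions made up for this example, and each step simply reads from one folder and writes to the next.</p>

```python
import tempfile
from pathlib import Path

# Hypothetical folder names; the real Skel layout may differ.
FOLDERS = ["source", "transformed", "analysis"]


def make_skeleton(root):
    """Create the project folders if they do not exist yet."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def fetch(root):
    # Stand-in for a scraper: writes invented rows to the source folder.
    (root / "source" / "raw.csv").write_text("name,value\n a ,1\n b ,2\n")


def clean(root):
    # Strip stray whitespace from every cell and save a tidy copy.
    rows = (root / "source" / "raw.csv").read_text().splitlines()
    tidy = [",".join(cell.strip() for cell in row.split(",")) for row in rows]
    (root / "transformed" / "clean.csv").write_text("\n".join(tidy) + "\n")


def analyse(root):
    # Sum the value column, skipping the header row.
    rows = (root / "transformed" / "clean.csv").read_text().splitlines()[1:]
    total = sum(int(row.split(",")[1]) for row in rows)
    (root / "analysis" / "summary.txt").write_text("total={}\n".format(total))


def run_investigation(root, steps):
    """Run every step in order and return the names of the steps that ran."""
    root = make_skeleton(root)
    log = []
    for step in steps:
        step(root)
        log.append(step.__name__)
    return log


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    steps_run = run_investigation(tmp, [fetch, clean, analyse])
    summary = (Path(tmp) / "analysis" / "summary.txt").read_text()
print(steps_run, summary.strip())
```

<p>Because each step only talks to the folders, anyone with the same source files and the same script should get the same result, which is the point of keeping the process reproducible.</p>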
<p><strong>Why I Do It</strong></p>
<p>It has taken me a long time to develop this process. It&#8217;s built on what I&#8217;ve come to learn as an <a href="http://www.mozillaopennews.org/" target="_blank">OpenNews</a> Fellow at <a href="http://www.guardian.co.uk/profile/nicola-hughes" target="_blank">The Guardian</a>. I use code to do journalism, to find and report stories. Thus my process of development is not built around agile or responsiveness or all the other ways developer teams are run. My process is built around transparency and reproducibility. It is meant to stand up in court.</p>
<p>I practice this on every scale. For instance, I have a three step process when I scrape. For a large enough dataset, there will be a search function on a webpage which will pull out sections of the database. I write a scraper to pull out every section, paginate through the entries and store the URLs for each of the individual entries. Then I write another scraper to go to all the URLs and store the HTML for the page. All of it. Only after I have done that do I write a third and final scraper to collect all the necessary details and store it in a structured database.</p>
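<p>The three scrapers described above can be sketched in Python with nothing but the standard library. Treat it as an outline under assumptions rather than the code actually used: it supposes that entry links on the listing pages carry a hypothetical <code>entry</code> class, that the only detail wanted is each page&#8217;s first <code>h1</code>, and that SQLite is an acceptable stand-in for the database.</p>

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_html(url):
    """Download one page as text (network access required)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collects the href of every anchor tag carrying the given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


class TitleReader(HTMLParser):
    """Reads the text of the first h1 element on a page."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self.in_h1 = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.title += data


def collect_urls(listing_urls, fetch=fetch_html):
    """Scraper 1: walk every listing page and gather the entry URLs."""
    urls = []
    for listing_url in listing_urls:
        collector = LinkCollector("entry")
        collector.feed(fetch(listing_url))
        urls.extend(urljoin(listing_url, href) for href in collector.links)
    return urls


def store_pages(urls, db, fetch=fetch_html):
    """Scraper 2: save the complete HTML of each entry, untouched."""
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)")
    for url in urls:
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, fetch(url)))
    db.commit()


def extract_details(db):
    """Scraper 3: parse the stored HTML, never the live site."""
    db.execute("CREATE TABLE IF NOT EXISTS entries (url TEXT PRIMARY KEY, title TEXT)")
    for url, html in db.execute("SELECT url, html FROM pages").fetchall():
        reader = TitleReader()
        reader.feed(html)
        db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)",
                   (url, (reader.title or "").strip()))
    db.commit()
```

<p>Calling <code>collect_urls</code>, then <code>store_pages</code>, then <code>extract_details</code> on one <code>sqlite3</code> connection runs the whole chain. Because the third step only reads the <code>pages</code> table, it can be rewritten and rerun as often as needed without touching the website again.</p>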
<p>Having recently taught this process to a group of developers at <a href="http://www.aljazeera.com/" target="_blank">Al Jazeera English</a>, one of them asked me why I have this second interim step. Surely it wastes time and computing space. You need it just as a journalist needs their notes. When you scrape a website you are fetching information from a site that has fetched it from a server. The entity you are investigating has access to that server and can change that information. It will change on the site and what proof do you have that you did not fabricate the data or retrieve it incorrectly. If you have the HTML you have proof of what the website retrieved at that point in time.</p>
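<p>One way to make those stored &#8220;notes&#8221; easier to defend is to record when each page was fetched and a fingerprint of its contents. The Python sketch below is an assumption about how that could be done, not a description of the process above: it saves the raw HTML under its SHA-256 hash alongside a small JSON record, and a later check shows whether the stored copy has changed since.</p>

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def archive_page(url, html, folder):
    """Write the raw HTML plus a JSON record of where and when it was fetched."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    raw = html.encode("utf-8")
    digest = hashlib.sha256(raw).hexdigest()
    (folder / (digest + ".html")).write_bytes(raw)
    record = {
        "url": url,
        "sha256": digest,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }
    (folder / (digest + ".json")).write_text(json.dumps(record, indent=2))
    return record


def verify_page(record, folder):
    """Return True if the stored HTML still matches the recorded hash."""
    raw = (Path(folder) / (record["sha256"] + ".html")).read_bytes()
    return hashlib.sha256(raw).hexdigest() == record["sha256"]


# Demo in a throwaway directory with placeholder page text.
with tempfile.TemporaryDirectory() as tmp:
    record = archive_page("http://example.invalid/entry/1", "stored page text", tmp)
    intact = verify_page(record, tmp)
print(record["sha256"], intact)
```

<p>A hash and timestamp kept on your own disk only show that your copy has not changed. Lodging the hash with a third party at the time of the scrape would strengthen the claim.</p>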
<p>I am a journalist by training and am learning developer skills. But I am creating my own processes based on what I need as a journalist not as a developer. It is an ever evolving process but that insight is the result of two years delving into the world of data journalism. As long as you have an insight of your own, a unique way of getting done what you need to get done, then you are a data journalist.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2013/03/07/datamineruk-what-i-do-and-how/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Journalism &#8211; Where Is It All Going?</title>
		<link>http://datamineruk.com/2013/01/29/data-journalism-where-is-it-all-going/</link>
		<comments>http://datamineruk.com/2013/01/29/data-journalism-where-is-it-all-going/#comments</comments>
		<pubDate>Tue, 29 Jan 2013 17:08:11 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[Alex Howard]]></category>
		<category><![CDATA[data journalism]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1631</guid>
		<description><![CDATA[Firstly I&#8217;m going nowhere as I am having operations on my feet. The Internet, however, is my only freedom. Painkillers are my cage. Late last year there was a call out by Alex Howard from O&#8217;Reilly Radar. In it he asks: &#8230;how many data journalists are working today? How many will be needed? What are ... <a href="http://datamineruk.com/2013/01/29/data-journalism-where-is-it-all-going/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p>Firstly I&#8217;m going nowhere as I am having operations on my feet. The Internet, however, is my only freedom. Painkillers are my cage.</p>
<p>Late last year there was a call out by <a href="http://twitter.com/digiphile" target="_blank">Alex Howard</a> from <a href="http://radar.oreilly.com/2012/11/investigating-data-journalism.html" target="_blank">O&#8217;Reilly Radar</a>. In it he asks:</p>
<blockquote><p>&#8230;how many data journalists are working today? How many will be needed? What are the primary tools they rely upon now? What will they need in 2013? Who are the leaders or primary drivers in the area? What are the most notable projects? What organizations are embracing data journalism, and why?</p></blockquote>
<p>I spent last year as a <a href="http://mozillaopennews.org/fellowships/" target="_blank">Knight-Mozilla OpenNews Fellow</a> at <a href="http://www.guardian.co.uk/profile/nicola-hughes" target="_blank">The Guardian</a>, working within the <a href="http://www.guardian.co.uk/profile/guardian-interactive-department" target="_blank">Interactive Team</a>, in the hope of getting where a journalist in the age of big data needs to go. It was not just about learning to code, which I started a year previously at <a href="https://scraperwiki.com/profiles/NicolaHughes/" target="_blank">ScraperWiki</a>, but understanding the digital story cycle, the editorial decisions, and the internal politics. Most importantly, my journey was and still is the road less travelled by. I had to completely retrain myself. With no syllabus, no certificate and no notion of what would be needed to don the robes of the data journalist. So here is my two cents.</p>
<p><strong>How many data journalists are working today?</strong></p>
<p>That depends on your definition of a data journalist. After the dawning of the era of the citizen journalist we no longer have a clear definition of journalist! This is a moot point but I would say it is more of a legal point. Rather than egotistically heralding the bastions of truthfulness, impartiality and objectivity; I would say a journalist is someone who can argue a case for public interest defence in court. A data journalist is therefore anyone who can argue a case for public interest if he/she is brought to court upon producing content for the web.</p>
<p>Typically a journalist could argue a case for public interest upon publication and these were traditionally of the form of print, radio and TV. And traditionally, the process toward publication began with collecting documents and documenting the process which transformed these documents into its resultant mediated format. Now, documents are data. We live in a digitized world. Therefore all journalists are becoming data journalists. It&#8217;s just journalism adapting to new media going out and new media coming in.</p>
<p>To answer the question (whilst not really answering), the number of data journalists today is the number of journalists working in an organization which will survive the digital transition and find a stable business model, and who will not be fired before these things happen. The number of data driven investigative journalists will be one or two per survivor successful enough to afford specialist desks. I make this distinction because all documents are data and all journalism is built atop documents, however journalism in its simplest form involves mediating already processed data whereas data specialists will be processing and analyzing raw data in order to get to the stories within.</p>
<p><strong>How many will be needed?</strong></p>
<p>That depends on who survives the digital cull. Journalists will need to produce for the web first. Once news organizations settle on a content management system I believe very few computer skills would be needed. Training to use the CMS would be more than adequate. I believe that the big stories, the special features and editorials are going to need that extra pizzazz. As such, small investigations coupled with video, graphic and/or interactive features will become desirable, especially amongst those editors who want to boast at conferences.</p>
<p>To create a long form digital piece you of course need a multi-skilled team. For these teams to work efficiently requires each member to be familiar with all the processes and to have a vast range of skills. So a most specialized data journalist with coding skills will be desirable but only at a select few organizations. But I believe supply will be very low as few journalists are interested in this route and even fewer institutions teach computer-assisted-reporting to an adequate standard for the job requirements.</p>
<p><strong>What are the primary tools they rely upon now?</strong></p>
<p>Currently there is no standard and tools vary according to institutions. The very basics are spreadsheets and Fusion Tables. Web standards are HTML and CSS. The more data intensive (and ambitious) interactive desks work with R. This is a legacy of training as statisticians and data scientists coming out of University should be R trained. Also R is free. Assume everything they use will need to be free. One object oriented programming language, Ruby or Python, is used. Some say either is fine but others such as ProPublica state they are Ruby houses. If a team dictates which one will be used as standard, this is most probably a good sign. It means they are sharing code and plan on reusing it.</p>
<p>The trend I think you will see this year and for years to come is an evolution towards JavaScript and HTML5 houses. These will be the hard skills looked for. Simply, everything will move towards being browser native so that there can be quality control across all devices. Responsive design rather than singular apps. Third party tools will be phased out very quickly the higher up you climb the digital ladder. The web changes too quickly to rely on them. The tools that will be relied upon most will be people-based: creativity, problem-solving, ingenuity, design architecture, accuracy, etc. All the things for which we are yet to make an algorithm.</p>
<p><strong>Who are the leaders or primary drivers in the area? What are the most notable projects? What organizations are embracing data journalism, and why?</strong></p>
<p>These are all sort of the same question. The primary drivers are the organizations embracing data journalism and they are creating the most notable projects. The Guardian, The New York Times, The Boston Globe, Texas Tribune, BBC, ProPublica, etc. The majority of the big players in the US are in the game and playing mostly to swap and/or steal people between them rather than fostering their own personnel or attempting a different approach. The UK has a couple of major players and some crouching at the starting line. The US has a much longer history of CAR and a more go-get-&#8217;em attitude that has allowed <a href="https://twitter.com/pilhofer" target="_blank">Aron Pilhofer</a> and <a href="https://twitter.com/kleinmatic" target="_blank">Scott Klein</a> to adopt their own team management and structure.</p>
<p>Even though &#8220;data journalism&#8221; is on the rise (as a term anyway), getting to where Aron and Scott are is extremely difficult, and being successful at it even more so. Aron and Scott were given the opportunity to form a team when news organizations were going through a frantic burst of evolution, straining to cling on to life. ProPublica is one of the more elegant species to arise from this turbulent time. Now, those who feel they survived the crunch are battening down the hatches and hoping to stave off the winter by being conservative. Because managers and directors have now heard the terms &#8220;big data&#8221;, &#8220;interactives&#8221; and &#8220;data journalism&#8221;, they feel someone has figured it out and all they need to do is steal them or, if that fails, find a cheaper copy.</p>
<p>Sadly, most news organizations are embracing data journalism because The New York Times has. Managers feel they should be spouting buzzwords at meetings and having coffee with Google reps. They need to be seen to be embracing the future. Even if they have no idea what that entails. Many send editors to conferences but few send journalists to training courses. Most want to find out what free tools they can use, who they can partner with and who is the next Twitter. At the core of digital journalism development is incentive. It still doesn&#8217;t make money but it can make careers. It has to stop being about the newsroom &#8216;ooooh&#8217; factor and more about getting the right story.</p>
<p><strong>What will they need in 2013?</strong></p>
<p>Experience. They will need people attempting these projects on a small scale, therefore they need non-developers to have some skills and a thirst to learn new ones. They need more people and more models in order to experiment as &#8220;data journalism&#8221; is not a solid thing as yet. They need the internal structure (and a stable one) in order to allow the team the time it takes to mature. They need the money to pay developers, good ones. They need the internal structure of their institution to understand what it is they are doing and to adopt the same fluidity.</p>
<p>They need to keep experimenting.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2013/01/29/data-journalism-where-is-it-all-going/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hacktivities at the Mozilla Festival 2012</title>
		<link>http://datamineruk.com/2012/11/13/hacktivities-at-the-mozilla-festival-2012/</link>
		<comments>http://datamineruk.com/2012/11/13/hacktivities-at-the-mozilla-festival-2012/#comments</comments>
		<pubDate>Tue, 13 Nov 2012 17:20:16 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[My Data Journey]]></category>
		<category><![CDATA[OpenNews]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Popcorn Maker]]></category>
		<category><![CDATA[Popcorn.js]]></category>
		<category><![CDATA[Thimble]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1614</guid>
		<description><![CDATA[Last weekend (9-11 November 2012) saw the second Mozilla Festival. Over a thousand people came through the doors of Ravensbourne College, packing 9 floors and hacking to their hearts&#8217; content. Digital journalism super stars Aron Pilhofer, Brian Boyer, Scott Klein, Miranda Mulligan and the Guardian Interactive Team were in attendance. I optimistically decided to run two ... <a href="http://datamineruk.com/2012/11/13/hacktivities-at-the-mozilla-festival-2012/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p>Last weekend (9-11 November 2012) saw the second <a href="http://lanyrd.com/2012/mozilla-festival/" target="_blank">Mozilla Festival</a>. Over a thousand people came through the doors of Ravensbourne College, packing 9 floors and hacking to their hearts&#8217; content. Digital journalism super stars <a href="https://twitter.com/pilhofer" target="_blank">Aron Pilhofer</a>, <a href="https://twitter.com/brianboyer" target="_blank">Brian Boyer</a>, <a href="https://twitter.com/kleinmatic" target="_blank">Scott Klein</a>, <a href="https://twitter.com/mirandamulligan" target="_blank">Miranda Mulligan</a> and the <a href="http://www.guardian.co.uk/profile/guardian-interactive-department" target="_blank">Guardian Interactive Team</a> were in attendance.</p>
<p>I optimistically decided to run two learning labs, teaching non-coders to hack and giving hackers some media literacy hacktivities and interactive video with Popcorn.js. So there is something for everyone! These will be put into a <a href="http://zythepsary.com/mozilla/hacktivity/OpenNewsPrototype/index_thimble_opennews.html" target="_blank">hacktivity kit</a> but until I find a home for making my workshops into online lessons I&#8217;ll be putting all the links right here.</p>
<p>First a word of note, if you&#8217;re attending my workshops or learning labs bring a laptop and not a tablet. I&#8217;m very hands-on!</p>
<p><strong>HTML for Journalists</strong></p>
<p>Using Mozilla&#8217;s new webmaker tool, <a href="https://thimble.webmaker.org/en-US/" target="_blank">Thimble</a>, you can code and see what the browser sees. In this hacktivity you mark up and style a news article, learning the basics of HTML and CSS. For those of you already web savvy there is a media literacy game suggested. <a href="https://docs.google.com/document/pub?id=1zDvfk3z6Imk2eUYBQlZYfO3tBC4pmxrSt9TCSThBW7I" target="_blank">Try it out</a>!</p>
<p><iframe src="https://docs.google.com/document/pub?id=1zDvfk3z6Imk2eUYBQlZYfO3tBC4pmxrSt9TCSThBW7I&amp;embedded=true" width="600" height="240"></iframe></p>
<p><strong>Location-based Storytelling Using Popcorn Maker and Popcorn.js</strong></p>
<p>This was really fun to make and teach. Popcorn is the project that is going to get me to further hone my JavaScript skills. It is the most applicable webmaker project to journalism. Here non-coders use <a href="https://popcorn.webmaker.org/" target="_blank">Popcorn Maker</a> to replicate a <a href="http://www.bbc.co.uk/news/uk-18798942" target="_blank">BBC Interactive</a> and those who code for the web can recreate it easily using <a href="http://popcornjs.org/" target="_blank">Popcorn.js</a>. Choose a track and <a href="https://docs.google.com/document/pub?id=1bKT5D_tgLrtbnlXvTS4t8ciHNsH-BdYD46Hr1uDj-ZI" target="_blank">try it for yourself</a>. For non-coders I suggest you do the HTML course first, go through the <a href="http://www.codecademy.com/tracks/web" target="_blank">web fundamentals</a> and <a href="http://www.codecademy.com/tracks/javascript" target="_blank">JavaScript fundamentals</a> tracks on Codecademy and then have a crack at the Popcorn.js assignment!</p>
<p><iframe src="https://docs.google.com/document/pub?id=1bKT5D_tgLrtbnlXvTS4t8ciHNsH-BdYD46Hr1uDj-ZI&amp;embedded=true" width="600" height="240"></iframe></p>
<p>&nbsp;</p>
<p>Besides running the above workshops, which were great fun and which so many journalists attended, all the OpenNews 2012 fellows also gave a one minute presentation on something they got up to this year. Watch it <a href="http://cf.cdn.vid.ly/6o5n8f/mp4.mp4" target="_blank">here</a> (25 minutes in). The <a href="http://www.mozillaopennews.org/fellowships/2013meet.html" target="_blank">2013 fellows</a> were also announced. It was great to meet almost all of them and I look forward to following their amazing journey next year. It&#8217;s mind-blowing to think that such talented and diverse people are offering their skills to newsrooms around the globe. Counter to what most people would think, I think now is an incredible time to be in journalism. People are telling their own stories, instantly and to the world. The fabric with which media people now work is truly intricate and interwoven through the public sphere. Now we should be experimenting with new types of stories that can be told.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2012/11/13/hacktivities-at-the-mozilla-festival-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://cf.cdn.vid.ly/6o5n8f/mp4.mp4" length="422116178" type="video/mp4" />
		</item>
		<item>
		<title>What News Organizations Can Learn From Tech Organizations</title>
		<link>http://datamineruk.com/2012/10/23/what-news-organizations-can-learn-from-tech-orgainzations/</link>
		<comments>http://datamineruk.com/2012/10/23/what-news-organizations-can-learn-from-tech-orgainzations/#comments</comments>
		<pubDate>Tue, 23 Oct 2012 15:40:47 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[My Data Journey]]></category>
		<category><![CDATA[OpenNews]]></category>
		<category><![CDATA[Mozilla]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1591</guid>
		<description><![CDATA[Now is the time to experiment. The second decade of the second millennium is all about crises. The stock market, the Arab Spring and the EU. The characters of history will be politicians for sure, but the celebrities, the cultural icon for these harsh and unsure times is the billionaire entrepreneur. Shaking things up, moving things around ... <a href="http://datamineruk.com/2012/10/23/what-news-organizations-can-learn-from-tech-orgainzations/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p>Now is the time to experiment. The second decade of the second millennium is all about crises. The stock market, the Arab Spring and the EU. The characters of history will be politicians for sure, but the celebrities, the cultural icon for these harsh and unsure times is the billionaire entrepreneur.</p>
<p>Shaking things up, moving things around and exploring new territory is how we have survived as a species for so long and this crisis will stand to prove that those who adapt will survive. Nowhere is this more evident than the tech industry. Nowhere is this easier to do than the web. The web is a new ecosystem to explore. It is the new medium not just for monetary transactions but for social interactions. So why is the move from print to web proving so difficult for the news industry?</p>
<p><a href="http://www.mozilla.org/en-US/webmaker/"><img class="wp-image-1594 aligncenter" title="Mozilla-Webmaker-logo" src="http://datamineruk.com/wp-content/uploads/2012/10/Mozilla-Webmaker-logo1.png" alt="" width="560" height="186" /></a></p>
<p>I have been fortunate in the last two years to get a glimpse at the greener side of the road. As part of the <a href="http://www.mozillaopennews.org/" target="_blank">OpenNews</a> programme developers are put into news organizations. I, on the other hand, was in news. I trained in news and worked at Channel 4, BBC and CNN International. For me, the exciting new prospect was not just to work with developers to hone my data skills but to be part of <a href="http://www.mozilla.org/en-US/" target="_blank">Mozilla</a>. I didn&#8217;t realize how much a part I would be until last month, when I found myself at the All-Hands (AGM) with 50 other Mozillians.</p>
<p>It was a real eye-opener and a real privilege to be there. It was my chance to learn more about Mozilla and the projects they are undertaking. Before I tell you about those I want to list my observations on the way Mozilla is run which I think reflects the web-based tech industry as a whole. Mozilla are:</p>
<ul>
<li><strong>Inclusive of every employee</strong>: everyone had a say, everyone could get involved and contribute to decision-making</li>
<li><strong>Promoters of creativity</strong>: success on the web is built on ideas and ideas can come from anywhere and anyone</li>
<li><strong>Hugely varied in their employees&#8217; backgrounds and skills</strong>: from designers to teachers to musicians, there was a wide range of personalities</li>
<li><strong>Risk-takers</strong>: they see themselves not just as making fun things for the web but as taking on the likes of Facebook and YouTube</li>
<li><strong>Concerned about sustainability</strong>: even though the foundation has funding they are still looking at how they can support their work in perpetuity</li>
<li><strong>Mission-driven</strong>: every project must have a purpose and this must fit into the Mozilla mission statement of an open and inclusive web</li>
<li><strong>Moving to coherency</strong>: for their message to be loud and clear they are creating threads between all their projects</li>
</ul>
<p>The modern newsroom is still faction-based between digital and editorial. There is still too much of a top-down approach to management. Your product is still your content, no matter what medium it is on. If that content is ripped from wire copy it is information when contained in a touch and take newspaper but has very little value on a web of social and global links. People need to touch a paper and people need to be touched by the web. There needs to be creativity (and humanity) in the creation and presentation of content. The medium is the message.</p>
<p>For the audience to feel a connection to the news product there needs to be a real connection in its creation. News cannot be produced in a factory and it cannot be produced in code (the aggregators will lose their audience to a truly digital news platform). There needs to be a passion for journalism, real journalism in the newsroom for a news organization to survive. In that sense, hack days should not just be for developers, they should be for journalists also; to hunt down the stories that have passed under their nose in the process of producing copy. Developers and journalists should have access to as wide a range of skills as possible to make these ideas reality.</p>
<p>Managers should take risks at news organizations. Now is the time to do things differently as your product has to do its job differently. Be agile and be lean, just make sure you have learning metrics so you know the results of the changes you make. Turn to your developers, turn to your journalists and turn to your supporters. You no longer have readers or an audience or even users. You have supporters. They can be supporting you financially and in other ways. Your product is not a chair, it serves no definitive function. It is ideological. Your supporters choose to support you because they want you to continue working to your ethos and principles. In that sense, you do not have a brand. You have a mission. Managers in newsrooms speak of brand far too much, and building products around said brand. No. You have a mission and you find ways for your supporters to connect with and further their support in your mission.</p>
<p>On the surface, I want news organizations to be more like tech organizations because I want 20% time, I want the high spec computers and great internet connection, I want the cereal bar and soft drink filled snack bar, I want the fun events and travel. But I think it goes deeper than this. I think they should adopt the ideology.</p>
<p><em>The <a href="http://mozillafestival.org/" target="_blank">Mozilla Festival 2012</a> will be in London 9-11 November. I will be running two workshops in the journalism track, Location-based Storytelling using Popcorn.js and Popcorn Maker (yet to be released but watch <a href="http://www.youtube.com/watch?feature=player_embedded&amp;v=641aB1Dv1DY" target="_blank">this video</a>) and HTML for Journalists using Thimble. <a href="https://donate.mozilla.org/page/contribute/mozfest2012-registration" target="_blank">Do come and join me</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2012/10/23/what-news-organizations-can-learn-from-tech-orgainzations/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Scraping Contracts, Digging Dfid</title>
		<link>http://datamineruk.com/2012/09/24/scraping-contracts-digging-dfid/</link>
		<comments>http://datamineruk.com/2012/09/24/scraping-contracts-digging-dfid/#comments</comments>
		<pubDate>Mon, 24 Sep 2012 15:17:04 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[My Data Journey]]></category>
		<category><![CDATA[News Story]]></category>
		<category><![CDATA[OpenNews]]></category>
		<category><![CDATA[contracts]]></category>
		<category><![CDATA[data driven journalism]]></category>
		<category><![CDATA[Department for International Development]]></category>
		<category><![CDATA[Guardian DataBlog]]></category>
		<category><![CDATA[The Guardian]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1577</guid>
		<description><![CDATA[In an article on the Guardian Data Blog Claire Provost outlined how the recent furore over consultancy spending by the Department for International Development (Dfid) should not be about turning the aid tap off but about making aid work for the donor country. One way to promote development in donor countries is to untie aid, to ... <a href="http://datamineruk.com/2012/09/24/scraping-contracts-digging-dfid/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p><a href="https://scraperwiki.com/scrapers/dfid-contracts-info/"><img class="alignleft  wp-image-1580" title="Nicola Hughes : DFID-Contracts-Info | ScraperWiki" src="http://datamineruk.com/wp-content/uploads/2012/09/Nicola-Hughes-DFID-Contracts-Info-ScraperWiki-300x208.png" alt="" width="243" height="168" /></a>In an <a href="http://www.guardian.co.uk/global-development/datablog/2012/sep/21/why-is-uk-aid-going-to-uk-companies" target="_blank">article on the Guardian Data Blog</a> <a href="http://www.guardian.co.uk/profile/claire-provost" target="_blank">Claire Provost</a> outlined how the recent furore over consultancy spending by the Department for International Development (Dfid) should not be about turning the aid tap off but about making aid work for the donor country. One way to promote development in donor countries is to untie aid, to allow companies and consultancies in developing countries to win contracts for work at home. That way, aid grows local industry and promotes local expertise.</p>
<p>To look at this angle I scraped the <a href="http://www.contractsfinder.businesslink.gov.uk/Search%20Contracts/Search%20Contracts%20Results.aspx?site=1000&amp;lang=en&amp;sc=9d049372-a4c2-46af-bb24-2f7e132aa5ce&amp;osc=db8f6f68-72d4-4204-8efb-57ceb4df1372&amp;rb=1&amp;ctlPageSize_pagesize=200" target="_blank">Dfid contracts from contracts finder</a> and looked at which contracts were won by UK companies. The <a href="http://www.guardian.co.uk/global-development/datablog/2012/sep/21/why-is-uk-aid-going-to-uk-companies" target="_blank">article</a> had the data but <a href="https://scraperwiki.com" target="_blank">ScraperWiki</a> has the code for any of you interested in digging up contracts.</p>
<p>Firstly, you should scrape all the links to the individual contracts from the search result page. <a href="https://scraperwiki.com/scrapers/dfid-contracts/" target="_blank">Here</a> is the one for Dfid. Click on &#8220;Copy&#8221; to get your own and change the &#8220;search_page&#8221; variable to the URL of your search. To get every URL, change the page size in the search URL so that all the results are on one page.</p>
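<p>As a rough sketch of that first step, the snippet below sets the page size on a search URL and collects contract links from the returned HTML. It uses only Python&#8217;s standard library and is not the code of the linked scraper; the <code>ctlPageSize_pagesize</code> parameter comes from the search URL above, while the function names and the <code>marker</code> text used to recognise contract links are assumptions you would adapt to the real page.</p>

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


def with_page_size(search_url, size):
    """Return search_url with its page-size query parameter set to size."""
    parts = urlsplit(search_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["ctlPageSize_pagesize"] = str(size)
    return urlunsplit(parts._replace(query=urlencode(query)))


class LinkCollector(HTMLParser):
    """Collect the href of every anchor whose URL contains a marker string."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Turn relative links into absolute ones
            self.links.append(urljoin(self.base_url, href))


def contract_links(html, base_url, marker="Contract"):
    """Return unique contract URLs in page order (marker is hypothetical)."""
    collector = LinkCollector(base_url, marker)
    collector.feed(html)
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(collector.links))
```

<p>You would fetch the page at <code>with_page_size(search_page, 200)</code> with whatever HTTP tool you prefer and hand the HTML to <code>contract_links</code>.</p>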
<p>Next, go into each URL by attaching the data from your search results scraper to a new scraper which extracts the HTML and pulls out the necessary information. <a href="https://scraperwiki.com/scrapers/dfid-contracts-info/" target="_blank">Here</a> is the one for Dfid. I have used the HTML scraping library <a href="http://www.crummy.com/software/BeautifulSoup/" target="_blank">BeautifulSoup</a>. You can find the documentation <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">here</a>.</p>
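<p>To illustrate the second step, here is a minimal sketch that turns one contract page into a dictionary of fields. It assumes, hypothetically, that the page lists each detail as a two-cell table row (label, then value), and it swaps BeautifulSoup for the standard library&#8217;s <code>html.parser</code> so that it runs without extra installs; the names below are illustrative and not taken from the Dfid scraper.</p>

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Gather the text of each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def contract_fields(html):
    """Map each two-cell row's label to its value; other rows are skipped."""
    collector = RowCollector()
    collector.feed(html)
    return {row[0].rstrip(":"): row[1] for row in collector.rows if len(row) == 2}
```

<p>Run over every link from the search results, this gives one dictionary per contract, which is a convenient shape for saving to a datastore or a spreadsheet.</p>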
<p>So take a look at the <a href="http://www.contractsfinder.businesslink.gov.uk/Search%20Contracts/Search%20Contracts%20Results.aspx?site=1000&amp;lang=en&amp;sc=9d049372-a4c2-46af-bb24-2f7e132aa5ce&amp;osc=db8f6f68-72d4-4204-8efb-57ceb4df1372&amp;rb=1&amp;ctlPageSize_pagesize=200" target="_blank">source</a>, take a look at the <a href="view-source:https://scraperwiki.com/editor/raw/dfid-contracts-info" target="_blank">code</a> and take a look at the <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">documentation</a>. Open data and open up the news.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2012/09/24/scraping-contracts-digging-dfid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Journalism In Argentina</title>
		<link>http://datamineruk.com/2012/09/17/data-journalism-in-argentina/</link>
		<comments>http://datamineruk.com/2012/09/17/data-journalism-in-argentina/#comments</comments>
		<pubDate>Mon, 17 Sep 2012 14:54:29 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[My Data Journey]]></category>
		<category><![CDATA[OpenNews]]></category>
		<category><![CDATA[Buenos Aires]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[HacksHackers]]></category>
		<category><![CDATA[La Nacion]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1557</guid>
		<description><![CDATA[Being immersed in a passion, an upcoming field, a new area of exploration is of course exhilarating. But living in a bubble of code, data and journalism can lead you to adopt certain assumptions and to fall into naive paradigms. This is often geo-located. I, like most in the West, tend to look to the ... <a href="http://datamineruk.com/2012/09/17/data-journalism-in-argentina/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.meetup.com/HacksHackersBA/photos/10496932/154649642/"><img class="size-full wp-image-1561 aligncenter" title="HHBA" src="http://datamineruk.com/wp-content/uploads/2012/09/600_154649642.jpeg" alt="" width="600" height="450" /></a></p>
<p>Being immersed in a passion, an upcoming field, a new area of exploration is of course exhilarating. But living in a bubble of code, data and journalism can lead you to adopt certain assumptions and to fall into naive paradigms. This is often geo-located. I, like most in the West, tend to look to the West for innovation and progress. Even though I am not completely Western it is easy to get enveloped in your immediate online community and not those occupying the next node on the web.</p>
<p>This became apparent to me on my recent trip to Argentina for <a href="http://www.meetup.com/HacksHackersBA/" target="_blank">HacksHackers Buenos Aires</a>. Not only is the journalism-coding community thriving in this bustling South American city but the hub of activity is something every newsroom developer and webmaker journalist in the West should be envious of. They organized a star-studded lineup of new media composers for talks and workshops. A three day media party with over 700 attendees. If that wasn&#8217;t impressive enough, they had live translators so that both Spanish and English speakers could attend.</p>
<p><a href="http://www.meetup.com/HacksHackersBA/photos/10484642/#154313472"><img class="alignnone size-full wp-image-1562" title="Guardian_HHBA" src="http://datamineruk.com/wp-content/uploads/2012/09/600_154313472.jpeg" alt="" width="600" height="400" /></a></p>
<p>I was included in the keynote and workshop run by the members of the Interactive Team. What I said was as follows:</p>
<blockquote><p>Journalists used to work with information in print. Now we work with data in the digital age. Now &#8220;we&#8221; means not just the journalist with pen, camera and microphone but anyone with a phone and access to the internet. Now we all aggregate, curate and cultivate in the age of big data in the hope that not one person can dictate.</p>
<p>Now philanthropical bodies like <a href="http://www.knightfoundation.org/what-we-fund/innovating-media" target="_blank">Knight</a> and campaigning foundations like <a href="http://www.mozillaopennews.org/" target="_blank">Mozilla</a> are enabling news organisations to openly embrace data driven investigative journalism by funding projects, training and education centres. Even giving headstrong idealists the opportunity to work with the new multi-skilled teams being fostered in newsrooms around the globe.</p>
<p>Because what newsrooms typically have no longer works. It does not work for collaboration, for visualisation, or for big data. To be digital, news organisations have to now move at the speed of web. And with the advantages of legacy come the disadvantages of rigidity.</p>
<p>So how do we engage with data? How do we move forward in our understanding of the typical news story? How do we pitch a story without a headline until the majority of the work has been done? How do we decide how to tell the story before we have it? How come it is already happening all over the world simultaneously?</p>
<p>How do we strive for data integrity when the structure we know is the sentence and the paragraph? How can we ensure accuracy when we only have one source, the data? How can we interrogate data on a scale that cannot be consumed by a human being? And how can we find stories in data whilst upholding the cornerstones of impartiality, accuracy and fairness?</p>
<p>I don&#8217;t know the answers but I do know that to do all of this we need tools. We all need to collectively and openly share ideas, data and code. We are no longer news-makers <strong>on</strong> the web, we are news-makers <strong>of</strong> the web. And I have had the great fortune of seeing <a href="http://www.guardian.co.uk/profile/guardian-interactive-department" target="_blank">The Guardian team</a> and indeed the news industry tackle these challenges.</p></blockquote>
<p>This went down well with the crowd but what I failed to communicate was how impressed I was with them. Before the media party kicked off I had the great fortune of meeting up with the data team at <a href="https://twitter.com/LNdata" target="_blank">La Nacion</a>. Not only were they present and active participants but they brought along an entire university class whose students are taught on campus and in the newsroom. What a brilliant idea!</p>
<p>Another inspiring concept incubated by the data team is that of the digital journalist notebook. The government of Argentina publish reports of spending, expenses, contract awards, etc in paper bulletins which are available online as PDFs. Each region has its own take on the general layout. The team scrape all these PDFs and have built a search for the contents of the documents.</p>
<p>In this way they have made all government reports into a digital library where, using their journalistic hat, they can connect who is who and who gets what. Hard-hitting investigative stories have already emerged. With data &#8220;more open&#8221; in the West, newsrooms there should be taking a leaf from their digital notebook.</p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2012/09/17/data-journalism-in-argentina/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Layout For All Your Data Driven Journalism Projects</title>
		<link>http://datamineruk.com/2012/08/16/a-layout-for-all-your-data-driven-journalism-projects/</link>
		<comments>http://datamineruk.com/2012/08/16/a-layout-for-all-your-data-driven-journalism-projects/#comments</comments>
		<pubDate>Thu, 16 Aug 2012 15:23:22 +0000</pubDate>
		<dc:creator>Nicola Hughes</dc:creator>
				<category><![CDATA[Data Journalism]]></category>
		<category><![CDATA[My Data Journey]]></category>
		<category><![CDATA[OpenNews]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[GitHub]]></category>

		<guid isPermaLink="false">http://datamineruk.com/?p=1547</guid>
		<description><![CDATA[I do the hard work so you don&#8217;t have to! In fact, I need to do it for me. If I am to find my scripts and organize them, I need to structure my project at the start and keep to a standard layout for all my projects. So I&#8217;ve made a GitHub repository called skel ... <a href="http://datamineruk.com/2012/08/16/a-layout-for-all-your-data-driven-journalism-projects/">Continue Reading  &#8594;</a>]]></description>
				<content:encoded><![CDATA[<p>I do the hard work so you don&#8217;t have to! In fact, I need to do it for me. If I am to find my scripts and organize them, I need to structure my project at the start and keep to a standard layout for all my projects.</p>
<p>So I&#8217;ve made a GitHub repository called <a href="https://github.com/DataMinerUK/skel" target="_blank">skel</a> (for skeleton) for all data driven journalism projects. Whenever I have a new project, I&#8217;ll make a zipped copy of the skel folder without git (see pdf instructions below), give it the project name and build up my scripts and data in the various folders.</p>
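<p>If you would rather script that copy step than do it by hand, something like the sketch below could work. It assumes you already have a local copy of the skel folder; the function name and its arguments are made up for illustration, and the PDF instructions may describe a different route.</p>

```python
import shutil
import tempfile
from pathlib import Path


def new_project_zip(skel_dir, project_name, out_dir):
    """Zip a copy of skel_dir, minus git metadata, as out_dir/project_name.zip."""
    with tempfile.TemporaryDirectory() as staging:
        target = Path(staging) / project_name
        # Copy everything except the .git folder into a staging area
        shutil.copytree(skel_dir, target, ignore=shutil.ignore_patterns(".git"))
        # Archive the staged copy so the zip unpacks into a folder named after the project
        archive = shutil.make_archive(
            str(Path(out_dir) / project_name),
            "zip",
            root_dir=staging,
            base_dir=project_name,
        )
    return Path(archive)
```

<p>Calling <code>new_project_zip("skel", "my-story", ".")</code> should leave <code>my-story.zip</code> in the current folder, ready to unzip wherever the project will live.</p>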
<p><a href="https://github.com/DataMinerUK/skel" target="_blank">skel</a> contains the vagrant I talked about in my <a title="Prevent A CARcrash – All The Data Journalism Tools You Need In One Handy Box" href="http://datamineruk.com/2012/08/13/prevent-a-carcrash-all-the-data-journalism-tools-you-need-in-one-handy-box/">last post</a> so it has IPython, <a href="http://www.crummy.com/software/BeautifulSoup/" target="_blank">BeautifulSoup</a>, <a href="http://www.mysql.com/" target="_blank">MySQL</a> and <a href="http://www.elasticsearch.org/" target="_blank">Elasticsearch</a>. With every project iteration I will be adding to the vagrant (and possibly repackaging it) and refactoring my skel repository. So if you are a journalist interested in data projects, get on <a href="https://github.com" target="_blank">GitHub</a>, clone the <a href="https://github.com/DataMinerUK/skel" target="_blank">repository</a> and watch it for any updates. Below is my very simple guide to using GitHub for non-developers:</p>
<p><a style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;" title="View How to Use GitHub on Scribd" href="http://www.scribd.com/doc/103048433/How-to-Use-GitHub">How to Use GitHub</a><iframe id="doc_28300" src="http://www.scribd.com/embeds/103048433/content?start_page=1&amp;view_mode=scroll&amp;access_key=key-1oxzgl4kasvyn0syq3di" frameborder="0" scrolling="no" width="100%" height="600" data-auto-height="true" data-aspect-ratio="0.772727272727273"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://datamineruk.com/2012/08/16/a-layout-for-all-your-data-driven-journalism-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
