During my OpenNews Fellowship at The Guardian I am learning some very valuable lessons. Just as a trainee reporter learns the ropes, the beat and how to organise themselves for the steady flow of the news day, so the budding data journalist needs to learn the same, only in terms of bits and bytes.
So what am I talking about? Frankly, I’m talking about the other day when I realised my Python setup tools were completely mangled. This is the danger of delving into the realm of code. You break the one thing you are reliant upon. This is what I refer to as a CARcrash. So how do you practice good driving?
First things first: never deadline and drive. That is, never code frantically to a deadline. When you need to munge some data in a hurry it is very tempting to look for a new set of tools which promises to do it all in a few lines, and to install it and all its dependencies. Doing that leads to trouble: a computer that jumbles software versions and won’t install them correctly.
So how can you avoid a CARcrash? The answer is virtualization. That is, create a virtual machine in your machine that has all the software you need for the project. Run your code on the virtual machine. Export the data and kill the machine when your project is done. Sounds scary? Well it’s actually quite easy.
Download VirtualBox to create a virtual box. These are not easy to use on their own, so you want to get yourself a Vagrant. For the budding data journalist, I have a prepared vagrant for you (kindly made by a talented developer at The Guardian who swears by them), so all you need to do is download and install VirtualBox and Vagrant. Download this zip file, unzip it and move the contents into the top of the directory where you have stored the files you want to work with (the Vagrantfile has to be at the top of the tree, as anything branching from the directory’s path will be made available to the vagrant on your virtual box).
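If you are not sure whether your files sit below the Vagrantfile, a few lines of Python can check for you. This is only a minimal sketch: the function names find_vagrantfile and is_shared are my own inventions for illustration, and it assumes the file in the zip is called Vagrantfile, which is the name Vagrant looks for by default.

```python
from pathlib import Path
from typing import Optional


def find_vagrantfile(start: Path) -> Optional[Path]:
    """Walk upwards from `start` and return the first directory holding a Vagrantfile.

    Hypothetical helper for illustration; returns None if no Vagrantfile is found.
    """
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        if (directory / "Vagrantfile").is_file():
            return directory
    return None


def is_shared(path: Path, project_root: Path) -> bool:
    """Return True if `path` sits somewhere below `project_root`.

    Assumption: everything under the Vagrantfile's directory is what gets
    made available to the vagrant, as described above.
    """
    try:
        Path(path).resolve().relative_to(Path(project_root).resolve())
        return True
    except ValueError:
        # relative_to raises ValueError when path is outside project_root
        return False
```

Call find_vagrantfile(Path.cwd()) from the folder you are working in. If it returns None, the contents of the zip are not at the top of your tree yet.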
This vagrant is an Ubuntu box with iPython, Elasticsearch, MySQL, and BeautifulSoup already installed. If you haven’t used any of those, stay tuned and I should have some examples for you. Every time you want to use these, you put the contents of the zip folder at the top of the directory you are using, navigate into it in your terminal and use these simple commands:
> vagrant up # creates your vagrant for the first time, you only need to do this once
> vagrant ssh # logs you on to your vagrant where you can run your files using MySQL, Elasticsearch, etc
> ls # lists the files on your vagrant, this will include all the files in the directory you are working in
> exit # logs you off your vagrant
> vagrant destroy # once you have run your programmes and copied all the files created onto your directory you can destroy your vagrant and free up space
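If you find yourself typing the same cycle again and again, it can be scripted. The following is a rough sketch rather than anything that ships with the box: vagrant_session and run_session are hypothetical names of my own, and it relies on two Vagrant options, "vagrant ssh -c" to run a single command on the box and "vagrant destroy -f" to skip the confirmation prompt.

```python
import shutil
import subprocess
from typing import List


def vagrant_session(script: str) -> List[List[str]]:
    """Build the vagrant commands for one up / run / destroy cycle.

    `script` is the command to run on the box, e.g. "python scrape.py"
    (a made-up file name used purely as an example).
    """
    return [
        ["vagrant", "up"],               # start the box
        ["vagrant", "ssh", "-c", script],  # run one command on the box
        ["vagrant", "destroy", "-f"],    # remove the box without prompting
    ]


def run_session(script: str, project_dir: str = ".") -> None:
    """Run the cycle from `project_dir`, the directory holding the Vagrantfile."""
    if shutil.which("vagrant") is None:
        raise RuntimeError("vagrant is not installed or not on your PATH")
    for command in vagrant_session(script):
        # check=True stops at the first failing step, so a failed script
        # leaves the box running for you to inspect instead of destroying it
        subprocess.run(command, cwd=project_dir, check=True)
```

Be careful with the last step: anything your script wrote outside the directory you share with the vagrant is gone once the box is destroyed, so copy your output across first.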
I will be using this virtual box for all of my tutorials from now on (and teaching but more of that to come) so stay tuned and follow along! The content of the box may change. I’ll let you know if this happens.