In a special guest post, Anders Eriksen from the #bord4 editorial development and data journalism team at the Norwegian news website Bergens Tidende talks about how they manage large data projects.
Do you really know how you ended up with those results after analyzing the data from Public Source?
Well, often we did not. This is what we knew:
- We had downloaded some data in Excel format.
- We did some magic cleaning of the data in Excel.
- We did some manual alterations of wrong or wrongly formatted data.
- We sorted, grouped, pivoted, and eureka! We had a story!
Then we got a new and updated batch of the same data. Or the editor wanted to check how we had arrived at those numbers and that story.
…And so the problems started to appear.
How could we do the exact same analysis over and over again on different batches of data?
And how could we explain to curious readers and editors exactly how we ended up with those numbers or that graph?
We needed a way to structure our data analysis and make it traceable, reusable and documented. This post will show you how. We will not teach you how to code, but maybe inspire you to learn that in the process.
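To give a rough idea of what "traceable, reusable and documented" can mean in practice, here is a minimal sketch in Python. It is only an illustration, not the workflow our team actually uses: the column names (`region`, `amount`), the cleaning rules and the sample data are all made up. The point is that every cleaning step lives in code, so the exact same analysis can be rerun on a new batch and read by a curious editor.

```python
import csv
import io
from collections import defaultdict


def clean_row(row):
    """Apply the same documented cleaning rules to every row.

    The rules here are hypothetical examples of the kind of fixes
    that are easy to do by hand in Excel and then forget about.
    """
    return {
        # Trim stray spaces and normalize capitalization.
        "region": row["region"].strip().title(),
        # Remove thousands spaces and turn a decimal comma into a point.
        "amount": float(row["amount"].replace(" ", "").replace(",", ".")),
    }


def analyze(csv_file):
    """Clean every row, then sum the amounts per region."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file, delimiter=";"):
        cleaned = clean_row(row)
        totals[cleaned["region"]] += cleaned["amount"]
    return dict(totals)


# A made-up batch of data; a real one would be read from a downloaded file.
batch = io.StringIO("region;amount\n bergen ;1 000,5\nOSLO;200\nbergen;99,5\n")
print(analyze(batch))
```

When an updated batch arrives, the same `analyze` function is simply run on the new file, and the script itself documents how the numbers were produced.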