Monthly Archives: July 2012

My first ebook: Scraping For Journalists (and programming too)

Next week I will start publishing my first ebook: Scraping for Journalists.

Although I’ve written about scraping before on the blog, this book is designed to take the reader step by step through a series of tasks (a chapter each) which build a gradual understanding of the principles and techniques for tackling scraping problems. Everything has a direct application for journalism, and each principle is related to its application in scraping for newsgathering.

For example: the first scraper requires no programming knowledge, and is working within 5 minutes of reading.

I’m using Leanpub for this ebook, because it allows you to publish in installments and update the book for users – which suits a book like this perfectly, as I’ll be publishing chapters week by week, Codecademy-style.

If you want to be alerted when the book is ready, register on the book’s Leanpub page.

Interest Differencing: Folk Commonly Followed by Tweeting MPs of Different Parties

Earlier this year I doodled a recipe for comparing the folk commonly followed by users of a couple of BBC programme hashtags (Social Media Interest Maps of Newsnight and BBCQT Twitterers). Prompted in part by a tweet from Michael Smethurst/@fantasticlife about generating an ESP map for UK politicians (something I’ve also doodled before – Sketching the Structure of the UK Political Media Twittersphere) I drew on the @tweetminster Twitter lists of MPs by party to generate lists of folk commonly followed by the MPs of each party.

Using the R wordcloud library commonality and comparison clouds, we can get a visual impression of folk commonly followed in significant numbers by all the MPs of the three main parties, as well as the folk the MPs of each party follow significantly and differentially to the other parties:

There’s still a fair bit to do to make the methodology robust: for example, being able to cope with comparing folk commonly followed by different sets of users where the size of the set differs to a significant extent (there is a large difference between the number of tweeting Conservative and LibDem MPs, for instance). I’ve also noticed that repeatedly running the comparison.cloud code turns up different clouds, so there’s some element of randomness in there. I guess this just adds to the “sketchy” nature of the visualisation; or maybe hints at a technique akin to the way a photographer will take multiple shots of a subject before picking one or two to illustrate something in particular. Which is to say: the “truthiness” of the image reflects the message that you are trying to communicate. The visualisation in this case exposes a partial truth (which is to say, no absolute truth), or particular perspective about the way different groups differentially follow folk on Twitter.

A couple of other quirks I’ve noticed about the comparison.cloud as currently defined: firstly, very highly represented friends are sized too large to appear in the cloud (which is why very commonly followed folk across all sets – the people that appear in the commonality cloud – tend not to appear) – there must be a better way of handling this? Secondly, if one person is represented so highly in one group that they don’t appear in the cloud for that group, they may appear elsewhere in the cloud.
(So for example, I tried plotting clouds for folk commonly followed by a sample of the followers of @davegorman, as well as the people commonly followed by the friends of @davegorman – and @davegorman appeared as a small label in the friends part of the comparison.cloud, notwithstanding the fact that all the followers of @davegorman follow @davegorman, but not all his friends do…) What might make more sense would be to suppress the display of a label in the colour of a particular group if that label has a higher representation in any of the other groups (and isn’t displayed because it would be too large).
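
As a rough illustration of the two fixes mooted above (coping with groups of different sizes, and only ever labelling an account in the colour of the group where it is most strongly represented), here is a minimal Python sketch. It is not the comparison.cloud code: the function names, group names and toy friend lists are all made up for the example, and the "deviation from the mean rate" measure is simply my reading of how the comparison cloud sizes its labels.

```python
from collections import Counter


def follow_rates(friends_by_member):
    """Fraction of a group's members who follow each account.

    friends_by_member maps each member to the set of accounts they follow.
    Dividing by group size puts large and small groups on the same scale.
    """
    counts = Counter()
    for friends in friends_by_member.values():
        counts.update(set(friends))
    size = len(friends_by_member)
    return {account: n / size for account, n in counts.items()}


def differential_labels(groups, min_deviation=0.0):
    """Assign each account to the one group where it is most over-represented.

    An account's deviation is its follow rate in its strongest group minus
    its mean rate across all groups. Because each account gets exactly one
    group, it can never turn up as a small label in a weaker group.
    """
    rates = {name: follow_rates(members) for name, members in groups.items()}
    accounts = set().union(*(r.keys() for r in rates.values()))
    labels = {}
    for account in accounts:
        per_group = {name: r.get(account, 0.0) for name, r in rates.items()}
        mean_rate = sum(per_group.values()) / len(per_group)
        best = max(per_group, key=per_group.get)
        deviation = per_group[best] - mean_rate
        if deviation > min_deviation:
            labels[account] = (best, deviation)
    return labels


# Hypothetical toy data: two groups of different sizes.
groups = {
    "party_a": {"mp1": {"x", "y"}, "mp2": {"x", "z"}},
    "party_b": {"mp3": {"x"}, "mp4": {"x", "w"},
                "mp5": {"w"}, "mp6": {"x", "w"}},
}
labels = differential_labels(groups)
for account, (group, deviation) in sorted(labels.items()):
    print(account, group, round(deviation, 3))
```

Here account "x" is followed by everyone in party_a and by three quarters of party_b, so it is labelled once, for party_a, with a small deviation, rather than showing up in both colours.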

That said, as a quick sketch, I think there’s some information being revealed there (the coloured comparison.cloud seems to pull out some names that make sense as commonly followed folk peculiar to each party…). I guess one way forward is to start picking apart the comparison.cloud code; another is to explore a few more comparison sets. Suggestions welcome as to what they might be…:-)

PS by the by, I notice via the Guardian datablog (Church vs beer: using Twitter to map regional differences in US culture) another Twitter based comparison project – Church or Beer? Americans on Twitter – which looked at geo-coded Tweets over a particular time period on a US state-wide basis and counted the relative occurrence of Tweets mentioning “church” or “beer”…

Let’s explode the myth that data journalism is ‘resource intensive’

"Data Journalism is very time consuming, needs experts, is hard to do with shrinking news rooms" Eva Linsinger, Profil

Is data journalism ‘time consuming’ or ‘resource intensive’? The excuse – and I think it is an excuse – seems to come up at an increasing number of events whenever data journalism is discussed. “It’s OK for the New York Times/Guardian/BBC,” goes the argument. “But how can our small team justify the resources – especially in a time of cutbacks?”

The idea that data journalism inherently requires extra resources is flawed – but understandable. Spectacular interactives, large scale datasets and investigative projects are the headliners of data journalism’s recent history. We have oohed and aahed over what has been achieved by programmer-journalists and data sleuths…

But that’s not all there is.


Hyperlocal Voices: Ed Walker and Ryan Gibson, Blog Preston

For the third in our new series of Hyperlocal Voices we head North to the city of Preston in Lancashire, UK. Damian Radcliffe spoke to Blog Preston’s Ed Walker and Ryan Gibson about some of the lessons they have learned over the last three and a half years.

1. Who were the people behind the blog?

Ed: There’s me, Ed, who used to live in Preston but now lives in London – studied and lived in Preston for five years. Plus Ryan Owen Gibson who is Preston born and bred, he’s co-editor. James Duffell, a local web developer and designer, is the technical brains behind the site. We’ve recently said goodbye to co-editor Joseph Stashko, who joined Blog Preston in April 2010 while studying at the University of Central Lancashire and will be departing Preston soon. We also had co-editor Andy Halls on board from April 2010 to May 2011 before he joined The Sun. We also have some excellent guest contributors including Holly Sutton, Paul Swarbrick, Lisa McManus, Paul Melling and many others!

2. What made you decide to set up the blog?

It was a cold January afternoon in 2009, the Preston Citizen (weekly free newspaper for the city) had recently shut down and there was a chance to create something new.

3. When did you set up the blog and how did you go about it?

Ed: Sunday 11th January 2009, started out as a wordpress.com blog to test the water and after a couple of months I recruited the help of James Duffell and he made an ace site and helped me move it to a proper domain. Just started posting local news and events, and built it up from there – lots of Freedom of Information requests, local photos, events coverage and nostalgia.

4. What other blogs, bloggers or websites influenced you?

Ed: I saw the St Albans Blog, and thought, hey, this could happen here.

5. How did – and do – you see yourself in relation to a traditional news operation?

Ryan: I don’t think Blog Preston can compete with a traditional news operation, and I don’t think we would want to. What makes a hyperlocal blog such as ours so great is that we have the freedom, both editorially and strategically, to change our course very quickly. This means that we can adapt to our readership much faster than a traditional news operation can. I also like to think we listen to our readers more, and we try to engage with them through social media channels and on the blog itself.

6. What have been the key moments in the blog’s development editorially?

Ed: May 2010 – we covered the general election and we’ll touch on why that was so important. July 2009 was a big moment, we moved to a hosted solution with a proper domain and really started to accelerate the amount of content going on the site. 2011 was big as we teamed up with NESTA to train community reporters and we recruited a lot of guest contributors, plus Ryan came onboard and has really excelled at live event coverage.

7. What sort of traffic do you get and how has that changed over time?

Ed: We now average around 10,000 unique visitors a month, with 24,000 page impressions. In October 2010 the site was averaging 10,000 page impressions a month and 4,000 unique visitors.

8. What is / has been your biggest challenge to date?

Ed: Just keeping the momentum going, it’s easy to set a site up but when you move away from an area it’s a tough decision, do you shut the site down or try to keep it going? Fortunately there’s a great team of people who have stuck their hand up and got involved, and well, we’re still producing great community news for Preston.

9. What story, feature or series are you most proud of?

Ryan: Blog Preston has been lucky enough to break a number of stories that weren’t being picked up by the mainstream media at the time, such as an announcement that the BBC would be coming to Preston to film a series of short dramas, dubbed the Preston Passion, as part of its Easter output.

…I think the live coverage of the May 2010 elections really defines what we are about. The mechanics of that series were very simple – it was just a team of guys with a laptop and a mobile phone each, but the level of coverage they managed to achieve went above and beyond what any of the other news operations were doing at the time.

We were the first to interview Preston MP Mark Hendrick after his re-election.

Perhaps this was the moment that people began to take us seriously.

10. What are your plans for the future?

Ryan: 2012 is very important for Preston due to its unique significance as a Guild year, which is only celebrated once every twenty years. So editorially, we are being kept busy covering local events and breaking new stories.

We are also working closely with a number of organisations to collaborate and increase our readership through joint ventures. We are in talks with lots of important people, which is exciting. Our main aim going forward is to grow the editorial team, to put us in a position where we can call on some of the best local writers and reporters to deliver the best content for Blog Preston readers.