Tag Archives: computer aided reporting

Using Google Spreadsheets as a database (no, it really is very interesting, honest)

This post by Tony Hirst should be recommended reading for every journalist interested in the potential of computers for reporting.

Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it were a database; and because it demonstrates the importance of news organisations releasing data to their users.

Put aside any intimidation you might feel at the mention of APIs and query languages. What it boils down to is this: you can alter the web address of a Google spreadsheet to filter the data and find the story.

Simple as that.

Hirst uses the example of the spreadsheet of MPs’ expenses recently released by The Guardian (they’ve also published Lords’ expenses). By altering the URL, this is what he generates (I’m quoting his bullet points):

the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
search for the name, party (column D) and constituency (column E) of people whose first name contains Joan or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains 'Joan' or C matches 'John')
only show the people who have claimed less than £100,000 in total allowances : select * where F<100000
what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
Find the average amount claimed by the MPs: select sum(I)/count(I)
Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>=0 just ensures there is something in the column)
For each party, find out how much (on average) each party member claims: select D,sum(I)/count(I) where I>=0 group by D
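To make the “alter the web address” idea concrete, here is a minimal Python sketch of how one of the queries above might be URL-encoded and attached to a spreadsheet’s query address. The base address below is a placeholder, and the parameter name `tq` is an assumption about the query endpoint rather than something taken from Hirst’s post; swap in whatever your spreadsheet’s query URL actually uses.

```python
from urllib.parse import quote


def build_query_url(base_url: str, query: str, param: str = "tq") -> str:
    """Append a URL-encoded query string to a spreadsheet query address.

    `param` defaults to "tq", which is an assumption about the endpoint.
    """
    # Use "&" if the address already carries parameters, otherwise "?".
    separator = "&" if "?" in base_url else "?"
    # safe="" encodes everything that is not a letter, digit or one of _.-~
    return f"{base_url}{separator}{param}={quote(query, safe='')}"


# Placeholder address: replace with the query URL of a real spreadsheet.
BASE_URL = "https://example.com/spreadsheet/query"

queries = [
    "select B,C,I where I=23083",
    "select count(I) where I=23083",
    "select D,sum(I)/count(I) where I>=0 group by D",
]

for q in queries:
    print(build_query_url(BASE_URL, q))
```

Spaces become `%20` and `>` becomes `%3E`, which is why the queries look so unfriendly once they are sitting in the address bar.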

OK, you need to know the words to use (and if you have a link to an easy reference for these let me know*), but this is still a lot easier than using programming languages and databases.

As I say, this also illustrates the importance of publishing raw data so users can interrogate it in their own ways, which is precisely what The Guardian’s Data Store has been doing, meaning that people like Tony can create interfaces like this.

Wonderful.

*Tony has very generously created this page which helps you formulate your search – and generates the URL. If you were working on a different spreadsheet you could just replace the spreadsheet URL and change any column references accordingly.
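For anyone who wants to see what “change any column references accordingly” might look like, here is a rough Python sketch that builds two of the queries quoted above from column letters. The function names are hypothetical, made up for illustration; they are not part of Tony’s page or of any Google tool.

```python
def max_claim_query(name_columns, amount_column, amount):
    """Build a 'who claimed exactly this amount' query (hypothetical helper)."""
    columns = ",".join(list(name_columns) + [amount_column])
    return f"select {columns} where {amount_column}={amount}"


def average_by_group_query(group_column, amount_column):
    """Build an 'average per group' query (hypothetical helper)."""
    return (
        f"select {group_column},sum({amount_column})/count({amount_column}) "
        f"where {amount_column}>=0 group by {group_column}"
    )


# Column letters from the MPs' expenses example in this post.
print(max_claim_query(["B", "C"], "I", 23083))
print(average_by_group_query("D", "I"))
```

On a different spreadsheet you would pass in that sheet’s own column letters and leave the rest of the query wording alone.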

UPDATE: Tony also has a version which allows you to pick from Guardian datasets.

I’m launching an MA in Online Journalism

From September I will be running an MA in Online Journalism at Birmingham City University. I hope it’s going to be different from any other journalism MA.

That’s because in putting it together I’ve had the luxury of a largely blank canvas, which means I’ve not had to work within the strictures and structures of linear, production-based courses.

The first words I put down on that blank piece of paper were: Enterprise; experimentation; community; creativity.

And then I fleshed it out:

In the Online Journalism MA’s first stage (Certificate) students will study Journalism Enterprise. This will look at business models for online journalism, from freemium to mobile, public funding to ad networks, alongside legal and ethical considerations. I’m thinking at the moment that each student will have to research a different area and present a business case for a startup.

They will also study Newsgathering, Production and Distribution. I’m not teaching them separately because, online, they are often one and the same thing. And as students should already have basic skills in these areas, I will be focusing on building and reinventing those as they run a live news website (I’ll also be involved in an MA in Social Media, so there should be some interesting overlap).

The second stage of the MA Online Journalism (Diploma) includes the module I’m most excited about: Experimentation – aka Online Journalism Labs.

This is an explicit space for students to try new things, fail well, and learn what works. They will do this in partnership with a news organisation based on a problem they both identify (e.g. not making enough revenue; poor community; etc.) – I’ve already lined up partnerships with national and regional newspapers, broadcasters and startups in the UK and internationally: effectively the student acts as a consultant, with the class as a whole sharing knowledge and experience.

Alongside that they will continue with newsgathering, production and distribution, exploring areas such as computer assisted reporting, user generated content, multimedia and interactivity. They may, for example, conduct an investigation that produces particularly deep, engaging and distributed content and conversation.

The final stage is MA by Project – either individually or as a group, students make a business case for a startup or offshoot, research it, build it, run it and bid for funding.

By the time they leave the course, graduates should not be going into the industry at entry level (after all, who is recruiting these days?), but at a more senior, strategic level – or, equally likely, to establish startups themselves. I’m hoping these are the people who are going to save journalism.

At the moment all these plans are in draft form. I am hoping this will be a course without walls, responding to ideas from industry and evolving as a result. Which is why I’m asking for your input now: what would you like to see included in an MA Online Journalism? The BJTC’s Steve Harris has mentioned voice training, media law and ethics. The BBC’s Peter Horrocks has suggested programming and design skills. You may agree or disagree.

Let’s get a conversation going.

The services of the ‘semantic web’

Many of the services that are being developed as part of the ‘semantic web’ are necessarily works in progress, but they all contribute to extending the success of this burgeoning area of technology. There are plenty more popping up all the time, but for the purposes of this post I have loosely grouped some prominent sites into specialities – social networking, search and browsing – before briefly explaining their uses.

Continue reading →

The next step to the ‘semantic web’

There are billions of pages of unsorted and unclassified information online, which make up millions of terabytes of data with almost no organisation. It is not necessarily true that some of this information is valuable whilst some is worthless; that is a judgement for whoever wants it. At the moment, the most common way to access any information is through the hegemonic search engines which act as an entry point.

Yet, despite Google’s dominance of the market and culture, the methodology of search still isn’t satisfactory. Leading technologists see the next stage of development coming, where computers will become capable of effectively analysing and understanding data rather than just presenting it to us. Search engine optimisation will eventually be replaced by the ‘semantic web’.

Continue reading →

Sport and data – now it’s more than just ‘interactive’

I’ve written previously on the Online Journalism Blog about ‘Why fantasy football may hold the key to the future of news’. Now it seems The Guardian has taken things up a notch with the wonderful Chalkboard feature: an interactive database-driven toolkit that allows you to create your own ‘chalkboards’ illustrating whatever point you may wish to make about a team or player’s performance. Here’s my first attempt below:

Cute, yes? But more than just cute. This is an idea that takes sports data and makes it more than just ‘interactive’. This makes it communicative.

Because you are not just toying with data but creating it to make a point. Once you create a chalkboard it is published to everyone, with space for comments. You can send it, share it or embed it – as I have.

Clearly there are improvements that can be made – starting with searchability/findability from the chalkboard/team page and the odd bug (the description which I entered was not visible on the test I did above, and limiting it to the final 15 minutes does not seem to have worked – you still see all passes).

But really that would be picking holes in what is a beautifully thought-through piece of work – a piece of work that understands that if you’re to make news work online it has to be as much a platform as a destination (a platform which in turn opens up plenty of opportunities for monetisation).

The site claims match stats will be available 15 minutes after the full time whistle. Suddenly the calls to local radio to bemoan the manager’s tactics seem one-dimensional. And spending 60 seconds reading the match report is nothing compared to the time that will be spent carefully constructing your argument as to why your star midfielder should not have been sold to that close relegation rival…

Thanks to Alex Lockwood for the tip-off.

The future of investigative journalism: databases and algorithms

There’s a great article over at Miller-McCune on investigative journalism and what you might variously call computer assisted reporting and database journalism. It is worth reading in full, but the really interesting stuff comes further in, and I’ve quoted it at length below:

“Bill Allison, a senior fellow at the Sunlight Foundation and a veteran investigative reporter and editor, summarizes the nonprofit’s aim as “one-click” government transparency, to be achieved by funding online technology that does some of what investigative reporters always have done: gather records and cross-check them against one another, in hopes of finding signs or patterns of problems.

“… Before he came to the Sunlight Foundation, Allison says, the notion that computer algorithms could do a significant part of what investigative reporters have always done seemed “far-fetched.” But there’s nothing far-fetched about the use of data-mining techniques in the pursuit of patterns. Law firms already use data “chewers” to parse the thousands of pages of information they get in the discovery phase of legal actions, Allison notes, looking for key phrases and terms and sorting the probative wheat from the chaff and, in the process, “learning” to be smarter in their further searches.

“Now, in the post-Google Age, Allison sees the possibility that computer algorithms can sort through the huge amounts of databased information available on the Internet, providing public interest reporters with sets of potential story leads they otherwise might never have found. The programs could only enhance, not replace, the reporter, who would still have to cultivate the human sources and provide the context and verification needed for quality journalism. But the data-mining programs could make the reporters more efficient — and, perhaps, a less appealing target for media company bean counters looking for someone to lay off. “I think that this is much more a tool to inform reporters,” Allison says, “so they can do their jobs better.”

“… After he fills the endowed chair for the Knight Professor of the Practice of Journalism and Public Policy Studies, [James] Hamilton hopes the new professor can help him grow an academic field that provides generations of new tools for the investigative journalist and public interest-minded citizen. The investigative algorithms could be based in part on a sort of reverse engineering, taking advantage of experience with previous investigative stories and corruption cases and looking for combinations of data that have, in the past, been connected to politicians or institutions that were incompetent or venal. “The whole idea is that we would be doing research and development in a scalable, open-source way,” he says. “We would try to promote tools that journalists and others could use.”

Hat tip to Nick Booth

Model for the 21st century newsroom pt.6: new journalists for new information flows

Information is changing. The news industry was born in a time of information scarcity – and any understanding of the laws of supply and demand will tell you that that made information valuable.

But the past 30 years have seen the erosion of that scarcity. Not only have the barriers to publishing, broadcast and distribution been lowered by desktop publishing, satellite and digital technologies, and the web – but a booming PR industry has grown up to provide these news organisations with ‘cheap’ news.

Information is changing. Increasingly, we are not seeking information out – instead, it finds us. The scarcity is not in information, but in our time to wade through it, make meaning of it, and act on it.

Information is changing, and so journalists must too. In the previous parts of this series I’ve looked at how the news process could change in a multiplatform environment; how to involve the former audience; what can now happen after a story is published; journalists and readers as distributors; and new media business models. In this part I want to look at personnel – and how we might move from a generic hierarchy of ‘reporters’, ‘subs’ and ‘editors’ to a more horizontal structure of roles based on information types. Continue reading →

More about that social-media-for-news training next week

What are your most useful online tools? (Something for the Weekend #12)

I’ve looked at a number of tools in this series, often very new with potential applications for journalism that haven’t been realised. This time I want to turn the spotlight onto tools that you’re using every day, which may not be flashy, but which do a simple job very well – for example:

in managing or filtering information,
identifying leads, ideas and contacts,
producing news itself,
distributing it,
or allowing users to get involved.

What have been the most useful online tools you’ve used?

RSS readers: why have just one?

Recently my long love affair with Bloglines has been hitting the rocks. I’ve been seeing another RSS reader. Yes, it’s Google Reader.

It started on the bus to work. You see, the mobile version of Bloglines doesn’t do it for me. My ‘morning paper’, now, is to scroll through the headlines from the dozens of blogs I subscribe to – in Google Reader mobile. If it’s something I might want to return to later, I ‘star’ it. If the blog post supports it, I might even bookmark it on del.icio.us. Continue reading →