3 new resources for data journalists

A raft of new data sites has launched in the past couple of months that I haven’t had time to blog about, so here’s a quick round-up:

  • Tim Davies’ Open Data Cookbook aims to collect “step by step recipes for practical ways to use open data” – a useful complement to GetTheData. The recipes are currently aimed at the more technically minded but you know what to do to address that…
  • Is It Open Data? aims to “make it easy for people to make enquires of data holders, about the openness of the data they hold — and to record publicly the results of those efforts.”
  • And for those wishing to publish open data, The Open Data Manual provides information on what open data is, why you should publish open data, and how to do it. If you come up against an organisation that does not know how to publish their data in an open format, or needs convincing of why they should do so, this is a good place to point them to (or learn the arguments from).

If you’ve seen any other useful resources of late, please post a link in the comments.

Bed, knee and breakfast: designing for the iPad

Bed, knee and breakfast: the Bibliotype template

Craig Mod has written a lengthy and well-informed piece on A List Apart about the problems of designing for the iPad and other “browser”-based interfaces. He makes some particularly important points about the differences between products which have a spine as the “axis of symmetry” (e.g. books, magazines), and digital products where the axis is hard to place:

“If the axis of symmetry for a book is the spine, where is it on an iPad? On one hand, designers can approach tablets as if they were a single sheet of “paper,” letting the physicality of the object define the central axis of symmetry—straight down the middle.

“On the other hand, the physicality of these devices doesn’t represent the full potential of content space. The screen becomes a small portal to an infinite content plane, or “infinite canvas,” as so well illustrated by Scott McCloud.”

The core of his article is a design template for long-form tablet reading, for which Mod breaks tablet reading distances into three main categories: Bed, Knee and Breakfast.

  • “Bed (Close to face): Reading a novel on your stomach, lying in bed with the iPad propped up on a pillow.
  • “Knee (Medium distance from face): Sitting on the couch or perhaps the Eurostar on your way to Paris, the iPad on your knee, catching up on Instapaper.
  • “Breakfast (Far from face): The iPad, propped up by the Apple case at a comfortable angle, behind your breakfast coffee and bagel, allowing for handsfree news reading as you wipe cream cheese from the corner of your mouth.”

An image of the template in action is shown above. It’s released under the MIT licence.

Although the article is written with ebooks in mind, the principles can obviously also be applied to magazine and news apps. Worth a read.

My inaugural lecture: Is Ice Cream Strawberry?

I’ll be presenting my inaugural lecture at City University on March 3 – the title is “Is ice cream strawberry? Journalism’s invisible history – and conflicted future”.

For once the decision on what to speak on was entirely up to me, hence the cryptic title (the original title was ‘I’m Not Going To Talk About Technology’). It’s quite a wide-ranging talk, and I hope it turns out to be as stimulating to listen to as it has been to write.

Admission is free and it starts from 6pm – you can book your place here. I hope you can make it.

Sources fight back: fabrication, complaints, and the Daily Mail

Juliet Shaw writes in a guest post on No Sleep ‘Til Brooklands about her experience of fighting The Daily Mail through the courts after they published an apparently fabricated article (her dissection of the article and its fictions is both painstaking and painful).

There is no happy ending, but there are almost 100 comments. And once again you are struck by the power of sources to tell their side of the story. For Juliet Shaw you could just as well read Melanie Schregardus, or the Dunblane Facebook Group.

Among the comments is Mail reader Elaine, who says:

“I have always taken their stance and opinions with a large doze of salt. It will be even larger now. Thank goodness for the internet – as a balance to the Mail I can access the Guardian and the Independent to see their take on a particular world/UK event.”

But also in the comments are others who say they have suffered from being the subject of fabricated articles in the Mail – first Catherine Hughes:

“The article was so damaging to my freelance career that editors I was working with now no longer answer my emails. ‘Heartbroken, devastated and gutted’ doesn’t even come close to how I feel. It happened in September and I am still distraught.”

Then Pomona:

“[I have] been a victim of the Daily Fail’s “journalism” on two occasions: once when my first marriage broke up and they printed a lurid and utterly innaccurate story about me (I’m no celeb, just Jo Public), and more recently when one of their journalists lifted and printed a Facebook reply to their request for information (leaving out the bit where I told them I did not permit them to use or reprint any part of my post)”

And Anonymous:

“The Daily Mail said they were looking for a real life example of a similar case of teachers exploiting trust to complement a news story. They promised to protect my anonymity, use only a very small picture and as one of a number of case studies. A week later a double page spread – taken up mostly with a picture of me – bore the headline ‘Dear Sir, I think I Love you’. The quotes bore no resemblance to what I said and made it sound like I liked the teacher?! Instead of what really happened – a drunken shuffle in the back of a car and a feeling of abuse of trust and sadness the next day.”

Jon Morgan:

“When the article was published, my role as welfare officer was never mentioned, the average overdraft had become *my* overdraft, and I was apparently on the verge of jacking in my studies in despair.”

Anonymous:

“I applied as a case study, the photoshoot, the invasive questions. Took months to get my expenses after dozens of ignored emails. Thankfully the article never went to print. At the time I was annoyed but now I am thankful. I also work in PR and would feel extremely uncomfortable offering anyone as a case study for a client. No matter how large the exposure.”

Dirtypj:

“I complained to the editor. He insisted that all journalists identify themselves as such every time. And that his employee had done no wrong. In short, he was calling ME a liar. And as all interviews are recorded he could prove it. I said, Okay, listen to the recording then! He replied, No, I don’t need to. I stand by my writers.”

Other comments mention similar experiences, some with other newspapers. It’s a small point, driven home over and over again: power has shifted.

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from the UK police force – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

UPDATE: Will Perrin has FOI’d all correspondence relating to ICO advice on the crime maps. Jonathan Raper has a list of further flaws including:

  • Some data such as sexual offences and murder is removed – even though it would be easy to discover and locate from other police reports.
  • Data covers reported crimes rather than convictions, so some of it may turn out not to be crime.
  • The levels of policing are not provided, so that two areas with the “same” crime levels may in fact have “radically different” experiences of crime and policing.

Charles Arthur notes that: “Police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.”

Louise Kidney says:

“What we’ve actually got with http://www.police.uk is neither one nor the other. Ruth looks like a crime overlord cos of all the crimes happening in her garden and we haven’t got exact point data, but we haven’t got first part of postcode data either e.g. BB5 crimes or NW1 crimes. Instead, we’ve got this weird halfway house thing where it’s not accurate, but its inaccuracy almost renders it useless because we don’t have any idea if every force uses the same parameters when picking these points, we don’t know how they pick their points, we don’t know what we don’t know in terms of whether one house in particular is causing a considerable issue with anti-social behaviour for example, allowing me to go to my local Council and demand they do something about it.”

Adrian Short argues that “What we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.”

He adds:

“It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.

“£300K for this? There ought to be a law against it.”

UPDATE 2: One frustrated developer has launched CrimeSearch.co.uk to provide “helpful information about crime and policing in your area, without costing 300k of tax payers’ money”

Getting Started With Local Council Spending Data

With more and more councils doing as they were told and opening up their spending data in the name of transparency, it’s maybe worth a quick review of how the data is currently being made available.

To start with, I’m going to consider the Isle of Wight Council’s data, which was opened up earlier this week. The first data release can be found (though not easily?!) as a pair of Excel spreadsheets, each just over 1 MB in size, at http://www.iwight.com/council/transparency/ (This URL reminds me that it might be time to review my post on “Top Level” URL Conventions in Local Council Open Data Websites!)

The data has also been released via Spikes Cavell at Spotlight on Spend: Isle of Wight.

The Spotlight on Spend site offers a hierarchical, table-based view of the data; the value add comes from the ability to compare spend with national averages and with that of other councils. Links are also provided to monthly datasets available as CSV downloads.

Uploading these datasets to Google Fusion Tables shows that the following columns are included in the CSV files available from Spotlight on Spend (click through the image to see the data):

Note that the Expense Area column appears to be empty, and that the transaction dates used appear to be “clumped”? Also note that each row, column and cell can be commented on.
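A quick way to check for columns like that empty Expense Area one, without eyeballing the table, might look something like the following. This is only a minimal sketch: the function name `empty_columns` and the sample rows are my own invention for illustration, not real Isle of Wight figures and not anything Spotlight on Spend provides.

```python
import csv
import io

def empty_columns(csv_file):
    """Return the header names whose cells are blank in every data row."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    candidates = set(fieldnames)
    for row in reader:
        # Drop a column from the candidates as soon as one non-blank cell appears.
        candidates -= {name for name in candidates if (row.get(name) or "").strip()}
    # Preserve the original column order in the result.
    return [name for name in fieldnames if name in candidates]

# Hypothetical sample rows, made up for illustration only.
sample = io.StringIO(
    "Supplier,Expense Area,Amount\n"
    "Acme Ltd,,120.00\n"
    "Widget Co,,75.50\n"
)
print(empty_columns(sample))
```

Run against a downloaded monthly CSV (opened with `open(path, newline="")`), it should list any column that never carries a value.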

The Excel spreadsheets on the Isle of Wight Council website are a little more complete – here’s the data in Google Fusion Tables again (click through the image to see the data):

(It would maybe be worth comparing these columns with those identified as Mandatory or Desirable in the Local Spending Data Guidance? A comparison with the format the esd use for their Linked Data cross-council local spending data demo might also be interesting?)

Note that because the Excel files on the Isle of Wight Council website were larger than the 1MB size limit on XLS spreadsheet uploads to Google Fusion Tables, I had to open the spreadsheets in Excel and then export them as CSV documents. (Google Fusion Tables accepts CSV uploads for files up to 100MB.) So if you’re writing an open data sabotage manual, this may be something worth bearing in mind (i.e. publish data in very large Excel spreadsheets)!

It’s also worth noting that if different councils use similar column headings and CSV file formats, and include a column stating the name of the council, it should be trivial to upload all their data to a common Google Fusion Table, allowing comparisons to be made across councils, contractors with similar names to be identified across councils, and so on… (i.e. Google Fusion Tables would probably let you do as much as Spotlight on Spend, though in a rather clunkier interface… but then again, I think there is a Fusion Tables API…?;-)
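Where councils don’t include a council name column themselves, it could be added while stitching the files together before upload. Here’s a rough sketch of the idea using nothing but Python’s csv module; the function name `combine_council_csvs`, the `Council` column heading and the sample rows are all my own assumptions, and columns that one council lacks are simply left blank.

```python
import csv
import io

def combine_council_csvs(sources, out_file, council_column="Council"):
    """Write one CSV from several, adding a column that names the council.

    `sources` maps a council name to an open CSV file (or any iterable of lines).
    """
    tables = []
    headers = [council_column]
    for council, csv_file in sources.items():
        reader = csv.DictReader(csv_file)
        # Build the union of all headers in first-seen order.
        for name in reader.fieldnames or []:
            if name not in headers:
                headers.append(name)
        tables.append((council, list(reader)))
    # restval fills columns a council does not publish; stray extra cells are ignored.
    writer = csv.DictWriter(out_file, fieldnames=headers, restval="", extrasaction="ignore")
    writer.writeheader()
    for council, rows in tables:
        for row in rows:
            writer.writerow({**row, council_column: council})

# Hypothetical sample data, made up for illustration only.
combined = io.StringIO()
combine_council_csvs(
    {
        "Council A": io.StringIO("Supplier,Amount\nAcme Ltd,120.00\n"),
        "Council B": io.StringIO("Supplier,Date\nAcme Limited,2011-01-31\n"),
    },
    combined,
)
print(combined.getvalue())
```

This only lines up columns whose headings match exactly, which is rather the point: the more similar the headings councils use, the less cleaning is needed before the comparison can start.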

Although the data hasn’t appeared there yet, I’m sure it won’t be long before it’s made available on OpenlyLocal:

However, the Isle of Wight’s hyperlocal news site, Ventnorblog, teamed up with a local developer to revise Adrian Short’s Armchair Auditor code and released the OnTheWight Armchair Auditor site:

So that’s a round-up of where the data is, and how it’s presented. If I get a chance, the next steps are to:
– compare the offerings with each other in more detail, e.g. the columns each view provides;
– compare the offerings with the guidance on release of council spending data;
– see what interesting Google Fusion table views we can come up with as “top level” reports on the Isle of Wight data;
– explore the extent to which Google Fusion Tables can be used to aggregate and compare data from across different councils.
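The first of those comparisons, which columns each view provides, is easy enough to rough out in code. The following is a minimal sketch that compares the header rows of two CSV files after tidying up case and spacing; the function names and the sample headings are hypothetical, and it makes no attempt to match headings that mean the same thing but are worded differently.

```python
import csv
import io

def normalise(header):
    """Lower-case a heading and collapse runs of whitespace."""
    return " ".join(header.strip().lower().split())

def compare_headers(csv_a, csv_b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of normalised headings."""
    a = {normalise(h) for h in next(csv.reader(csv_a), [])}
    b = {normalise(h) for h in next(csv.reader(csv_b), [])}
    return sorted(a & b), sorted(a - b), sorted(b - a)

# Hypothetical header rows, made up for illustration only.
shared, only_a, only_b = compare_headers(
    io.StringIO("Date,Supplier Name,Amount\n"),
    io.StringIO("date, supplier  name ,Expense Area\n"),
)
print("Shared:", shared)
print("Only in first file:", only_a)
print("Only in second file:", only_b)
```

The same function could be pointed at a header row typed up from the spending data guidance to see which recommended columns a given release is missing.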

PS related – Nodalities blog: Linked Spending Data – How and Why Bother Pt2

PPS for a list of local councils and the data they have released, see Guardian datastore: Local council spending over £500, OpenlyLocal Council Spending Dashboard

While you’re waiting for Yahoo! to make its mind up about Delicious, sign up to Trunk.ly

Despite the incredible work done on the spreadsheet comparing social bookmarking services, I have yet to find one that does everything that I use Delicious for (background here). One service I have been using, however, is Trunk.ly.

Once you’ve imported your existing bookmarks from Delicious, Trunk.ly stores any new ones you bookmark on Delicious, keeping the backup up to date. In addition it can store any links you’ve shared on Twitter, Facebook, Google Reader and any RSS feed.

It is essentially a search engine for links you may have shared at some point – but its technical limitations stop it from being much more. For example, there do not appear to be any RSS feeds for tags*, and there is no facility to combine tags to find items that are, for example, tagged with ‘privacy’ and ‘tools’. (It would also be nice if it tagged links shared on Twitter with any hashtags in the tweet.)
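If you export your bookmarks, both of those wishes are simple to approximate yourself. This is just a sketch of the idea and has nothing to do with how Trunk.ly itself works: the bookmark structure (a dict with `url` and `tags`), the function names and the sample data are all assumptions of mine.

```python
import re

def with_all_tags(bookmarks, *wanted):
    """Return the bookmarks that carry every one of the wanted tags (case-insensitive)."""
    wanted = {tag.lower() for tag in wanted}
    return [b for b in bookmarks if wanted <= {tag.lower() for tag in b["tags"]}]

def hashtags(tweet_text):
    """Return the hashtags in a tweet, lower-cased and without the leading #."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", tweet_text)]

# Hypothetical bookmarks, made up for illustration only.
bookmarks = [
    {"url": "http://example.com/a", "tags": ["privacy", "tools"]},
    {"url": "http://example.com/b", "tags": ["privacy"]},
    {"url": "http://example.com/c", "tags": hashtags("Handy #Privacy #tools round-up")},
]
for bookmark in with_all_tags(bookmarks, "privacy", "tools"):
    print(bookmark["url"])
```

Combining tags is just a subset test on each bookmark’s tags, which is why it feels like a feature that ought to be cheap to add.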

That said, if, like me, you want to continue using Delicious but with an ongoing backup just in case, Trunk.ly appears a sound choice. And it’s early days, so here’s hoping they add those features soon… *cough*.

*Planned apparently. See Trunk.ly in the comments below.

Investigations tool DocumentCloud goes public (PS: documents drive traffic)

The rather lovely DocumentCloud – a tool that allows journalists to share, annotate, connect and organise documents – has finally emerged from its closet and made itself available to public searches.

This means that anyone can now search the powerful database (some tips here) of newsworthy documents. If you want to add your own, however, you still need approval.

If you do get approved you’ll find it’s quite a powerful tool, with quick conversion of PDFs into text files, analytic tools and semantic tagging (so you can connect all documents with a particular person, or organisation) among its best features. The site is open source and has an API too.

I asked Program Director Amanda B Hickman what she’s learned on the project so far. Her response suggests that documents have a particular appeal for online readers:

“If we’ve learned anything, it is that people really love documents. It is pretty clear that when there’s something interesting going on in the news, plenty of people want to dig a little deeper. When Arizona Republic posted an annotated version of that state’s new immigration law, it got more traffic than their weekly entertainment round up. WNYC told us that the page listing the indictments in last week’s mob roundup was still getting more traffic than any other single news story even a week later.

“These were big news documents, to be sure, but it still seems pretty clear that people do want to dig deeper and explore the documents behind the news, which is great for us and great for news.”

Content, context and code: verifying information online


When the telephone first entered the newsroom, journalists were sceptical. “How can we be sure that the person at the other end is who they say they are?” The question seems odd now, because we have become so used to phone technology that we barely think of it as technology at all – and there is a range of techniques we use, almost unconsciously, to verify what the person on the other end of the phone is saying, from their tone of voice to the number they are ringing from and the information they are providing.

Dealing with online sources is no different. How do you know the source is telling the truth? You’re a journalist, for god’s sake: it’s your job to find out.

In many ways the internet gives us extra tools to verify information – certainly more than the phone ever did. The apparent ‘facelessness’ of the medium is misleading: every piece of information, and every person, leaves a trail of data that you can use to build a picture of its reliability.

The following is a three-level approach to verification: starting with the content itself, moving on to the context surrounding it, and finishing with the technical information underlying it. Most of the techniques outlined take very little time at all, but the key thing is to look for warning signs and follow those up.