Tag Archives: book

Data journalism ebook now on Amazon’s Kindle Store

My short ebook Data Journalism Heist is now available on Amazon for Kindle (US link here; it is also available on other countries’ Amazon sites).

The book is an introduction to data journalism and two simple techniques in particular: finding story leads using pivot tables and advanced filters.
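To make the first technique concrete: a pivot table groups the rows of a spreadsheet by one column and summarises another, for example by totalling spending per supplier. The sketch below is not taken from the book. It is a minimal Python illustration of the same idea, using invented sample data and a hypothetical `pivot_sum` helper, for readers curious about what the spreadsheet is doing behind the scenes.

```python
from collections import defaultdict

# Invented sample rows, standing in for a spreadsheet of spending data
rows = [
    {"department": "Education", "supplier": "Acme Ltd", "amount": 1200},
    {"department": "Health", "supplier": "Acme Ltd", "amount": 800},
    {"department": "Education", "supplier": "Beta plc", "amount": 300},
    {"department": "Health", "supplier": "Beta plc", "amount": 4500},
]

def pivot_sum(records, row_field, value_field):
    """Hypothetical helper: total value_field for each distinct value of
    row_field, like a pivot table with one row label and 'Sum of' a value."""
    totals = defaultdict(int)
    for record in records:
        totals[record[row_field]] += record[value_field]
    # Sort largest first, so the biggest totals (likely leads) come first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

for supplier, total in pivot_sum(rows, "supplier", "amount"):
    print(f"{supplier}: {total:,.0f}")
```

Sorting the totals from largest to smallest mirrors a common first move with a pivot table: look at what sits at the top and ask why.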

The book also covers useful sources of data, how to follow up leads, and how to tell the resulting story.

You can also buy it from Leanpub, where it’s been live for a couple of months now and is available in PDF, mobi and ePub formats. Comments welcome as always.

New ebook now ready! Learn basic spreadsheet skills with Data Journalism Heist

I’ve written a short ebook for people who are looking to get started with data journalism but need some help.

Data Journalism Heist covers two simple techniques for finding story leads in spreadsheets: pivot tables and advanced filters.

Neither technique requires any formulae, and there are dozens of local datasets (and one international one) to use them on.
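An advanced filter, by contrast, pulls out only the rows that meet a set of criteria. In Excel’s Advanced Filter, criteria entered on the same row of the criteria range must all be met, while separate rows act as alternatives. Here is a minimal Python sketch of that AND/OR logic, assuming invented council spending rows and a hypothetical `advanced_filter` helper; it illustrates the idea rather than reproducing the book’s step-by-step instructions.

```python
# Invented sample rows of council spending
rows = [
    {"council": "Anytown", "category": "Consultancy", "amount": 25000},
    {"council": "Anytown", "category": "Catering", "amount": 900},
    {"council": "Otherville", "category": "Consultancy", "amount": 4000},
    {"council": "Otherville", "category": "Legal", "amount": 60000},
]

# Each dict plays the role of one row in a criteria range:
# every condition inside a dict must hold (AND),
# and a record is kept if ANY dict matches (OR).
criteria = [
    {"category": lambda v: v == "Consultancy", "amount": lambda v: v > 10000},
    {"category": lambda v: v == "Legal"},
]

def advanced_filter(records, criteria_rows):
    """Hypothetical helper: return records matching any criteria row."""
    return [
        record
        for record in records
        if any(
            all(test(record[field]) for field, test in criteria_row.items())
            for criteria_row in criteria_rows
        )
    ]

for match in advanced_filter(rows, criteria):
    print(match["council"], match["category"], match["amount"])
```

The consultancy spend of 4,000 is dropped because it fails the amount condition on its criteria row, while the legal spend is kept because the second criteria row has no amount condition.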

In addition, the book covers how to follow up leads from data and tell the resulting story, with tips on visualisation and plenty of recommendations for next steps.

You can buy it from Leanpub here. Comments welcome as always.

Web security for journalists – takeaway tips and review

Early in Alan Pearce’s book on web security, Deep Web for Journalists, a series of statistics tells a striking story about the spread of surveillance in just one country.

199 is the first: the number of data mining programs in the US in 2004, when 16 federal agencies were “on the look-out for suspicious activity”.

Just six years later, 1,200 government agencies and 1,900 private companies were working on domestic intelligence programs.

As a result of this spread, notes Pearce, 4.8m people hold security clearance “that allows them to access all kinds of personal information”, and 1.4m have Top Secret clearance.

But the most sobering figure comes at the end: 1,600, the number of names added to the FBI’s terrorism watchlist each day.

Predictive policing

This is the world of predictive policing in which a modern journalist must operate: where browsing protesters’ websites, making particular searches, or mentioning certain keywords in your emails or tweets can put you on a watchlist, or even a no-fly list. It is an environment in which it is increasingly difficult for you to protect your sources – or indeed for sources to trust you.

Alan Pearce’s book attempts to map this world – and to outline the myriad techniques you can use to avoid compromising your sources.

Scraping using regular expressions in OutWit Hub – part 2: special characters, negative matches and more

Regular Expressions slogan t-shirt (image by Lasse Havelund)

In the second part of this extract from Chapter 10 of Scraping for Journalists, I recap the basics before discussing techniques for finding patterns in data, and how regex deals with whitespace characters such as spaces and carriage returns, special characters such as backslashes, and ‘negative matches’. You can find the first part here.
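The three ideas in that extract can be tried out in Python’s `re` module, whose syntax for these basics is broadly shared with other regex flavours (OutWit Hub’s own flavour may differ in the details, so treat this as an approximation). `\s` matches whitespace such as tabs, spaces and carriage returns; a backslash must itself be escaped as `\\` to be matched literally; and a negated character class such as `[^\r\n]` matches any character except those listed, which is one common kind of negative match. The sample text is invented for illustration.

```python
import re

# Invented sample: a tab after each label, Windows-style \r\n line endings,
# and a file path containing literal backslashes
text = "Name:\tJane Smith\r\nPath: C:\\data\\2012\r\nName:\tJohn Doe\r\n"

# 1. Whitespace: \s+ absorbs the tab; \r?\n matches a line ending
#    with or without a carriage return
names = re.findall(r"Name:\s+(.+?)\r?\n", text)

# 2. Special characters: each literal backslash is escaped as \\
year = re.search(r"C:\\data\\(\d+)", text).group(1)

# 3. Negative match: [^\r\n]+ means "one or more characters that are
#    NOT a carriage return or line break", i.e. the rest of the line
path = re.search(r"Path: ([^\r\n]+)", text).group(1)

print(names)
print(year)
print(path)
```

Using raw strings (`r"..."`) keeps Python from interpreting the backslashes before the regex engine sees them, which avoids a common source of confusion when escaping.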

 


How-to: Scraping ugly HTML using ‘regular expressions’ in an OutWit Hub scraper

Regular Expressions cartoon from xkcd

The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping: regex, short for ‘regular expressions’, which lets you describe patterns to look for, such as specific words, prefixes or particular types of code. I hope you find it useful.

This tutorial will show you how to scrape a particularly badly formatted piece of data: in this case, the UK Labour Party’s publication of meetings and dinners with donors and trade union general secretaries.

To do this, you’ll need to install the free scraping tool OutWit Hub. Regex can also be used in other tools and programming languages, but OutWit Hub is a good way to learn it without knowing any programming.
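In OutWit Hub the scraper is set up through the tool’s own interface, but the underlying idea can be sketched in Python: find text that reliably appears before and after each piece of data, and use regex to absorb the inconsistencies in between (stray spaces, mixed-case tags, `<br>` versus `<br/>`). The HTML here is invented to mimic that kind of mess; it is not the Labour Party’s actual page, and the pattern is a hypothetical example rather than the scraper built in the book.

```python
import re

# Invented, deliberately messy HTML: inconsistent spacing, tag case
# and line-break tags (not the real Labour Party page)
html = """
<p><b>Date:</b> 12 March 2012<br>Donor: A. Donor <br/>Type:dinner</p>
<P><B>Date:</B>3 April 2012 <BR>Donor:B. Giver<br>Type: meeting</P>
"""

# \s* tolerates missing or extra spaces; <br\s*/?> accepts <br>, <br/> and <br />;
# [^<]+? lazily captures text up to the next tag;
# re.IGNORECASE treats <P> and <p> (and so on) alike
pattern = re.compile(
    r"<b>Date:</b>\s*(?P<date>[^<]+?)\s*<br\s*/?>"
    r"\s*Donor:\s*(?P<donor>[^<]+?)\s*<br\s*/?>"
    r"\s*Type:\s*(?P<type>[^<]+?)\s*</p>",
    re.IGNORECASE,
)

# One dict per meeting, keyed by the named groups
records = [match.groupdict() for match in pattern.finditer(html)]

for record in records:
    print(record["date"], "|", record["donor"], "|", record["type"])
```

Named groups such as `(?P<date>...)` keep the extracted columns readable, and because each capture stops at the next `<`, a stray space or change of tag case in one entry does not derail the rest.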

Magazine Editing – 3rd edition now out (disclosure: I edited it)

UPDATE: Readers of this blog can now get a 20% discount off the book by using the code ME1211 when ordering on the Routledge site.

Magazine Editing is one of those books that I’ve used for years in my teaching. Unlike most books in the field, it has a healthy focus on the less glamorous aspects of running magazines, such as managing teams and budgets, editorial strategy, and the significant proportion of the industry – B2B, contract publishing, controlled-circulation, subscription-based – that you don’t see on supermarket shelves.

For the third edition, publishers Routledge approached me to update the book for a multiplatform age. That work is done, and the new edition is now out.

Although it now has my name on it, the book remains primarily the work of John Morrish, who wrote the first two editions of the book. Editing his work gave me a fresh appreciation of just what a timeless job he has done in identifying the skills needed by magazine editors – as I write in the introduction:

“It is striking how much of the advice in the book is more important than ever. In a period of enormous change it is key to focus on the core skills of magazine editing: clear leadership, effective management, people skills and creative thinking around what exactly it is that your readers are buying into – whether that’s printed on paper, pixels on a screen, or something intangible like a sense of community and belonging.”

So if you come across one of the older editions cheaply, it will still be useful.

So what did I add to the new edition of Magazine Editing? It goes without saying that digital magazines (web-only, apps) are now covered. The diversification of revenue models (the increased importance of events, merchandising, data, mobile and apps) is now explored, as well as how online advertising works and how it differs from traditional advertising. The book also covers how to use online resources, including web analytics, to better understand your audience and inform your editorial strategy, and how the dynamics of the web change magazine campaigns.

The chapter on leading and managing now includes sections on managing information overload, social bookmarking and social media policies, and there’s a new section on legal guidance on placements and internships. The budgeting sections now include online considerations, and there’s an exploration of the pros and cons of using free or low-cost third-party services versus building tools in-house. A passage from the section on ‘Making money online’ illustrates the shifts facing the industry:

“Like so much else on the web, it is becoming difficult to see where content ends and commerce begins. The concept of a ‘magazine’ blurs when, online, it can also be a shop, a game, or a tool. It helps to think of how the business model of magazines has traditionally worked: gathering a community of people in the same place (on your pages) where companies can then advertise their products and services. The same principle applies now, but the barriers to selling products and services yourself have been significantly lowered, just as the barriers to publishing content have been significantly lowered for those companies whose advertising used to fund print publishing. Integrity is no less important in this context: users will desert your website if your content is only concerned with selling them your products, just as they will desert if your events are badly organised, your merchandise poor quality, or your service shoddy. Publishers increasingly talk of a ‘brand experience’ of which the content is just one part. In many ways this makes the reader – as they also become a consumer – more powerful, and the advertiser less so. Your insights into what they are talking and reading about may be of increasing interest to those who are searching for new revenue streams.”

The chapter on writing covers how to evaluate online sources of information, the debates in online journalism around objectivity versus transparency, and the value of a ‘web-first’ strategy. I also cover online tools for organising diaries and monitoring social media. There’s an exploration of best-practice guidelines for writing for the web, and of when multimedia is appropriate or preferable.

The chapter on pictures and design now includes advice on dealing with web designers and developers, multiplatform design and branding, sourcing video for the web, copyright and Creative Commons, infographics, and image considerations for online publication. And ‘Managing Production’ covers search engine optimisation, scheduling online production, and online distribution. The penultimate chapter on legal considerations adds data protection, the role of archives in contempt of court, and website terms and conditions.

I end the book with a list of tools that allow the reader to start publishing right now. Aside from the legal developments and the new considerations, roles and stages in the production cycle, this is perhaps the most important change from previous editions. A student reading this book is no longer waiting for their first job in publishing: they should be creating it.

If you have read the book and want to receive updates on developments in the magazine industry, please Like the book’s Facebook page. I’d also welcome any comments on areas you think are well covered – or need to be covered further.

Review: Heather Brooke – The Silent State

In the week that a general election is called, Heather Brooke’s latest book couldn’t have been better timed. The Silent State is a staggeringly ambitious piece of work that pierces through the fog of the UK’s bureaucracies of power to show how they work, what is being hidden, and the inconsistencies underlying the way public money is spent.

As with her previous book, Your Right To Know, Brooke structures The Silent State into chapters looking at different parts of the power system in the UK, making it a particularly usable reference work when you want to get your head around a particular aspect of our political systems.

Chapter by chapter

Chapter 1 lists the various databases that have been created to hold information on citizens, paying particular attention to the little-publicised raft of databases holding subjective data on children. The story of how an old, unpopular policy was rebranded to ride into existence on the back of the Victoria Climbié bandwagon is particularly illustrative of government’s hunger for data for data’s sake.

Picking up that thread, Chapter 2 explores how much public money is spent on PR and how public servants are increasingly prevented from speaking directly to the media. It’s this trend that made The Times’ outing of police blogger Nightjack particularly loathsome, and it is why we need to fight hard to protect those who provide an insight into their work on the ground.

Chapter 3 looks at how the misuse of statistics led to the independence of the head of the Office for National Statistics – but not the staff that he manages – and how the statistics given to the media can differ quite significantly from those provided when requested by a Select Committee (the lesson being that these can be useful sources to check). It’s a key chapter for anyone interested in the future of public data and data journalism.

Bureaucracy itself is the subject of the fourth chapter. Most of this is a plea for good bureaucracy and an end to unnamed sources, but there is still space for illustrative and useful anecdotes about acquiring information from the Ministry of Defence.

And in Chapter 5 we get a potted history of MySociety’s struggle to make politicians accountable for their votes, and an overview of how data gathered with public money, from Royal Mail’s postcodes to Ordnance Survey’s maps, is sold back to the public at a monopolistic premium.

The justice system and the police are scrutinised in Chapters 6 and 7, from the twisted logic that decreed audio recordings less reliable than written records to the criminalisation of complaint.

Finally, Chapter 8 ends with a personal story: a reflection on the MPs’ expenses saga for which Brooke is best known. You can understand the publishers – and indeed, many readers – wanting the story first-hand, but it’s also the least informative of all the chapters for journalists (which is a measure of how much Brooke has already achieved on that front in wider society).

With a final ‘manifesto’ section Brooke summarises the main demands running across the book and leaves you ready to storm every institution in this country demanding change. It’s an experience reminiscent of finishing Franz Kafka’s The Trial – we have just been taken on a tour through the faceless, logic-deprived halls of power. And it’s a disconcerting, disorientating feeling.

Journalism 2.0

But this is not fiction. It is great journalism. And the victims caught in expensive paper trails and logical dead ends are real people.

Although the book is designed to be dipped into as a reference work, it is also an eminently readable page-turner; indeed, the page-turning gets faster as the reader gets angrier. Throughout, Brooke illustrates her findings with anecdotes that not only put a human face on the victims of bureaucracy, but also pass on the valuable experience of those who have managed to get results.

For that reason, the book is not a pessimistic or sensationalist piece of writing. There is hope, and the likes of Brooke, MySociety and others in this book are testament to the fact that things can change.

The Silent State is journalism 2.0 at its best – not just exposing injustice and waste, but providing a platform for others to hold power to account. It’s not content for content’s sake, but a tool. I strongly recommend not just buying it – but using it. Because there’s some serious work to be done.