Tag Archives: search

When you get data in sentences: how to use a spreadsheet to extract numbers from phrases

Unduly lenient sentences review scheme inadequate

This BBC story involved converting phrases into numbers that could be used in calculations

Earlier this month the BBC Data Unit published a story on unduly lenient sentences which involved working with data that was trapped in phrases.

We needed to be able to take a collection of words such as “11 years and 5 months’ imprisonment” and convert that into something that could be used in spreadsheet calculations (specifically, comparing the lengths of time represented by two different phrases).

It’s a problem you come across every so often as a journalist, especially with FOI requests, so in this post (taken from the book Finding Stories in Spreadsheets) I’ll explain how to do that.
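To give a rough sense of the underlying idea, here is a minimal sketch in Python rather than a spreadsheet. It is not the method used for the BBC story or in the book: the function name `phrase_to_months` and the second sample phrase are made up for illustration, and it assumes the phrases only ever mention whole years and months written as digits.

```python
import re

# How many months each unit is worth (assumption: only years and months appear).
UNIT_MONTHS = {"year": 12, "month": 1}


def phrase_to_months(phrase: str) -> int:
    """Return the total number of months described in a sentence phrase.

    Hypothetical helper: finds every "<digits> year(s)" or "<digits> month(s)"
    in the phrase and adds them up. Phrases with no match return 0.
    """
    total = 0
    for number, unit in re.findall(r"(\d+)\s+(year|month)s?", phrase.lower()):
        total += int(number) * UNIT_MONTHS[unit]
    return total


# The first phrase is the example from the post; the second is invented
# purely so there is something to compare it with.
original = phrase_to_months("11 years and 5 months’ imprisonment")
revised = phrase_to_months("14 years’ imprisonment")

print(original)            # 137
print(revised - original)  # 31
```

Converting everything to a single unit (months here) is what makes the two phrases comparable: once both are plain numbers, the difference is a simple subtraction.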

An online journalism reading list

It’s the start of a new academic year, so I thought I’d compile a list of the latest reading I would recommend for any students looking at online journalism. (If you have suggestions for additions, please let me know!)

Theoretical, historical and conceptual background

  • Digital Journalism by Jones & Lee (Sage, 2011) is very comprehensive and worth reading in full.
  • Gatewatching by Axel Bruns (Peter Lang, 2005) covers areas that tend to be overlooked by journalism books, such as new media methods and startups from outside traditional media. Read: Chapter 4: Making News Open Source
  • The Wealth of Networks by Yochai Benkler (Yale University Press, 2007) provides a wider context and is available free online. Read: Chapter 4: The Economics of Social Production.
  • We The Media by Dan Gillmor (O’Reilly, 2006) is a seminal book on citizen journalism which is also available free online.

Practical online journalism – general

  • Clearly I’m going to say my own book, the Online Journalism Handbook (2017, Routledge), [UPDATE: now in its second edition], which covers blogging and web writing, data journalism, online audio and video, interactivity, community management and law.

Search and filter tweets using Friendfeed advanced search

I’ve never been fond of the search engine on Twitter, not the one on search.twitter.com anyway. I have found the ones built on its API, such as Twitterfall and the integration in Tweetdeck, much friendlier and more intuitive. But none of them work for finding old tweets. Google is not much help either, unless you know how to create your own search engine.

Friendfeed aggregates and stores all the activity that is fed into the system. Most FF users bring in their Twitter feed, in effect storing all their tweets. It works a little bit like Google Reader: once an item is there, it will always be there, even if the original is deleted.

The advanced search features of Friendfeed make it a pretty good Twitter search alternative. It even supports real-time search, so you can make your own Twitter news monitors.

Searching old tweets

Twitter only keeps tweets in its search database for a few weeks; after that they disappear. They’re still available on the web, just not searchable from Twitter (or any third-party app). That’s fine if you just want the real-time view, but not practical when you are looking for an exact tweet that is a few weeks old.

I needed to find this tweet from Paul Bradshaw for a presentation, but it was long gone from the internal database. I knew that Paul uses Friendfeed (not actively, but he shares his tweets there), so I did this search (bingo, no. 2 from the top). Here’s the equivalent Twitter search, which is no help.

From any Friendfeed page, you simply select advanced search at the top, fill in the blanks and you’ve got it. Here’s how mine was filled in.

Real-time “noise” filtering

Some hashtags can get ugly, real quick, and there’s no way to filter for the high-quality tweets. People can favorite tweets, but you can’t search favorites, so they are no use as a filter. When news breaks, there will be a few quality tweets in the beginning, and people will retweet the most important ones. But people quickly start talking about the event, which adds nothing of real value beyond Twitter chatter. Eyewitness accounts and other useful information are lost in the stream because people have no way of marking important tweets for later retrieval (search).

On Friendfeed, people have the option of liking entries, and the advanced search lets you filter items based on likes or comments. You can now rely on the FF community to mark the important stuff and cut through all (some of) the noise.

Friendfeed advanced search

Here’s an example of a search that filters all tweets with the #iranelection hashtag, and shows only tweets that have 3 or more likes.
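The logic of that kind of filter can be sketched in a few lines of Python. This is only an illustration of the idea, not Friendfeed’s actual search or API: the `Entry` class, the `filter_entries` function and the sample entries are all made up, and the entries are assumed to be already collected locally.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    """A hypothetical aggregated item (a tweet, blog post, etc.)."""
    text: str
    service: str
    likes: int = 0
    comments: int = 0


def filter_entries(entries, hashtag=None, service=None, min_likes=0, min_comments=0):
    """Keep only entries that match every condition that was given."""
    wanted = hashtag.lower() if hashtag else None
    results = []
    for entry in entries:
        # Collect the hashtags in the text, ignoring case.
        tags = re.findall(r"#\w+", entry.text.lower())
        if wanted and wanted not in tags:
            continue
        if service and entry.service != service:
            continue
        if entry.likes < min_likes or entry.comments < min_comments:
            continue
        results.append(entry)
    return results


# Invented sample data, for illustration only.
entries = [
    Entry("Eyewitness report from the square #iranelection", "twitter", likes=5),
    Entry("Just chatting about #iranelection", "twitter", likes=0),
    Entry("New blog post on search tools", "blog", likes=4, comments=2),
]

# Tweets with the hashtag and 3 or more likes.
popular = filter_entries(entries, hashtag="#iranelection", service="twitter", min_likes=3)
print([entry.text for entry in popular])
```

The same function covers the comment-based searches too: passing `min_comments=1` instead of `min_likes` keeps only the entries that someone has commented on.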

Other uses

There are many other ways to search Friendfeed, and you can filter by service: Facebook, blogs and so on. You can save searches and use them as filters. I have several live searches saved in Friendfeed. Here’s an example of a search that gives me all Twitter entries from my friends with one or more comments.

Friendfeed suffers from the fact that its userbase is not as big as Twitter’s, but the ‘real’ real-time search more than makes up for that in my opinion. What I mean by ‘real’ is that items are published automatically from all services. If you bring your Flickr, comments and blog activity into Friendfeed, they will be published automatically. Twitter doesn’t do that: you have to actively share the link after you have uploaded to Flickr, made a comment somewhere or updated your blog.

Search Options: Google adds more intuitive search tools, ‘takes on Twitter’

It’s often said that Twitter’s big advantage over Google is its ability to allow you to conduct ‘real time search’ – if an event is happening right now, you don’t search Google, you search Twitter.

But today Google has announced a series of features that, while still not offering real time search, take it just that bit closer. For me it is the most significant change to Google’s core service in years. 

Here’s the video:

This week, while talking to my students about the ability to search by date in Google, the computer assisted reporting blogger Murray Dick mentioned how unreliable the feature was, so I wouldn’t get too excited. 

What is new, however, is the ‘recent search’ facility, which brings up results from the past hour or two.

The services of the ‘semantic web’

Many of the services that are being developed as part of the ‘semantic web’ are necessarily works in progress, but they all contribute to extending the success of this burgeoning area of technology. There are plenty more popping up all the time, but for the purposes of this post I have loosely grouped some prominent sites into specialities – social networking, search and browsing – before briefly explaining their uses.

The next step to the ‘semantic web’

There are billions of pages of unsorted and unclassified information online, which make up millions of terabytes of data with almost no organisation. It is not necessarily true that some of this information is valuable whilst some is worthless; that is a judgement for whoever wants it. At the moment, the most common way to access any information is through the hegemonic search engines, which act as an entry point.

Yet, despite Google’s dominance of the market and culture, the methodology of search still isn’t satisfactory. Leading technologists see the next stage of development coming, where computers will become capable of effectively analysing and understanding data rather than just presenting it to us. Search engine optimisation will eventually be replaced by the ‘semantic web’.
