There have been quite a few tools springing up over the past few months that I’ve not had time to blog about, so here’s a roundup post on all of them – a bumper Something For The Weekend (let me know how you find these).
1. Junar – for scraping websites and sharing data
Junar presents a much easier way to scrape data from online tables with its ‘Collect Data’ tool – and the team behind it tell me they plan to add the ability to scrape linked pages, as well as PDFs.
2. BuzzData – for sharing data
BuzzData is a platform for sharing data – essentially a social network where you can follow other data journalists or datasets, tag and license your data, and – importantly – add visualisations, articles and attachments. When someone else builds on your data, it tells you, which is nice.
3. DataMarket – for finding data
DataMarket is exactly what it says on the tin: a market for data from organisations including the UN, BP, Eurostat, the IMF, USGS, and various other acronyms. You can access the data for free, or pay for extra functionality such as exporting to Excel.
4. Google News Scraper – for grabbing data on news coverage
This scraper will allow you to gather data on coverage of a particular issue, event or person. It only gathers the teaser text, but the country data may help if you want to map coverage, while the URLs can provide a starting point for further scraping experiments.
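As a rough idea of what you might do with that output: the sketch below counts stories per country and collects the URLs for later scraping. I'm assuming the scraper's results have been saved as a CSV with `country` and `url` columns; those column names and the sample rows are made up for illustration, so adjust them to whatever the scraper actually produces.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The column names ("title", "country", "url") are
# assumptions for illustration, not the scraper's documented output.
SAMPLE_CSV = """title,country,url
Story A,UK,http://example.com/a
Story B,US,http://example.com/b
Story C,UK,http://example.com/c
"""

def coverage_by_country(csv_text):
    """Count how many stories come from each country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["country"] for row in reader)

def story_urls(csv_text):
    """Collect the story URLs as a starting list for further scraping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"] for row in reader]

counts = coverage_by_country(SAMPLE_CSV)
urls = story_urls(SAMPLE_CSV)

# Countries with the most coverage first
for country, total in counts.most_common():
    print(country, total)
```

The per-country totals are the sort of thing you could then feed into a mapping tool.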
5. Metadata extraction tool – a first step for searching document dumps?
This is aimed at file preservation activities, but it has a few possible applications for journalists. Firstly, it has a Windows interface for exploring the metadata of a bunch of files, making it possible to sort them in different ways and find the information you’re seeking more quickly. Secondly, the generation of an XML file will give some structure which could allow you to, for example, plot your documents on a timeline, spotting patterns or outliers.
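To give a flavour of the timeline idea, here is a minimal sketch that reads an XML file of document metadata, sorts the documents by date, and counts them per month so that clusters and gaps stand out. The XML layout (`file`, `name` and `modified` elements) is hypothetical, as are the sample entries; the tool's real output will be structured differently, so treat the element names as placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime

# Hypothetical XML layout and made-up entries, for illustration only.
# The metadata tool's real output schema will differ.
SAMPLE_XML = """
<files>
  <file><name>memo.doc</name><modified>2011-03-02</modified></file>
  <file><name>budget.xls</name><modified>2011-03-15</modified></file>
  <file><name>notes.txt</name><modified>2009-11-30</modified></file>
</files>
"""

def documents_by_date(xml_text):
    """Return (date, filename) pairs sorted oldest first."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for node in root.iter("file"):
        name = node.findtext("name")
        modified = datetime.strptime(node.findtext("modified"), "%Y-%m-%d").date()
        rows.append((modified, name))
    return sorted(rows)

def documents_per_month(rows):
    """Count documents per (year, month) to show clusters and gaps."""
    return Counter((date.year, date.month) for date, _ in rows)

timeline = documents_by_date(SAMPLE_XML)
per_month = documents_per_month(timeline)

for date, name in timeline:
    print(date.isoformat(), name)
```

A document dated well away from the rest, like the 2009 file in this sample, is the kind of outlier that might be worth a closer look.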
6. Roambi – data visualisation on your iPhone
Sadly, it’s only your iPhone, not anyone else’s, so this is more if you’re on the move but want to go through some private data visualisations which might hide a story.
7. Data Wrangler – web-based data cleaning tool
This looks pretty powerful, if not pretty full stop.
8. Impure – visual programming language
From the About page:
“Impure is a visual programming language aimed to gather, process and visualize information. With impure is possible to obtain information from very different sources; from user owned data to diverse feeds in internet, including social media data, real time or historical financial information, images, news, search queries and many more. Impure is a tool to be in touch with data around internet, to deeply understand it. Within a modular logic interface you can quickly link information to operators, controls and visualization methods, bringing all the power of the comprehension of information and knowledge to the not programmers that want to work with information in a professional way.”
9. Zanran – PDF/spreadsheet/table search engine
This looks a very useful tool for narrowing down searches to PDFs, spreadsheets, and tables within webpages (the advanced search allows further narrowing by filetype, date, server location and site). Clever stuff behind it – particularly in the way it looks at images and decides if they’re charts. The site says they plan to add Word documents and PowerPoint presentations soon.
Thanks, useful post.
Thanks, now fix that Tweet button.
Noticed you did :) Thanks.
This is a great list. I have been using Junar.com and find it has resolved a lot of the tedium of keeping web pages updated with third-party data sources.
Junar allows scraping of DOC, XLS and OpenDocument formats too!