SFTW: How to scrape webpages and ask questions with Google Docs and =importXML

Here’s another Something for the Weekend post. Last week I wrote a post on how to use the =importFeed formula in Google Docs spreadsheets to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns. Another formula that does a similar job, but more powerfully, is =importXML.
There are at least two distinct journalistic uses for =importXML:
- You have found information that is only available in XML format and need to put it into a standard spreadsheet to interrogate it or combine it with other data.
- You want to extract some information from a webpage – perhaps on a regular basis – and put that in a structured format (a spreadsheet) so you can more easily ask questions of it.
The first task is the easier of the two, so I’ll explain how to do that in this post. I’ll use a separate post to explain the second.
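To give a rough feel for what the first task involves, here is a minimal Python sketch of the same idea: take some XML, pick out the repeating records with a path expression, and lay the values out as spreadsheet-style rows. This is only an analogy for what =importXML does, not how Google implements it. The sample data, the `xml_to_rows` function and the field names are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented purely for illustration.
SAMPLE_XML = """<schools>
  <school><name>Northfield</name><pupils>420</pupils></school>
  <school><name>Riverside</name><pupils>310</pupils></school>
</schools>"""


def xml_to_rows(xml_text, record_path, fields):
    """Turn repeating XML records into rows, one list of values per record.

    record_path is a simple path to the repeating element (ElementTree
    supports a limited subset of XPath); fields are the child elements
    whose text becomes the columns of each row.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall(record_path):
        # A missing child element becomes an empty cell rather than an error.
        rows.append([record.findtext(field, default="") for field in fields])
    return rows


rows = xml_to_rows(SAMPLE_XML, "./school", ["name", "pupils"])
for row in rows:
    print(row)
```

Each inner list corresponds to one spreadsheet row, with one column per field. Once the data is in that shape you can sort it, filter it or combine it with other data.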
