Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet. In this Something For The Weekend post I’ll show how to take that formula further to grab information from webpages – and get updates when that information changes.
Asking questions of a webpage – or finding out when the answer changes
Despite its name, the =importXML formula can be used to grab information from HTML pages as well. This post on SEO Gadget, for example, gives a series of examples, ranging from grabbing information on Twitter users to price information and web analytics. It also has some further guidance on using these techniques, and is well worth a read for that.
Asking questions of webpages typically requires more advanced use of XPath than I outlined previously – and more trial and error.
This is because, while XML is a language designed to provide structure around data, HTML – used as it is for a much wider range of purposes – isn’t quite so tidy.
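To make that difference concrete, here is a rough sketch in Python rather than in a spreadsheet. The two snippets of markup, the class name "job-ad" and the helper function query are all invented for illustration; they are not taken from any real site. It also assumes the HTML is well-formed, which real pages often are not, and it uses Python’s built-in ElementTree, which supports only a subset of XPath and expects paths to start with ".//" where =importXML would take "//".

```python
import xml.etree.ElementTree as ET

# Tidy XML: the tag names themselves tell you what each value is.
XML_FEED = """
<jobs>
  <job><title>Reporter</title><employer>Example Times</employer></job>
  <job><title>Sub-editor</title><employer>Example Post</employer></job>
</jobs>
"""

# Hypothetical HTML carrying the same data: generic tags, extra page
# furniture, and only a class attribute to mark out the job ads.
HTML_PAGE = """
<html><body>
  <div class="nav"><a href="/">Home</a></div>
  <div class="results">
    <div class="job-ad"><h4><a href="/job/1">Reporter</a></h4></div>
    <div class="job-ad"><h4><a href="/job/2">Sub-editor</a></h4></div>
  </div>
</body></html>
"""


def query(document, xpath):
    """Parse the markup and return the text of every element matching xpath."""
    root = ET.fromstring(document)
    return [element.text for element in root.findall(xpath)]


# In the XML, a short path is enough.
xml_titles = query(XML_FEED, ".//job/title")

# In the HTML, the path has to pick out the right div by its class first,
# otherwise the navigation link would be caught as well.
html_titles = query(HTML_PAGE, ".//div[@class='job-ad']/h4/a")

print(xml_titles)
print(html_titles)
```

Both queries return the same two job titles, but the second only works once you have looked at the page source and spotted which tags and attributes wrap the data you want. That is the trial and error involved.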
Finding the structure
To illustrate how you can use =importXML to grab data from a webpage, I’m going to grab data from Gorkana, a job ads site.