There’s a story in Australia that News Corp. is preparing to sue Google and Yahoo to stop both from linking to and quoting News Corp. content. It comes as Rupert Murdoch promises to start charging for online content across his company’s news sites.
The suing story has prompted the usual hilarity, with comments such as “if murdoch sues google & yahoo over news rather than use robots.txt file, it’ll be a short, embarrassing lawsuit.” But here’s why Murdoch might have a case (first posted here) …
Robots.txt isn’t a panacea
The usual response to newspapers’ complaints about Google is to say ‘just use robots.txt to keep them out.’ This was Google’s response in its two fingers to the news industry.
However, most people don’t seem to realise that it’s hard to stay out of Google News and remain in the main Google index:
“Please keep in mind that the robot we use for Google News, called Googlebot, is the same robot that we use for Google Web Search. This means that any settings you modify for Google News will also apply to Google Web Search.” (From Google Support)
There’s a difference between Google News and Google Search
Google Search is a way for a user to enter a term and for Google to show relevant pages. Google News these days looks like a fully fledged news aggregation service: check out its front page, and tell me how much it differs from a publisher’s news home page.
Just because publishers are happy to appear in normal search results doesn’t mean they want their content used for free to create a rival news product. But there’s no way to use robots.txt, Google’s supposed answer, to draw this distinction.
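To make the point concrete, here is a minimal sketch using Python’s standard urllib.robotparser. The robots.txt contents, the example.com URL and the PRODUCT_CRAWLER mapping are my own illustrative assumptions; the only thing taken from Google is the statement quoted above that News and Web Search share the Googlebot crawler. Because robots.txt rules are keyed on the crawler’s name, a rule aimed at Google News lands on Web Search too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt from a publisher trying to leave Google News only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /news/

User-agent: *
Disallow:
"""

# Assumption based on the Google Support text quoted above:
# both products are fed by the same crawler, Googlebot.
PRODUCT_CRAWLER = {
    "Google News": "Googlebot",
    "Google Web Search": "Googlebot",
}

# Hypothetical article URL.
URL = "https://example.com/news/story.html"


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if robots.txt lets this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def visible_in(product: str, url: str) -> bool:
    """A product can only show pages that its crawler may fetch."""
    return can_fetch(PRODUCT_CRAWLER[product], url)


if __name__ == "__main__":
    for product in PRODUCT_CRAWLER:
        print(product, "can index the story:", visible_in(product, URL))
    print("Other crawlers can index the story:", can_fetch("SomeOtherBot", URL))
```

Under these assumptions the rule removes the story from both products at once, while crawlers with a different name are unaffected. Nothing in the file lets the publisher say “Search yes, News no”.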
Google is ignoring ACAP
Publishers have attempted to help Google out with their own protocol, the Automated Content Access Protocol (ACAP) – a way to build on robots.txt and allow better control over how their content is used.
Google won’t implement it, saying: “Our guiding principle is that whatever technical standards we introduce must work for the whole web (big publishers and small), not just for one subset or field”.
But Google already draws a distinction between big and small publishers. I publish a blog, but I’m not allowed in Google News, even though I’m in the main Google index.
I’m not saying that any publisher will actually want to stay out of Google. But robots.txt isn’t the answer to the problem of how publishers get paid for or control access to their content.