Imagine if Google offered a deal like this to news publishers (as you’ll have guessed, this is exactly how Google Scholar works):
- Where content is behind a paywall, Google will index it all and include it in its web results even if searchers who click through to the page are then told they can’t read the story without subscribing.
- Google will work out which is the authoritative source of a story and show that – so newspapers breaking exclusives get priority over bloggers etc.
- Google won’t differentiate these results in any way: searchers will expect to see the content shown in the Google results, but actually they’ll hit a paywall.
As I say, that’s exactly how Google Scholar works – but it’s not a deal that Google’s offering to newspapers.
How Google Scholar works
Here’s an example of the Google Scholar scheme in action. I did a search for “innocent purchasers” (don’t ask) and saw this result – notice the snippet of text showing some relevant content:
However, when I clicked through to the page, all I saw was this:
Surprisingly, the text from the snippet is nowhere to be seen (it’s not in the metadata either). All that’s shown is the issue details.
Google says that sites in Google Scholar must abide by this rule:
Google users must be offered at least a complete abstract. This is a crucial component of our indexing program. For papers with access restrictions, a full author-written abstract will help users choose among the results which paper is the most likely to have the information they are looking for.
But it seems you can get away with a few lines saying which issue the article appeared in.
How this differs from the normal search results
This sort of arrangement isn’t on offer to news organisations.
In Google News
News sites with a paywall can appear in Google News. They can either take part in first click free, in which case they must offer full access to the story for searchers coming via Google News (i.e. they must let them through the paywall). As Google puts it:
To implement First Click Free, you need to allow all users who find a document on your site via Google search to see the full text of that document, even if they have not registered or subscribed to see that content. The user’s first click to your content area is free. However, once that user clicks a link on the original page, you can require them to sign in or register to read further.
Alternatively, they can appear in the results without offering the content, in which case the result shows a subscription tag, and when you click through you’re told to subscribe.
Google web results
With its standard Google web search, first click free is still available to site publishers. They can have paywalls but, if they want Google to index their content, they must allow searchers who click through from Google to see it.
The second option described above is NOT available for Google’s normal web search.
Google’s very clear that, for its main web search, you cannot show search engines one thing and users another, so you can’t let Google index the pages but then stop users seeing the content when they click through:
Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as “cloaking.”
So what’s the difference?
The difference is this. In its normal web results, as opposed to its news results, Google will only index paywalled content if you abide by the first click free rules – so you must let users see the content if they come via Google.
With Google Scholar, the rules are different:
If your works are already online, we may need nothing more than your permission for our crawlers to visit your site. As noted above, an abstract (at least) of each work must be available to non-subscribers who come from Google and Google Scholar.
If you’re in the Google Scholar program, Google will still index the content even if you don’t let searchers see it. And this content appears in the normal web results, not just in the specialised Google Scholar search.
On top of all this, Google tries to work out the primary version of a work for content in Google Scholar:
When multiple versions of a work are indexed, we select the full and authoritative text from the publisher as the primary version.
How this would help Rupert Murdoch
So if Murdoch wants to put the Sunday Times or the Sun behind a paywall but still wants Google to index his content for the main web index (as opposed to just Google News), he would have to join first click free.
If he decides the Sun is really the Wapping News Journal and joins Google Scholar, then the rules would be different. He could have his content indexed without having to let anyone see it unless they paid for a subscription. On top of that, Google would give his content priority if it was the original source of a story.