Searching is the most popular activity online after email. It is the prism through which we experience a significant proportion of the world’s information – from news and information about our community, through to health information, commerce, and just about anything that has a presence online.
Search Engine Society takes a critical look at search engines, how they work, the techniques used to manipulate them – from gaining better rankings to censorship – and the implications for privacy and democracy.
Chapter one looks at the development and workings of search engines, from the once-essential directories of Yahoo! and the citation-based algorithms of Google that now dominate the search landscape, through to lesser-known players such as social bookmarking service Delicious, which relies on user-generated ‘folksonomies’ to organise material, and specialised regional and ‘vertical’ search engines like the French-language Voila or the genetic materials search engine The Bioinformatic Harvester. This is situated within a wider discussion of information retrieval histories from the Library of Babylon onwards, and touches on recent moves into geospatial, mobile, social and semantic search.
Balancing that focus on technology, the following chapter focuses on users, looking at how people search. Search behaviours vary widely between users and between searches – Halavais discusses research that showed how many users simply add ‘.com’ to a word as the start of their search, while others use a ‘shopping mall’ approach of going direct to the likes of Wikipedia and the Internet Movie Database (which also contain search facilities). Using a search engine, Halavais argues, is only one method of search, and search is “not only an iterative process, but one that is rarely linear and requires seeking out the concepts that surround a problem or question. In other words, the query and search strategy is likely to change as more information becomes available.”
Search as ‘re-finding’
Halavais also emphasises the importance of ‘re-finding’ – “not as a sub-set of finding, but the other way around” – indeed, this is the basis of social bookmarking services like Delicious and Digg that allow the user to store and label (‘tag’) webpages for later retrieval, as well as searching for webpages that have been given similar tags by other users.
Power law distribution patterns famously recur throughout the web, and in the third chapter Halavais looks at how this affects search results. With Google’s rankings relying so strongly on how many links point to a particular page, it is important to look at how those links are distributed. The fact that highly linked pages are likely to attract ever more links – what Huberman calls “preferential attachment” – leads to the “chunky” nature of the web: in concrete terms, the dominance of websites like those of the BBC and Guardian, a quality which, Halavais argues, Google’s PageRank technology ‘calcifies’.
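The ‘rich get richer’ dynamic is easy to see in a toy simulation. What follows is a minimal sketch, not anything taken from the book or from Google’s actual ranking: the function name, the rule that each new page makes exactly one link, and the ‘+1’ weighting that gives unlinked pages a chance of being chosen are all assumptions made purely for illustration.

```python
import random


def simulate_preferential_attachment(num_pages, seed=0):
    """Grow a toy web one page at a time (illustrative assumption only).

    Each new page links to one existing page, chosen with probability
    proportional to that page's inbound links plus one.
    Returns a list where inbound[i] is the inbound link count of page i.
    """
    rng = random.Random(seed)
    inbound = [0]   # page 0 starts with no inbound links
    targets = [0]   # page i appears (inbound[i] + 1) times in this list
    for new_page in range(1, num_pages):
        chosen = rng.choice(targets)  # popular pages appear more often
        inbound[chosen] += 1
        targets.append(chosen)        # the chosen page gains weight
        inbound.append(0)
        targets.append(new_page)      # the newcomer enters with weight 1
    return inbound


if __name__ == "__main__":
    counts = simulate_preferential_attachment(10_000, seed=42)
    ranked = sorted(counts, reverse=True)
    print("Top five inbound link counts:", ranked[:5])
    print("Median inbound link count:", ranked[len(ranked) // 2])
```

Running the script should show a handful of pages holding a large share of the links while the typical page has few or none, which is the skew that link-counting rankings then feed back to searchers.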
But when Google tweaks its search engine algorithms in an attempt to improve results, it can have enormous consequences for organisations dependent on their rankings in search results. Halavais uses the example of Skyfacet.com and Answers.com, which saw sales and visits drop by 17% and 28% respectively when they dropped off the first page of related Google searches. It is as if someone moved your shop from the main high street to an industrial estate. In this context it is not surprising that search engine advertising accounts for the majority of online advertising spend.
Following up on those issues, the fourth chapter looks at implications for democracy on two sides: firstly, the division between winners and losers in the contest for public attention; and secondly, the division between skilled and unskilled users of search engines. Halavais is keen to highlight that division is nothing new:
“Current search engines, like communication technologies before them, contain both centralizing and diversifying potentials. These potentials affect the stories we tell ourselves as a society; and the way we produce knowledge and wisdom.”
In practice, these potentials are heavily weighted towards US sites:
“In the language of PageRank, US sites simply have more authority: more links leading to them … sites have existed longer in the United States, where much of the early growth of the internet occurred… Add to this the idea that early winners have a continuing advantage in attracting new links and traffic, and US dominance of search seems a foregone conclusion … the search engines do not merely reflect this authority, they help to reproduce it.”
Indeed, ranking systems that reinforce authority, says Halavais, are conservative in nature and comprise what Lewis Mumford, writing 40 years ago, called “authoritarian technics”. But because of the unlimited size and reach of the internet compared to previous media technologies, it is not so simple:
“The current structure is a complex combination of a high degree of centralization at the macro-level, with a broad set of diverse divisions at the micro-level.”
Blogger as ‘search intellectual’
Interestingly, at this point Halavais introduces the blogger as a “search intellectual”, upsetting existing structures of authority on the web and acting as “a counterweight to the hegemonic culture of the search engines” in bringing otherwise overlooked material into the “circle of reputation and links that search engines tend to enforce”. The recent rise of Twitter in performing a similar role would be worth adding to that list.
Chapter 5 takes a broad look at censorship – “just another word for filtering” – while Chapter 6 looks at privacy – search engines as “databases of intentions” where even anonymised logs of what individuals are searching for can lead to people being identified. Chapter 7 revisits the rise of “sociable search” tools and folksonomy – where classification is created by a mass of users’ ‘tags’ rather than any centralised scheme, and ‘finding’ is a social act closely related to ‘sharing’.
The book closes with a roundup of the possibilities of future search and the factors that will influence it, from increasing digitisation of material to improved mapping and the possibilities of RFID tags (which make objects a part of the web too). Semantic search – technology that understands the meaning of what you are searching for, or of relationships between objects – is the promise that lies forever ‘just over the horizon’, while sociable search offers a more likely immediate move.
As is natural, there are areas which have developed since this book was written and so are not tackled in depth – most notably real-time search. The rise of Twitter and the ability to search through what people are talking about ‘right now’ represents such serious competition to Google that it introduced the first major new features to its homepage in years. Wolfram Alpha – the “computational knowledge engine” that made newspaper front pages this year – is not even mentioned.
But those are incidental issues in what is an important book. Halavais manages to acknowledge the dominance of Google without being distracted by it, and gives due attention to non-Western tools and services not commonly seen as search tools. He avoids the pitfalls of technological determinism and manages to distinguish between top-down domination and bottom-up diversity. What emerges is a sophisticated picture of power in flux. “Search engines are interesting to the person who wants to understand the exercise of power in the information society,” Halavais writes in his conclusion. “In an era in which knowledge is the only bankable commodity, search engines own the exchange floor.” The more readers understand this exchange floor, the better we can exchange and interrogate what information we possess.
A shorter version of this review will appear in Journalism