There are billions of pages of unsorted and unclassified information online, together making up millions of terabytes of data with almost no organisation. It is not that some of this information is inherently valuable while the rest is worthless; that is a judgement for whoever wants it. At the moment, the most common way to access any of it is through the hegemonic search engines, which act as an entry point.
Yet, despite Google’s dominance of the market and of the culture, the methodology of search still isn’t satisfactory. Leading technologists see the next stage of development coming, in which computers will become capable of effectively analysing and understanding data rather than just presenting it to us. Search engine optimisation will eventually be replaced by the ‘semantic web’.
According to Nova Spivack, founder of Twine, one of the leading sites in this field, the best way of achieving this is to correctly tag the mass of available data so that it carries a clear sense of meaning. He says:
“This next generation is actually based on enriching the meaning, enriching the structure. The reason we want to do this is so that software can understand the web like humans can understand the web. Because the semantic web is not for humans, it is for machines.”
Undertaking this task will revolutionise the way we utilise the internet, creating intelligent interaction and impacting on the way the web is perceived in popular culture. Vint Cerf, one of the driving forces behind the creation of the internet, says:
“I don’t believe that we will see arising out of the current internet…conscious artificial intelligence, but we will probably see the system become easier to interact with – for example, voice interaction is becoming increasingly easy to accomplish. I’m almost certain you’ll see products emerging that will allow you to orally interact with the network – ask for something, demand something, or command something and have [it] happen.
“We may feel that this system is more intelligent because we are interacting with it in ways that don’t require us to point, click and type. The semantic web idea will make the internet seem more intelligent because we are extracting knowledge that other people put into it in a way that looks pretty intelligent.”
So the aim of the ‘semantic web’ is to allow data to be accessed and shared effectively by wider communities, yet processed automatically by computers. For this to happen there needs to be a simple system for categorising data so it can be easily located and organised.
Much progress has been made on this infrastructure, particularly in the development of two new languages – Resource Description Framework (RDF) and Web Ontology Language (OWL) – by the World Wide Web Consortium. These languages are used to annotate web content with machine-readable ‘knowledge’, enabling applications to handle it more intelligently.
At the moment HTML is limited to describing static content: documents and the links between them. RDF, OWL and XML, however, can describe arbitrary things such as people, events or objects. An RDF statement is layered on top of existing web content and consists of three parts: a subject, a predicate and an object. For example: “Jeremy Paxman” (subject) belongs to (predicate) journalists (object).
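As an illustrative sketch of what such a statement looks like in practice, here it is in RDF’s Turtle syntax. The `example.org` URIs and the `ex:` prefix are invented for this example, not drawn from any real vocabulary:

```turtle
@prefix ex: <http://example.org/terms/> .

# subject          predicate      object
ex:JeremyPaxman    ex:belongsTo   ex:Journalists .
```

Each element is identified by a URI, which is what allows machines anywhere on the web to agree on exactly what is being described.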
These descriptions add meaning to otherwise static content, making the structure of the knowledge behind it explicit. In this way a machine can process knowledge itself instead of mere text, using a process similar to human reasoning. This should result in more meaningful search results, and perhaps even allow computers to carry out a greater share of research automatically.
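The kind of machine reasoning described above can be sketched in a few lines of Python. This is only a toy illustration, not a real RDF library: triples are plain tuples, and the facts and predicate names are invented for the example.

```python
# A tiny in-memory triple store: each fact is a (subject, predicate, object) tuple.
triples = {
    ("Jeremy Paxman", "belongs to", "journalists"),
    ("journalists", "subgroup of", "writers"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None acts as a wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

def groups_of(person):
    """Follow a 'belongs to' link, then transitive 'subgroup of' links:
    a very simple form of inference over the stored knowledge."""
    groups = {o for (_, _, o) in query(person, "belongs to")}
    frontier = set(groups)
    while frontier:
        group = frontier.pop()
        for (_, _, parent) in query(group, "subgroup of"):
            if parent not in groups:
                groups.add(parent)
                frontier.add(parent)
    return groups

print(groups_of("Jeremy Paxman"))  # infers both 'journalists' and 'writers'
```

Nothing in the text says Paxman is a writer, yet the machine concludes it by chaining statements together – exactly the step from processing text to processing knowledge.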
The success (or failure) of these experimental technologies will motivate further research and development, not only within the industry but also in academia, and their efforts will certainly influence the future development of information technology. In a further post I will explore the services currently being forged, and in a final post on the ‘semantic web’ I will tackle the revolutionary uses this new technology has for journalism.
However, the last word here must go to Tim Berners-Lee, the pioneer of the world wide web, who says:
“A ‘semantic web’ has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialise.”