The semantic web has been one of the hot buzz phrases of the last few years. The idea is that search engines are rubbish because they don't really understand the meaning of what users are searching for. But what if computers actually understood the context of the text that they were crawling? Semantic web advocates argue that such understanding would make searches much more efficient and allow for much more specific queries. Some of the latest machine learning research, published in Science, uses brain imaging technology to take some baby steps in that direction.

The basic problem that faces the semantic web is that while a culture generally has an agreed-upon meaning for a word, it is hard to break that meaning up into symbols that a computer can understand. One way to tackle this problem is to determine what symbols our brains use to convey that meaning. While we're still a ways off from decoding the internal symbolic "language" of the mind, functional magnetic resonance imaging (fMRI) indicates that meaning seems to be associative. For instance, when a person is shown a picture of celery, fMRI scans of their brain will usually show activation in the region associated with taste. The general conclusion is that objects are associated with sensory and motor control regions. Unfortunately, it is not quite that simple, because other parts of the brain, primarily the frontal cortex, also light up, indicating that there is more going on. Nevertheless, the associative patterns are strong enough that a trained observer can accurately guess the object being shown to a subject by watching the fMRI pattern.

In a twist on that observation, scientists have trained a computer using word associations and fMRI patterns to see if it could predict the fMRI pattern of nouns that it had never encountered before. This was achieved by giving a neural network an enormous text corpus from which to gather word associations. The network was then fed 60 nouns and a set of verb classes; from there it searched the corpus to correlate the nouns with the verb classes, creating a 25-dimensional model for each noun. Finally, the network was trained on fMRI patterns for some of the nouns.
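The co-occurrence step can be sketched in a few lines of Python. This is only a rough illustration with a made-up mini-corpus and a handful of invented verb classes; the actual study used a vastly larger corpus and 25 sensory-motor verb classes, and its counting scheme may differ in detail.

```python
from collections import Counter

# Hypothetical verb classes; the real study used 25 sensory-motor verbs.
VERB_CLASSES = ["eat", "taste", "push", "see", "smell"]

def cooccurrence_features(noun, corpus_tokens, window=5):
    """Count how often each verb class appears within `window` tokens
    of the noun, then normalize the counts into a feature vector."""
    counts = Counter()
    for i, tok in enumerate(corpus_tokens):
        if tok == noun:
            lo, hi = max(0, i - window), i + window + 1
            for neighbor in corpus_tokens[lo:hi]:
                if neighbor in VERB_CLASSES:
                    counts[neighbor] += 1
    total = sum(counts.values()) or 1
    return [counts[v] / total for v in VERB_CLASSES]

# Tiny made-up corpus: "celery" co-occurs with eating and tasting,
# not pushing, so its vector leans toward those dimensions.
corpus = "you eat celery and taste celery but push a cart".split()
print(cooccurrence_features("celery", corpus))
```

Each noun ends up as a point in verb-class space, which is what lets the model generalize to nouns it has never seen brain data for.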

To test how well the model performed, the researchers made the network predict the brain activity patterns for nouns for which it had no fMRI data. Its predictions matched the real patterns significantly more often than would be expected by chance. The model was also able to construct fairly accurate fMRI images for some of the associated verbs. To test the model further, the researchers gave the network words that were not associated with any of the verb sets. The neural network then successfully predicted the fMRI patterns for these new words, though not quite as accurately as for words within the training categories.
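The prediction step amounts to learning a map from semantic features to voxel activations on some nouns, then applying it to a held-out noun. Here is a minimal sketch with synthetic stand-in data; the sizes, names, and the plain least-squares fit are all assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 8 "nouns", 5 semantic features, 20 "voxels". The actual
# study used 60 nouns, 25 verb-class features, and whole-brain fMRI data.
n_nouns, n_feat, n_vox = 8, 5, 20
features = rng.random((n_nouns, n_feat))
true_weights = rng.random((n_feat, n_vox))
brain = features @ true_weights + 0.01 * rng.standard_normal((n_nouns, n_vox))

# Hold out the last noun, fit a linear map from features to voxel
# activations on the rest, then predict the held-out noun's pattern
# purely from its semantic feature vector.
W, *_ = np.linalg.lstsq(features[:-1], brain[:-1], rcond=None)
predicted = features[-1] @ W

# Score the prediction by cosine similarity with the observed pattern.
observed = brain[-1]
cosine = predicted @ observed / (np.linalg.norm(predicted) * np.linalg.norm(observed))
print(round(float(cosine), 3))
```

Because the held-out noun's brain data never enters the fit, a high similarity score means the semantic features alone carry real predictive information, which is the core of the result.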

If you thought that was impressive, the researchers then tested the network on pairs of similar nouns (e.g., corn and celery). In this case, the model's performance fell substantially, but was still better than chance. This was expected, since telling similar nouns apart requires additional meaning derived from attributes like color and shape, a process the neural network was not capable of handling. Finally, the carefully chosen category set was replaced with a random one. This set included verbs, nouns, and words that have little independent semantic value (like "the"). Using this category set, the neural network performed very poorly, though still slightly better than chance, probably because a few verb categories remained.
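It's easy to see why similar nouns are hard: nouns that co-occur with nearly the same verbs get nearly identical feature vectors, so their predicted brain patterns nearly coincide. The numbers below are invented purely to illustrate the geometry; the five dimensions loosely stand for verb classes like "eat", "taste", "push", "see", and "smell".

```python
from math import sqrt

# Made-up verb-class feature vectors. Corn and celery are both eaten
# and tasted, so their vectors nearly coincide; a hammer is pushed.
corn   = [0.40, 0.35, 0.00, 0.20, 0.05]
celery = [0.38, 0.37, 0.00, 0.20, 0.05]
hammer = [0.00, 0.00, 0.70, 0.25, 0.05]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

print(cosine(corn, celery) > cosine(corn, hammer))  # prints True
```

A model built only on verb co-occurrence can separate corn from hammer with ease, but corn from celery only barely, which matches the drop in performance the researchers observed.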

So what does this mean? For one, it adds a lot of evidence to the idea that humans develop their meaning for nouns associatively. It also tells us that the primary associations our brains make for words are with sensory perceptions, such as a smell, or with actions, each of which has a dedicated brain region. A noun, then, is instantiated in the brain as the set of actions and experiences associated with it, rather than as some concrete independent object.

So what does this mean for the chances of the semantic web? These findings tell us that researchers looking for statistical associations between nouns and verbs are probably on the right path to generating contextual meaning for those nouns, even when they are used out of context. However, there is a long way to go yet, and the chances of this improving search engine accuracy are limited. This is because current algorithms are already semantic, albeit indirectly. As an example, the much-vaunted Google algorithm uses link maps to rank hits. However, it tacitly assumes that humans do the linking, and that humans know what they are doing. This builds the semantic aspect in, but the indirectness also leaves open a frustrating ambiguity that users must learn to overcome.

Science, 2008, DOI: 10.1126/science.1152876