David Karger speaking, ESWC 2013, Montpellier

A hot topic at ESWC 2013, and many other places besides, was the issue of Semantic Web adoption, which after a decade and a half is still less than it should be. The thorny question is: what can be done about it? David Karger gave a keynote on the subject at ESWC 2013, where he argued that the Semantic Web can help users manage their data. I think he's right, but this is only a very narrow area of application. In any case, end users are not the people we should aim for if adoption of Semantic Web technologies is the goal.

End users and technology

In a nutshell, end users do not adopt technology, they choose tools. They find an application they think solves their problem, then buy or install that. They want to keep track of their customers, so they buy a CRM tool. What technology the tool is based on is something they very rarely care about, and rightly so, as it's the features of the tool itself that generally matter to them.

Thinking about comparable cases may help make this point clearer. How did relational databases succeed? Not by appealing to end users. When people use an RDBMS-based application today, are they aware of what's under the hood? Very rarely. Similarly with XML. At heart it's a very simple technology, but even so it was not end users who bought into it, but rather developers, consultants, software vendors, and architects.

If the Semantic Web technologies ever succeed, it will be by appealing to the same groups. Unfortunately, the community is doing a poor job of that now.

Selling to technologists is hard

The trouble with selling to technologists is that the developer community is actually fairly conservative. Real change in how software is developed can take a long time, and the more fundamental the change is, the longer it takes. Python is pretty hot now, but I started using it in 1997 (16 years ago!), and there are still big groups of developers who know no scripting languages at all, and are skeptical of the whole idea of a scripting language.

Semantic technologies have problems because they are something truly new that also requires a fairly fundamental change to how systems are built. For many developers, this change is both very difficult to understand, and even frightening. Often, people wind up taking this new shiny technology that works in a completely new way, and then use it to build systems the same old way. Inevitably, they discover that this doesn't really gain them that much, and they give up on the whole thing.

The problem is that the community is at present doing very little to remedy this.

What's right, and what's wrong

The main things the community is doing wrong are the architecture of what's being offered, and the story that's told about how these technologies can be applied. At the lowest levels, I think we're doing fine. RDF itself, the interchange formats, triple stores, SPARQL, and the SPARQL protocol: none of these things need to change. They're imperfect, just like everything else, but they're not the source of the problem. That source lies higher up.

One problem is RDFS and OWL. Not because they are bad. On the contrary, I think these are among the most powerful tools semantic technology has to offer. The problem is rather that it's very difficult for people to understand what these technologies actually are. And when you don't even understand what something is, seeing what it can do is just too hard. So the fact that these technologies are meant for reasoning, and what reasoning can do, is way outside what most technologists have been able to understand. Many developers find ordinary data modelling somewhat frightening and tricky; imagine how they feel about modelling data in description logic.

And that, unfortunately, is only part of the problem. The other major part is that RDFS and OWL leave a gaping hole in the functionality offered by the Semantic Web technology stack. That is, there's no real support for validation. No technology allows me to say that "every person must have an id, a name, and may have an email address, and cannot have anything else." And yet this is a key everyday requirement, and it's just not being met.

(I'm aware that there is some excellent research in this area, and that Stardog has implemented some of it. That's a great start in the right direction, but not even close to enough. The recent announcement of the RDF validation work is hugely encouraging.)
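
To make the requirement concrete, here is a minimal sketch of the kind of closed check I have in mind, written in plain Python over (subject, property, value) tuples rather than against any real RDF library. The names PERSON_SHAPE and validate are made up for illustration, and reading "may have an email address" as "at most one" is my assumption.

```python
# Hypothetical closed "shape" for a person: property -> (min, max) occurrences.
# Assumption: "may have an email address" is read as "zero or one".
PERSON_SHAPE = {
    "id": (1, 1),
    "name": (1, 1),
    "email": (0, 1),
}


def validate(subject, triples, shape):
    """Return a list of violations for one subject against a closed shape."""
    # Count how often each property is used on this subject.
    counts = {}
    for s, p, _ in triples:
        if s == subject:
            counts[p] = counts.get(p, 0) + 1

    errors = []
    # Cardinality checks: required properties present, none repeated too often.
    for prop, (lo, hi) in shape.items():
        n = counts.get(prop, 0)
        if n < lo:
            errors.append(f"missing {prop}")
        elif n > hi:
            errors.append(f"too many {prop}")
    # Closed-world check: anything not listed in the shape is an error.
    for prop in counts:
        if prop not in shape:
            errors.append(f"unexpected {prop}")
    return errors


triples = [
    ("p1", "id", "42"), ("p1", "name", "Ann"), ("p1", "age", "30"),
    ("p2", "id", "7"), ("p2", "name", "Bob"), ("p2", "email", "bob@example.org"),
]
print(validate("p1", triples, PERSON_SHAPE))
print(validate("p2", triples, PERSON_SHAPE))
```

The last loop is the part that RDFS and OWL cannot express: it treats the list of properties as complete, which is exactly the closed-world reading an application often needs.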

Another problem is that there's no way to find out what properties a class can have. That's going to be another common requirement, and again it's not being met. I understand that people will balk at this requirement (and the previous one), and say that on the Semantic Web you're operating under the Open World Assumption, and so these requirements are wrong-headed.

That's both right and wrong. It's true that on the Semantic Web you're wrong to want this. That, however, is no help to someone trying to build an application. They may well have parts of the system that are open, but they will also have parts that are closed. RDF at the moment makes those latter parts much harder than they need to be.

Then there's the question of architecture. For technology to be useful, it has to fit in with what's already there. An astonishing amount of what I see at conferences and elsewhere completely fails to do that. Real-world businesses live with Active Directory, SharePoint, CRM systems, and so on. Much of what they're being offered doesn't fit with these kinds of systems at all. That makes things difficult.

Much of what's being done on database access, such as query federation and mapping SPARQL to SQL with R2RML, doesn't really seem very realistic to me, either. Most big databases are already overloaded. Now we're going to hammer them with random, uncontrolled queries? In many places you are not going to be allowed to do that. And even if you are, what happens if you do query federation and one out of 15 sources is down or slow? Who's going to build a business-critical application on top of something like that?
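
The slow-source problem is easy to see in a toy model. The sketch below is not real federation code: query_source is a stand-in that just sleeps, and federated_query and the source names are invented for illustration. It only shows the decision every federating application is forced to make once a deadline is involved.

```python
import asyncio


async def query_source(name, delay, rows):
    """Stand-in for a remote endpoint: waits `delay` seconds, then returns rows."""
    await asyncio.sleep(delay)
    return rows


async def federated_query(sources, timeout):
    """Query all sources concurrently; report the ones that miss the deadline."""

    async def guarded(name, delay, rows):
        try:
            got = await asyncio.wait_for(query_source(name, delay, rows), timeout)
            return name, got
        except asyncio.TimeoutError:
            # The source was too slow; its rows are simply absent from the answer.
            return name, None

    results = await asyncio.gather(*(guarded(*s) for s in sources))
    rows = [r for _, got in results if got is not None for r in got]
    failed = [name for name, got in results if got is None]
    return rows, failed


# Two fast hypothetical sources and one that takes far longer than the deadline.
sources = [("crm", 0.0, ["alice"]), ("hr", 0.0, ["bob"]), ("legacy", 1.0, ["carol"])]
rows, failed = asyncio.run(federated_query(sources, timeout=0.1))
print(rows, failed)
```

Either the application waits for the slowest source, or it returns an answer that is silently missing data. Neither is a comfortable basis for something business-critical.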

What's the story?

Further, and this is perhaps the worst part, there's no guidance on what sorts of problems to solve with semantic technologies, or how to apply them. This makes it much harder for people to adopt them, and increases the chances that people will apply them wrongly, winding up with little benefit from their choice.

(I'm hoping to do my part by writing more about what I think are the right ways to use semantic technology in future posts, but that's unlikely to be enough on its own, to put it mildly.)