Daniel Koller asked on Twitter an interesting question:

… are linksets today evaluated in an automated way?or does it depend on a person to interpret it?

Trying to answer this question here, but let’s step a bit: back in 2008, when I started to dive into ‘LOD metadata’ one of my main use cases was indeed how to automate the handling of LOD datasets. I wanted to have a formal description of a dataset’s characteristics in order to write a sort of middle ware (there it is again, this bad word) that could use the dataset metadata and take the burden away from a human to sift through the ‘natural language’ descriptions found in the Wiki pages, such as the Dataset page.

Where are we today?

Looking at the deployment of voiD, I guess we can say that there is a certain uptake; several publisher and systems support voiD and there are dedicated voiD stores available out there, such as the Talis voiD store and the RKB voiD store.

In our LDOW2009 paper Describing Linked Datasets we outlined a couple of potential use cases for voiD and gave some examples of actual usage already. Most notably, Linksets are used for ranking of datasets (see the DING! paper) and distributed query processing.

However, to date I’m not aware of any implementation of my above outlined idea of a middle ware that exploit Linksets. So, I guess one answer to Daniel’s question is: at the moment, mainly humans look at it and use it.

What can be done?

The key to voiD really is its abstraction level. We describe entire Datasets and their characteristics, not single resources such as a certain place, a book or a gene. Understanding that the links are the essence in a truly global-distributed information space, one can see that the Linksets are the key to automatically process the LOD datasets, as they bear the high-level metadata about the interlinking.

When you write an application today that consumes data from the LOD cloud, you need to manually code which datasets you are going to use. Now, imagine a piece of software that really operates on Linksets: suddenly, it would be possible to specify certain requirements and capabilities (such as: ‘needs to be linked with some geo data and with statistical data’) and dynamically plug-in matching dataset. Of course, towards realising this vision, there are other problems to overcome (for example concerning the supported vocabularies vs. SPARQL queries used in the application), however, at least to me, this is a very appealing area, worth investing more resources.

I hope this answers your question, Daniel, and I’m happy to keep you posted concerning the progress in this area.