From JSON to RDF in Six Easy Steps with JRON

June 4, 2010

Sometimes, if you stand in the right place and squint, JSON and RDF line up perfectly. Each time I notice this, I badly want a way to make them line up all the time, no matter where you’re standing. And, actually, I think it’s pretty easy.

I’ve seen a few proposals for how to work with RDF data in JSON, but the ones I’ve seen put too much burden on JSON folks to accommodate RDF. It seems to me we can let JSON keep doing what it does so well, and meanwhile, we can provide bits of RDF which can be adopted when needed. Instead of pushing RDF on people, allow them to take the parts they find useful.

In thinking about it, I’ve come up with six things RDF can do that are not standard parts of JSON. These are things one can do with JSON, of course, but not in any standard way. My suggestion is that these bits of functionality be provided in an RDF-compatible way (as I detail below), so that the JSON world and the RDF world can start to really play well together.

I’m interested to hear what people think of this. Blog comment, email to sandro@hawke.org (maybe cc semantic-web@w3.org?), or catch me in the halls at SemTech. I expect this general topic of RDF-meets-JSON will be discussed at the RDF Next Steps workshop, and if the stars line up right, maybe we can get a W3C Recommendation in this space in the next year or so. Let’s call this particular proposal JRON 0.1 (Javascript RDF Object Notation), not “Sandro’s Proposal”, so I can be freer to like other designs and be properly neutral.

Step 0: Start with ordinary JSON

In general, JSON and RDF are very similar, although they are usually described using different terminology. Of course, they both have strings and numbers. They both have a way of encoding a sequence of items: arrays in JSON, lists in RDF (some details below). The main structuring is around sets of key-value pairs, which JSON calls an ‘object’. In RDF we call the thing that has the key-value pairs the “subject” and focus on its connection with each key-value pair; the three together (subject, key, value) form an RDF triple.

The point here is that ordinary JSON structures correspond to an important subset of RDF. They don’t exactly match that subset because RDF uses namespaces, as detailed in step 5 below. The other steps below show the ways in which JSON is a subset of RDF. If one takes all the steps here, using JSON with these conventions, one has full RDF.

So, here are the steps. Steps 1-3 are pretty simple and not very interesting. They address everyday concerns in data processing. Steps 4-6 may be a little more surprising if you’re not familiar with RDF.

Step 1: Allow Extended Datatypes

Why: For datatypes, JSON only has strings, numbers, booleans. Sometimes people want to store and manipulate other datatypes, such as dates, or application-specific datatypes.

How: RDF uses XML’s datatype mechanism, where data values are conveyed as a pair of items: a lexical representation (a sequence of characters) and a datatype identifier (a sequence of characters which happens to be a URI). Each datatype is a mapping from strings (lexical representations) to values; the datatype identifier tells us which datatype is to be used to interpret this particular representation.

In JRON, we represent this pair like this:

{ "__repr": "2010-03-06", "__type": "http://www.w3.org/2001/XMLSchema#date" }

You can put this as a value in a list or in a key-value pair, just like a string or number.
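As a rough illustration of how a consumer might turn such a pair back into a native value, here is a minimal Python sketch. The `PARSERS` registry and the `to_native` helper are hypothetical names, not part of any existing JRON library, and only the date datatype is handled.

```python
from datetime import date

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Hypothetical registry: datatype IRI -> function that parses the lexical form.
PARSERS = {XSD_DATE: date.fromisoformat}

def to_native(value):
    """Return a native object for a __repr/__type pair, else the value unchanged."""
    if isinstance(value, dict) and set(value) == {"__repr", "__type"}:
        parser = PARSERS.get(value["__type"])
        if parser is not None:
            return parser(value["__repr"])
    # Unknown datatypes and ordinary JSON values pass through untouched.
    return value

typed = {"__repr": "2010-03-06", "__type": XSD_DATE}
print(to_native(typed))  # prints 2010-03-06
```

Leaving unknown datatypes untouched means a program that only cares about dates can ignore everything else.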

RDF doesn’t restrict which datatypes are used. Some recent standards work selected this list as the set people should implement.

Personally, I’m not sure users need to be able to extend datatypes. I see dates being important, but otherwise I’m not convinced. Still, it’s in RDF, and I like compatibility, so it’s here.

Step 2: Allow Language Tags

Why: When you have text available in several different languages, language tags provide a way to select which of the available strings, if any, matches the language preference of the user.

Also: Text-to-speech systems can handle text better if they know which natural language to use in pronouncing the text.

How: RDF allows language tags on string literals. In JRON, we use a pair like this:

{ "__text": "chat", "__lang": "fr" }
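To make the "select by language preference" use concrete, here is a small Python sketch. The `pick_text` helper and the English entry are made up for illustration, and the sketch assumes exact tag matching; real language negotiation (for example, treating "fr-CA" as acceptable for "fr") is more involved.

```python
def pick_text(values, preferred_langs):
    """Return the first __text whose __lang matches the preferences, in order."""
    for lang in preferred_langs:
        for value in values:
            if isinstance(value, dict) and value.get("__lang") == lang:
                return value["__text"]
    return None  # nothing available in any preferred language

labels = [
    {"__text": "chat", "__lang": "fr"},
    {"__text": "cat", "__lang": "en"},  # hypothetical second translation
]

print(pick_text(labels, ["de", "en"]))  # prints cat
```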

Commentary: Personally, I’ve never liked this bit of RDF. I feel like there are better architectures for handling language tagging. But there was a vocal community that felt this was essential, so it’s in the standard. I gather some people like it, and I haven’t seen a good counter-proposal.

Step 3: Allow Non-Tree Structures

Why: Sometimes your data is not tree structured. Sometimes you have an arbitrary directed graph, such as when representing a social network.

How: In RDF, an arbitrary “node id” is available for making non-tree structures. We can do the same in JRON, saying any object may have a node id, and if it does, the object is considered the same as all other objects with the same node id. Like this bit of JSON saying my friend Eric and I both know each other:

... { "foaf_name": "Sandro Hawke", "foaf_knows": { "__node_id": "n102" }, "__node_id": "n334" } ... { "foaf_name": "Eric Prud'hommeaux", "foaf_knows": { "__node_id": "n334" }, "__node_id": "n102" } ...

In the above example, the objects representing me and Eric are given node ids, and then those node ids are used to make the links to each other. We could also do this with only one node id, but we still need at least one:

{ "foaf_name": "Sandro Hawke", "foaf_knows": { "foaf_name": "Eric Prud'hommeaux", "foaf_knows": { "__node_id": "n334" } }, "__node_id": "n334" }
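One way a program might resolve node ids into an in-memory graph is sketched below in Python. The `link_nodes` function is a hypothetical helper, and the sketch assumes the two objects arrive as items of a list (the original example elides the surrounding structure). It also assumes one value per key, so a repeated key on the same node simply overwrites.

```python
def link_nodes(value, nodes):
    """Replace every object carrying a __node_id with one shared dict per id."""
    if isinstance(value, list):
        return [link_nodes(item, nodes) for item in value]
    if not isinstance(value, dict):
        return value
    node_id = value.get("__node_id")
    # Objects with the same id share one dict; objects without an id stay separate.
    target = nodes.setdefault(node_id, {}) if node_id is not None else {}
    for key, item in value.items():
        if key != "__node_id":
            target[key] = link_nodes(item, nodes)
    return target

doc = [
    {"foaf_name": "Sandro Hawke", "foaf_knows": {"__node_id": "n102"}, "__node_id": "n334"},
    {"foaf_name": "Eric Prud'hommeaux", "foaf_knows": {"__node_id": "n334"}, "__node_id": "n102"},
]

nodes = {}
sandro, eric = link_nodes(doc, nodes)
print(sandro["foaf_knows"]["foaf_name"])  # prints Eric Prud'hommeaux
```

After linking, the two dicts point at each other, so the result is a real cyclic graph rather than a tree.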

Okay, those were the ordinary three things to add to JSON. Here are the interesting three:

Step 4: Allow Cross-Document Structures

Why: Sometimes, there is useful, relevant data available on the Web but it’s not part of the current JSON document. We would not want all the Web pages in the world to be gathered into one big Web page; similarly, it’s good to keep data in different documents. But that shouldn’t stop us from easily combining the data, and keeping the links intact.

How: RDF allows IRIs (unicode web addresses) to be used as node identifiers. They are like node ids, except they work across multiple documents; they are globally unambiguous identifiers, and systems can use Web protocols to dereference them to get other useful information.

In JRON, we can do this:

{ "foaf_name": "Sandro Hawke", "__iri": "http://www.w3.org/People/Sandro/data#Sandro_Hawke" }
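To show why a global identifier helps, here is a short Python sketch that combines objects from two documents when they name the same IRI. The `merge_by_iri` helper, the second document, and its `foaf_nick` value are all hypothetical; the sketch also assumes both documents use the same key prefixes and that later documents win when a key conflicts.

```python
def merge_by_iri(*objects):
    """Group objects by __iri, combining the key-value pairs of matching objects."""
    merged = {}
    for obj in objects:
        iri = obj.get("__iri")
        if iri is None:
            continue  # no global identifier, so nothing to line up across documents
        node = merged.setdefault(iri, {})
        for key, value in obj.items():
            if key != "__iri":
                node[key] = value  # later documents win on conflict (a simplification)
    return merged

SANDRO = "http://www.w3.org/People/Sandro/data#Sandro_Hawke"
doc_a = {"foaf_name": "Sandro Hawke", "__iri": SANDRO}
doc_b = {"foaf_nick": "sandro", "__iri": SANDRO}  # hypothetical second document

combined = merge_by_iri(doc_a, doc_b)
print(combined[SANDRO])
```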

Commentary: So why do we still need __node_id? Because sometimes it’s a pain to make up a good IRI. Some people prefer to always use IRIs, avoiding node_ids in their data, and that’s fine.

Step 5: Put Keys in Namespaces

Why: When data is coming from different sources across the Web, it’s not practical to get all the sources to agree on all the terminology. Instead, by using Web addresses (URLs/IRIs) as our keys, we allow individuals and organizations to make their own decisions. They can decide how much to share their vocabularies, and they avoid accidental name collisions. The web address also provides a handy link to documentation, community sites, schemas, etc.

How: It’s awkward to use whole, long HTTP IRIs everywhere, so as in many RDF syntaxes, JRON has a prefix expansion mechanism, like this:

{ "foaf_name": "Sandro Hawke", ... "__prefixes": { "foaf_" : "http://xmlns.com/foaf/0.1/" } }

Here the key “foaf_name” gets expanded into “http://xmlns.com/foaf/0.1/name”, which serves as a unique-on-the-Internet identifier for a particular conceptualization of names.

Commentary: Although I’ve left it almost to the end, this is the one mandatory part of this proposal. All the other elements are only present when required by the data. The null JRON document is: {"__prefixes":{}}

Others have suggested this part can be optional, too, by having a set of standard prefixes for a given API. I’m not entirely opposed to that, but I’m concerned about how those defaults would be communicated in practice.

Also, I’m not sure there’s consensus on what character to use in the short name: should it be foaf_name, foaf.name, foaf:name, or what? The mechanism here is that you can use whatever you want: the __prefixes table keys are matched longest-first. If there’s an entry with an empty string, that provides a default namespace.
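The longest-first matching rule can be sketched in a few lines of Python. The `expand_key` helper is hypothetical, the `http://example.org/ns#` namespace is a placeholder, and skipping keys that start with "__" is an assumption (so that reserved keys such as `__prefixes` are not swallowed by a default namespace).

```python
def expand_key(key, prefixes):
    """Expand a short key to a full IRI using the longest matching prefix."""
    if key.startswith("__"):
        return key  # assumption: reserved JRON keys are never expanded
    # Try longer prefixes first; an empty-string entry matches last, as the default.
    for short in sorted(prefixes, key=len, reverse=True):
        if key.startswith(short):
            return prefixes[short] + key[len(short):]
    return key  # no prefix applies

prefixes = {
    "foaf_": "http://xmlns.com/foaf/0.1/",
    "": "http://example.org/ns#",  # hypothetical default namespace
}

print(expand_key("foaf_name", prefixes))  # prints http://xmlns.com/foaf/0.1/name
print(expand_key("title", prefixes))      # prints http://example.org/ns#title
```

Because the prefix is an arbitrary string, the same function works whether the document uses "foaf_", "foaf." or "foaf:" as its short form.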

Step 6: Allow Multiple Values Per Key

Why: Sometimes it makes sense to have more than one value for some property. For instance, as it turns out, I have more than one friend. I could use a single-value ‘list-of-friends’ property, but sometimes it makes more sense to use a ‘friend’ property that has multiple values. In particular, if we’ll be learning who my friends are from multiple sources, and we were using lists, what order would we put the resulting combined list in?

How: We still just use JSON lists, but we indicate that the order does not matter, so the values can be merged arbitrarily:

{ "foaf.name": "Sandro Hawke", "foaf.knows": { "__values": [ { "foaf.name": "Eric Prud'hommeaux" }, { "foaf.name": "Dan Brickley" }, { "foaf.name": "Matt Womer" } ]} }
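Since order does not matter, values learned from different sources can be combined without deciding on a sequence. A minimal Python sketch follows; `as_values` and `merge_values` are hypothetical helpers, and the split of friends between the two sources is invented for illustration.

```python
def as_values(value):
    """Return a key's values as a list: the __values list if present, else one value."""
    if isinstance(value, dict) and "__values" in value:
        return list(value["__values"])
    return [value]

def merge_values(a, b):
    """Combine two values for the same key into one __values object, dropping duplicates."""
    merged = []
    for item in as_values(a) + as_values(b):
        if item not in merged:
            merged.append(item)
    # The list order here is incidental; consumers must not rely on it.
    return {"__values": merged}

source_a = {"__values": [{"foaf.name": "Eric Prud'hommeaux"}, {"foaf.name": "Dan Brickley"}]}
source_b = {"__values": [{"foaf.name": "Dan Brickley"}, {"foaf.name": "Matt Womer"}]}

friends = merge_values(source_a, source_b)
print(len(friends["__values"]))  # prints 3
```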

Closing Thoughts

That’s it. Those are the six things that RDF does that normal JSON doesn’t do. Did I miss something?

The API I’m imagining (but haven’t built yet) would have a few features like:

jron_reprefix(tree, desired_prefixes): Returns another JRON tree with all the prefixes matching the ones provided here. If you’re going to use foaf, for instance, you probably want to set a prefix like “foaf.” for foaf, so your code can expect it.

jron_merge_nodes(tree) and jron_treeify(tree): Convert a tree (suitable for transmitting) from/to a graph (suitable for use in memory).

jron_use_native_type(tree): Would convert all the __type/__repr objects into suitable local objects, if they exist. Maybe even date/time objects, if there’s a suitable library installed for those.

One technical issue for RDF folks:

Should JSON arrays be considered RDF Lists or RDF Sequences? Perhaps they default to RDF Lists but there can be an option flag in the top-level object:

{ ... "__json_array_is_rdf_seq": true ... }

When that flag is absent or false, arrays would be considered RDF Lists. My sense is no one needs to use both. Maybe soon we’ll know if RDF Sequences can finally be deprecated.
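For readers less familiar with the two RDF constructs, the Python sketch below shows the triples each choice would produce for the same JSON array. The `array_to_triples` function and the blank node labels (`_:seq`, `_:cell0`, ...) are hypothetical; the `rdf:first`, `rdf:rest`, `rdf:nil`, `rdf:Seq` and `rdf:_1`-style terms are the standard RDF vocabulary. Array items are assumed to be plain values.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def array_to_triples(items, as_seq=False):
    """Return (head, triples) encoding a JSON array as an RDF List or an RDF Seq."""
    if as_seq:
        # RDF Sequence: one container node with numbered membership properties.
        head = "_:seq"
        triples = [(head, RDF + "type", RDF + "Seq")]
        triples += [(head, f"{RDF}_{i}", item) for i, item in enumerate(items, start=1)]
        return head, triples
    # RDF List: a chain of cells built back to front, terminated by rdf:nil.
    triples = []
    head = RDF + "nil"
    for i in range(len(items) - 1, -1, -1):
        cell = f"_:cell{i}"
        triples.append((cell, RDF + "first", items[i]))
        triples.append((cell, RDF + "rest", head))
        head = cell
    return head, triples

list_head, list_triples = array_to_triples(["a", "b"])
seq_head, seq_triples = array_to_triples(["a", "b"], as_seq=True)
print(len(list_triples), len(seq_triples))  # prints 4 3
```

The List form is closed (it ends in rdf:nil), while the Seq form is a single node whose membership can be extended by anyone adding another numbered property.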