What's the best RDF serialization format?

Contrary to some other data models, RDF is not bound by a single serialization format. Triple statements (the data atoms of RDF) can be serialized in many ways, which leaves developers with a possibly tough decision: how should I serialize my linked data?

To answer the title’s question: it depends, but probably N-Triples / N-Quads.

So, let’s discuss the various formats and when you should use which one. The order in which they appear is chronological and does not reflect preference. Skip to the TL;DR if you’re feeling hasty.

RDF/XML

The first and perhaps most well-known RDF serialization format is RDF/XML. It’s also the most despised. Many systems were able to parse, store and serialize XML when RDF was invented almost 20 years ago, so RDF/XML seemed like a logical default.

Unfortunately, RDF/XML is a weird mixture of two fundamentally different concepts: a tree-like document, and a triple-based graph. This makes RDF/XML conceptually difficult and quite verbose, compared to other standards. For XML developers, it might look familiar, but since it does not clearly reflect the triple model, it will probably cause confusion.

Use it only if you need to work with XML.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/">
  <rdf:Description rdf:about="https://www.w3.org/People/Berners-Lee/">
    <schema:birthDate>1955-06-08</schema:birthDate>
    <schema:birthPlace rdf:resource="http://dbpedia.org/resource/London"/>
  </rdf:Description>
</rdf:RDF>
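To make the tree-versus-graph mismatch a bit more tangible, here is a rough Python sketch that pulls triples out of a document like the one above, using only the standard library. It assumes the simplest possible shape (top-level rdf:Description elements with an rdf:about attribute and flat property children) and ignores most of what RDF/XML allows, such as nested descriptions, typed nodes, and rdf:datatype. The function name rdfxml_triples is made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

doc = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/">
  <rdf:Description rdf:about="https://www.w3.org/People/Berners-Lee/">
    <schema:birthDate>1955-06-08</schema:birthDate>
    <schema:birthPlace rdf:resource="http://dbpedia.org/resource/London"/>
  </rdf:Description>
</rdf:RDF>"""

def rdfxml_triples(xml_text):
    """Yield (subject, predicate, object) for flat rdf:Description elements only."""
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; joining both parts
            # gives the predicate IRI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # A resource attribute means the object is an IRI, otherwise use the text.
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            yield (subject, predicate, obj)

for triple in rdfxml_triples(doc):
    print(triple)
```

Even in this tiny case, the subject lives in an attribute, the predicate in an element name, and the object in either an attribute or the element text, which is a fair hint at why people find the format confusing.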

RDFa

RDFa is RDF inside HTML. By adding attributes to HTML elements, you can give semantic context to the content inside your webpages. Google parses it (alongside JSON-LD, Microdata, Microformats and Pagemaps) to enhance their search previews, although they recommend using JSON-LD. W3C’s own ReSpec documentation tool dropped support for RDFa, mainly because the adoption of RDFa was too low, the required code was messy and even Google didn’t parse it correctly.

RDFa is fundamentally different from the other mentioned formats: it combines RDF with view data (HTML). This means that it makes your HTML documents a bit larger and more complicated, and parsing it for triples will be more costly than parsing an RDF only format like N-Triples. This makes it less useful if your application relies on a lot of RDF data.

Use RDFa if you want to make your existing website / blog / HTML-based application more semantic.

<div about="https://www.w3.org/People/Berners-Lee/">
  <p>
    Tim was born on <span property="http://schema.org/birthDate">1955-06-08</span>
    in <a property="http://schema.org/birthPlace" href="http://dbpedia.org/resource/London">London</a>
  </p>
</div>

Notation3 (.n3)

Tim Berners-Lee wanted something better than RDF/XML, and came up with N3. Contrary to RDF/XML, N3 closely resembles the RDF Subject / Predicate / Object model. This makes N3 very easy on the eyes and helps to understand how RDF works. By using @prefixes, N3 can be quite compact.

However, N3 is relatively costly to serialize, which could hinder performance. It’s also quite feature-heavy since it supports RDF rules, which makes it harder to parse.

Unless you need the reasoning/rules features of N3, use its more popular (and very similar) successor Turtle.

@prefix tim: <https://www.w3.org/People/Berners-Lee/> .
@prefix schema: <http://schema.org/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

tim: schema:birthDate "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date> .
tim: schema:birthPlace dbpedia:London .

Turtle (.ttl)

Turtle, the Terse RDF Triple Language, is a subset of N3.

It strips some of the syntactic sugar and features of N3, which makes parsing Turtle a bit simpler. This, in turn, made Turtle more popular, which means that it’s easier to find libraries for it.

Unfortunately, it’s still quite costly to parse compared to N-Triples.

Turtle is highly human-readable and is, therefore, a good candidate if you need to edit RDF by hand.

@prefix tim: <https://www.w3.org/People/Berners-Lee/> .
@prefix schema: <http://schema.org/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

tim: schema:birthDate "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date> .
tim: schema:birthPlace dbpedia:London .

N-Triples (.nt) and N-Quads (.nq)

N-Triples is a very simple subset of Turtle, which in turn is a simple subset of N3. N-Triples does not support @prefixes or any fancy features. This makes N-Triples trivial to parse / serialize. Therefore, many libraries for N-Triples are available and you can easily write one yourself. It also makes serialization and parsing highly performant.

<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthDate> "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date> .
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthPlace> <http://dbpedia.org/resource/London> .

However, the lack of prefixes and shorthands makes the format lengthy and a bit tough to read. The lengthy URLs also mean that you’ll need some form of compression (e.g. gzip) if you don’t want to waste precious bandwidth or storage capacity, so make sure to enable that in your server.
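To get a feeling for how much the repeated URLs compress, here is a small Python experiment using only the standard library. The data is made up (a thousand hypothetical example.com resources sharing the same predicate and object), so treat the ratio it prints as an illustration, not a benchmark.

```python
import gzip

# Hypothetical sample data: many statements that share long, repetitive IRIs.
lines = [
    f"<https://example.com/people/{i}> <http://schema.org/birthPlace> "
    f"<http://dbpedia.org/resource/London> ."
    for i in range(1000)
]
raw = "\n".join(lines).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes ({ratio:.1%} of original)")
```

Real datasets are less repetitive than this toy example, so expect a smaller gain in practice.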

Since writing a parser / serializer for N-Triples is so simple, it’s a good idea to support this pretty much always. And since N-Triples is a subset of Turtle and N3, it means that Turtle / N3 parsers know how to deal with N-Triples, too.

N-Quads are like N-Triples, but they have an optional fourth column, which can be used to denote a graph label. The graph label often refers to the source of the data, e.g. the URL of the HTML document or some external RDF resource.
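To show how little is needed, here is a rough sketch of a line-based parser in Python that handles both N-Triples and N-Quads. It is deliberately simplified: it returns the raw term strings, does not unescape literals or validate IRIs, and is not a conforming implementation of either spec. The names parse_statements and the example.com graph label are made up for this example.

```python
import re

# One RDF term: <iri>, _:blank, or "literal" with optional ^^<datatype> or @lang.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
# The optional fourth column (graph label) is an IRI or a blank node.
GRAPH = r'(<[^>]*>|_:\S+)?'
STATEMENT = re.compile(rf'^\s*{TERM}\s*{TERM}\s*{TERM}\s*{GRAPH}\s*\.\s*$')

def parse_statements(text):
    """Yield (subject, predicate, object, graph); graph is None for plain triples."""
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a valid statement: {line!r}")
        yield match.groups()

sample = '''
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthDate> "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date> .
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthPlace> <http://dbpedia.org/resource/London> <https://example.com/graph> .
'''

for statement in parse_statements(sample):
    print(statement)
```

Because every statement lives on its own line, the parser never needs to keep state between lines, which is a large part of why these formats are so cheap to process.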

JSON-LD (.jsonld)

JSON is, without a doubt, the most popular way to handle data in web applications. JSON-LD is an extension of JSON and is valid JSON as well. You can turn your regular plain old JSON into RDF by adding an @context. This object mainly serves as a mapping, so your plain keys get turned into fancy links to RDF Classes and Properties. You can add context either by adding a Link header that points to the context in your HTTP response, by including a link to the context in your JSON body, or by adding the entire @context object to your JSON. This means that if you want to upgrade your JSON API to JSON-LD, you get to keep your serializers.

{
  "@context": {
    "dbpedia": "http://dbpedia.org/resource/",
    "schema": "http://schema.org/"
  },
  "@id": "https://www.w3.org/People/Berners-Lee/",
  "schema:birthDate": "1955-06-08",
  "schema:birthPlace": { "@id": "dbpedia:London" }
}
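To illustrate what the @context mapping does, here is a toy Python sketch that expands the prefixed keys of a document like this one into full IRIs and emits triples. It is nowhere near a real JSON-LD processor: it assumes a flat document, an inline context whose values are plain strings, and no datatypes, lists, or nesting. The helper names expand and to_triples are made up for this example.

```python
import json

doc = json.loads("""
{
  "@context": {
    "dbpedia": "http://dbpedia.org/resource/",
    "schema": "http://schema.org/"
  },
  "@id": "https://www.w3.org/People/Berners-Lee/",
  "schema:birthDate": "1955-06-08",
  "schema:birthPlace": { "@id": "dbpedia:London" }
}
""")

def expand(term, context):
    """Expand a compact IRI such as 'schema:birthDate' using the context's prefixes."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    # Either a plain key mapped directly in the context, or already a full IRI.
    return context.get(term, term)

def to_triples(document):
    """Yield (subject, predicate, object) for a flat JSON-LD document."""
    context = document.get("@context", {})
    subject = expand(document["@id"], context)
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context and @id
        predicate = expand(key, context)
        if isinstance(value, dict) and "@id" in value:
            yield (subject, predicate, expand(value["@id"], context))
        else:
            yield (subject, predicate, value)

for triple in to_triples(doc):
    print(triple)
```

A real processor has to deal with remote contexts, nested contexts, type coercion and much more, which is where most of the parsing cost comes from.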

JSON-LD is easy to read, and will feel familiar even to those new to RDF and linked data. Because it’s still valid JSON, it’s usable to those who don’t want to deal with URLs. Note that JSON arrays are treated as unordered sets of values by default; if the order matters, use @list (or "@container": "@list" in the context) to get an RDF List. I recommend spending some time in the JSON-LD playground to get familiar with how it works.

Unfortunately, JSON-LD is difficult and costly to parse if you need the RDF data instead of the JSON object. This complexity in parsing limits how many (bug-free) JSON-LD parsers are available, and it also means that parsing JSON-LD is slow.

JSON-LD is a compromise. It supports RDF, it supports JSON, and it does both okay. Use JSON-LD if you already have a RESTful JSON API, and if performant RDF parsing is not crucial.

HexTuples

HexTuples is an NDJSON (Newline Delimited JSON) based RDF serialization format. It is designed to achieve the best possible performance in a JS context (i.e. the browser). It uses plain JSON arrays, in which the position of each item denotes subject, predicate, object, datatype, lang and graph.

["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]
["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode", "", ""]

HexTuples was designed by Thom van Kalkeren (a colleague of mine) because he noticed that parsing / serialization was unnecessarily costly in our stack, even when using the relatively performant N-Quads format. Since HexTuples is serialized in NDJSON, it benefits from the highly optimized JSON parsers in browsers. It uses NDJSON instead of regular JSON because that makes it easier to parse concatenated responses (multiple root objects in one document). As an added plus, this enables streaming parsing as well, which gives it another performance boost. Our JS RDF libraries (link-lib, link-redux) have an internal RDF graph model which uses these arrays as well, which means that there is minimal mapping cost when parsing HexTuples statements. This format is especially suitable for real front-end applications that use dynamic RDF data. It is not yet properly documented.
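Because every line is a standalone JSON array, a streaming reader takes only a few lines of code. The sketch below uses Python for illustration (our own libraries are written in JS); the HexTuple field names and the parse_hextuples function are my own naming for this example, not part of the format.

```python
import json
from typing import Iterable, Iterator, NamedTuple

class HexTuple(NamedTuple):
    subject: str
    predicate: str
    value: str
    datatype: str
    language: str
    graph: str

def parse_hextuples(lines: Iterable[str]) -> Iterator[HexTuple]:
    """Parse one statement per line; works on any iterable of lines, e.g. an open file."""
    for line in lines:
        if line.strip():  # ignore empty lines
            yield HexTuple(*json.loads(line))

sample = '''["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]
["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode", "", ""]'''

for statement in parse_hextuples(sample.splitlines()):
    print(statement.subject, statement.predicate, statement.value)
```

Since the function is a generator, statements can be handled as soon as their line arrives, without waiting for the whole response.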

HDT

HDT (Header, Dictionary, Triples) is a compact data structure and binary serialization format for RDF, so it’s more than just a way to serialize RDF. Its data structure saves space and bandwidth (it’s half the size of gzipped N-Triples). Its design has indexing built-in, which means it can be searched or browsed efficiently. Check out the impressive technical specification if you want to learn more about how it works. HDT compression is a costly process, so it’s not that attractive for highly dynamic data / data that changes over time. Although some really useful libraries for HDT exist, be sure to check whether there are libraries that work with your stack.

RDF Binary Thrift

RDF Binary Thrift is an encoding of RDF data that uses Apache Thrift. It is used in the Apache Jena RDF store. It’s a binary format, and therefore it’s cheap to parse and serialize. I’ve never used it, but it sounds interesting, although you’ll need Thrift tooling for decoding it.

Let your users choose a format

Choosing an RDF serialization format for your application or service might be a false dilemma. Since you control your application and probably have an internal model, you can offer multiple serialization options. Therefore, you can implement a serialization library (e.g. our rdf-serializers gem for Rails) and use HTTP content negotiation, so your project can handle all kinds of formats.
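As a rough illustration of the idea (in Python, not the Rails gem mentioned above), the sketch below picks a response format from an Accept header. The function names, the list of supported formats and the fallback to N-Triples are assumptions for this example; a real server might prefer to answer 406 Not Acceptable when nothing matches, and most web frameworks ship their own negotiation helpers.

```python
# Media types for some of the formats discussed in this article.
SUPPORTED = [
    "application/n-triples",
    "application/n-quads",
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
]

def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending preference."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_type, q))
    # sorted() is stable, so equal q-values keep their order from the header.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)

def negotiate(header, supported=SUPPORTED, default="application/n-triples"):
    """Pick the client's most preferred supported format, or fall back to the default."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return default

print(negotiate("text/turtle;q=0.8, application/ld+json"))
print(negotiate("text/html, */*;q=0.1"))
```

The chosen media type can then be used to look up the matching serializer and to set the Content-Type header of the response.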

TL;DR