SPARQL is a query language for RDF data on the Semantic Web with a formally defined meaning. This document is a brief introduction to the new features of the language, including an explanation of its differences with respect to the previous SPARQL Query Language Recommendation [SPARQL/Query 1.0]. It also presents the requirements that have motivated the design of the main new features, and their rationale from a theoretical and implementation perspective.

May Be Superseded

This is a First Public Working Draft of a feature requirements document for the continued development of the SPARQL language. This document is expected to change in response to public input and working group decisions.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Comments are solicited

The SPARQL Working Group seeks public feedback on this Working Draft. Please send your comments to public-rdf-dawg-comments@w3.org (public archive). If possible, please offer specific changes to the text that would address your concern.

No Endorsement

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.





1 Introduction

This document provides an overview of the main new features of SPARQL and their rationale. This is an update to SPARQL adding several new features that have been agreed by the SPARQL WG. These language features were determined based on real applications and user and tool-developer experience.

1.1 List of Features

The following features have been agreed by the SPARQL WG. They are grouped into Required and Time-permitting features, as indicated in the sections below.

1.2 Goals and structure of the document

In the remainder of this document we will present the new features according to the nomenclature agreed by the Working Group:

SPARQL/Query 1.1: referring to the SPARQL Query Language

SPARQL/Update 1.0: referring to the SPARQL Update Language

Each feature is described in a common pattern as follows:

Motivations: a brief sentence explaining why the new feature was added

Description: a more complete description of the feature

Existing implementation(s): a list of existing implementations of the proposed feature and example syntax used in each implementation

Related discussions: links to related discussions of the WG regarding the feature (mainly issues raised)

Status: the status of the feature, i.e. either Required or Time-permitting

This current working draft details only the required features; motivations and descriptions are also provided for the time-permitting features.

2 SPARQL/Query 1.1

2.1: Aggregate functions

2.1.1 Motivations

Aggregate functions allow operations such as counting, numerical min/max/average and so on, by operating over columns of results. SPARQL/Query 1.0 does not support them, so obtaining such information, e.g. the number of triples that match a particular pattern, requires additional scripting to parse query results. Hence, a language extension is needed.

2.1.2 Description

In SPARQL/Query 1.0 (original SPARQL), query patterns yield a solution set (effectively a table of solutions) from which certain columns are projected and returned as the result of the query. Aggregates provide the ability to partition a solution set into one or more groups based on rows that share specified values, and then to create a new solution set which contains one row per aggregated group. Each solution in this new aggregate solution set may contain either variables whose values are constant throughout the group or aggregate functions that can be applied to the rows in a group to yield a single value. Common aggregate functions include COUNT, SUM, MIN, and MAX.

Aggregate functions are commonly required for a range of application and data-analysis tasks, such as:

Determining the number of distinct resources that satisfy certain criteria

Calculating the average exam score of students grouped by school district

Summing the campaign contributions of donors, grouped by postal code and political party
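The second use case above could, for example, be written with an explicit GROUP BY clause. This is only a sketch of the intended style: the grouping syntax (implicit vs. explicit) is still open as [ISSUE 11], and the :inDistrict and :examScore properties below are hypothetical.

```sparql
SELECT ?district AVG(?score) AS ?avgScore
WHERE { ?student :inDistrict ?district ;
                 :examScore ?score . }
GROUP BY ?district
```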

Applications can typically take a SPARQL/Query 1.0 solution set and calculate aggregate values themselves. Enabling SPARQL engines to calculate aggregates, however, moves work from the application to the SPARQL engine, and usually results in significantly smaller solution sets being returned to the application.

2.1.3 Existing implementation

The following systems are known by the WG at the time of publication to support one or more aggregate functions:

Garlik's JXT implements COUNT() and AVG()

Dave Beckett's Redland implements COUNT()

ARQ implements COUNT() and SUM(), with syntax like (COUNT(*) AS ?c) to fit with expressions; bare COUNT(*) is also allowed.

Open Anzo's Glitter engine implements AVG(), COUNT(), SUM(), MIN(), MAX().

Virtuoso implements AVG(), COUNT(), SUM(), MIN(), MAX(), user-defined aggregates such as VECTOR_AGG or XML tree constructors. Appropriate GROUP BY clause is composed automatically, if missing in the original query.

ARC supports COUNT, MAX, MIN, AVG, and SUM.

Several aggregate functions are widely implemented, and implementations tend to project out the aggregated results. For example, the query

SELECT COUNT(?person) AS ?alices WHERE { ?person :name "Alice" . }

returns the number of times a triple of the form _ :name "Alice" appears in the source data, while the following query computes the average :value of all :Widget resources:

SELECT AVG(?value) AS ?average WHERE { ?good a :Widget ; :value ?value . }

2.1.4 Related discussions

Related issues raised by the WG:

[ISSUE 11]: Implicit vs explicit GROUPing

[ISSUE 12]: Presence and syntactic detail of HAVING clause

[ISSUE 13]: Subqueries in HAVING analogous to subqueries in FILTERs

[ISSUE 14]: Which aggregates to include

[ISSUE 15]: Extensibility of aggregate functions

[ISSUE 16]: Dealing with aggregates over mixed datatypes

2.1.5 Status

This feature is considered as Required by the WG.

2.2: Subqueries

2.2.1 Motivations

It is sometimes necessary to nest the results of one query within another query. Currently this requires retrieving the results of a first query, parsing them with dedicated scripts, and then issuing a second query. The subquery feature would allow such nesting to be done in a single SPARQL query.

2.2.2 Description

In SPARQL/Query 1.0 (original SPARQL), to nest the results of one query inside another, one has to rely on dedicated scripts and run separate queries. For instance, to identify all the people that Alice knows and a single name for each of them, a script such as the following could be used (in PHP, assuming a do_query function that runs a SPARQL query and returns the results as an array of PHP objects):

$query = "SELECT ?person WHERE { :Alice :knows ?person . }";
$res = do_query($query);
foreach ($res as $r) {
    $person = $r->person->value;
    $query = "SELECT ?name WHERE { <$person> foaf:name ?name . } LIMIT 1";
    $names[$person] = do_query($query);
}

The Subquery feature will provide a way to nest the results of a query within another query. That feature could be used, for instance, in the following use cases:

Identifying the 10 latest blog posts created in a weblog, with a single author name for each

Retrieving a list of people with their friends, having only one friend name for each

Limiting the number of distinct results retrieved based on the number of resources rather than the number of solutions

The query form of subqueries has not yet been decided by the WG (see issues below).

2.2.3 Existing implementation(s)

The following implementations are known by the WG at the time of publication to provide a way to run subqueries:

Virtuoso supports both scalar subqueries (in all places where a variable name may occur) and subqueries as derived tables.

ARQ includes support for nested SELECTs

For instance, the following query is possible in ARQ, and is equivalent to the script mentioned before.

SELECT ?person ?name
WHERE {
  :Alice foaf:knows ?person .
  { SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 1 }
}

2.2.4 Related discussions

Related issues raised by the WG:

2.2.5 Status

This feature is considered as Required by the WG.

2.3: Negation

2.3.1 Motivations

In SPARQL/Query 1.0 (original SPARQL), negation by failure is possible by combining OPTIONAL, FILTER and !BOUND, but this idiom is difficult to write and a burden to learn and use. Users have therefore requested dedicated language constructs for expressing negation, and the WG agrees.

2.3.2 Description

Various tasks, such as data validation and social network analysis, can require checking whether certain triples do or do not exist in the graph. Checking the absence of triples is a form of negation, called negation by failure (since it checks whether a pattern fails to match, rather than asserting that the triples do not exist, in keeping with the open-world assumption), and is already possible in SPARQL/Query 1.0 (original SPARQL) using FILTER, OPTIONAL and !BOUND(), as follows (to retrieve the ?name of ?x for which no foaf:knows value exists, i.e. to identify the names of people who do not know anyone):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  ?x foaf:givenName ?name .
  OPTIONAL { ?x foaf:knows ?who } .
  FILTER (!BOUND(?who))
}

Yet this is neither intuitive for users to write and learn, nor does it cater for efficient implementations of negation. Hence, the Negation feature will provide support for testing the absence of a match to a query pattern. Negation can be used in the following use cases:

Identifying all people who do not know someone or do not have a particular skill

Identifying any content that has not been assigned a reviewer in a particular review process

Identifying customers who did not buy a particular object

The feature would introduce a new operator into the algebra or a new function for filters. Existing queries do not use these operators and are therefore unaffected.

2.3.3 Existing implementation(s)

The following implementations are known by the WG at the time of publication to support Negation by failure:

RDF::Query uses an UNSAID keyword, which was proposed during the first SPARQL WG but not addressed at that time

The SeRQL query language [SeRQL] provides a MINUS operator

ARQ supports negation through a NOT EXISTS operator, with UNSAID as an alias for it

SQL provides a NOT EXISTS operator to identify tables without a given record

The following example uses SeRQL's MINUS syntax to find the names of all people that do not know anyone, in a similar way to the previous query

SELECT x
FROM {x} foaf:givenName {name}
MINUS
SELECT x
FROM {x} foaf:givenName {name};
         foaf:knows {who}
USING NAMESPACE foaf = <http://xmlns.com/foaf/0.1/>

The following example uses the UNSAID syntax:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x
WHERE {
  ?x foaf:givenName ?name
  UNSAID { ?x foaf:knows ?who }
}
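Using ARQ's NOT EXISTS operator mentioned above, the same query (the names of people who do not know anyone) might be written as follows. This is a sketch only; the final SPARQL syntax for negation has not been decided (see [ISSUE 29]).

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  ?x foaf:givenName ?name .
  NOT EXISTS { ?x foaf:knows ?who }
}
```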

2.3.4 Related discussions

Related issues raised by the WG:

[ISSUE 29]: Should negation be done via a binary operator on subqueries, a binary operator within graph patterns, or a filter+subquery?

2.3.5 Status

This feature is considered as Required by the WG.

2.4: Project expressions

2.4.1 Motivations

The ability to return the values of expressions computed over result bindings, rather than just RDF terms present in the store.

2.4.2 Description

In SPARQL/Query 1.0 (original SPARQL), projection queries (SELECT queries) may only project out variables bound in the query. Because variables can only be bound via triple pattern matching, there is no way to project out values that are not matched in the underlying RDF data set. Projecting expressions represents the ability for SPARQL SELECT queries to project any SPARQL expression, rather than only variables. A projected expression might be a variable, a constant URI, a constant literal, or an arbitrary expression (including function calls) on variables and constants. Functions could include both SPARQL built-in functions and extension functions supported by an implementation.

There are many use cases that motivate the ability to project expressions rather than just variables in SPARQL queries. In general, the motivation is to return values that do not occur in the graphs that comprise a query's RDF data set. Specific examples include:

Returning the total cost of an order's line item as the product of two variables: ?unit_cost * ?quantity

Use SPARQL accessors to find the languages used in a dataset: LANG(?o)

Returning computed values, such as the current day of the week: ex:dayOfTheWeek(ex:Today())

Performing simple string parsing: ex:substring(?url, 8, ex:length(?url))

TODO: More mention should be made of the connection with subqueries, as the two can be used together to answer many usecases.

2.4.3 Existing implementation(s)

The following systems are known by the WG at the time of publication to support some uses of project expressions:

Garlik's JXT (though doesn't do CONSTRUCT)

Dave Beckett's Redland-based storage engines

In ARQ with a slightly different syntax (AS is part of the () expression).

In Open Anzo's Glitter SPARQL engine, with the same syntax as ARQ

Virtuoso, where the syntax is (expression) AS ?alias; the "AS ?alias" clause is optional, so the parentheses around the expression are required.

XSPARQL allows XPath/XQuery functions to be used as expressions in CONSTRUCTs as in the example above.

We wish to find names, and whether the person is over 18.

SELECT ?name (?age > 18) AS ?over18
WHERE { ?person :name ?name ; :age ?age . }

As another example, we wish to find the full name of everyone who is interested in trees.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://www.example.org/>
SELECT fn:string-join(?givenName, ' ', ?surname) AS ?fullName
WHERE {
  ?person foaf:givenname ?givenName ;
          foaf:surname ?surname ;
          foaf:interest :trees .
}

This example has made use of a concatenation function from XPath-Functions. Which functions will be available for value construction in SPARQL is an open issue that will be dealt with on a time-permitting basis.

To return an RDF graph in which the given and family names are concatenated into a full name, we can use a query similar to the previous SELECT example as a subquery, together with a project expression:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?x foaf:name ?fullName }
WHERE {
  { SELECT ?x fn:string-join(?gn, " ", ?sn) AS ?fullName
    WHERE { ?x foaf:givenname ?gn ;
               foaf:surname ?sn . } }
}

TODO: It should be established whether any implementations support both subqueries and fn:string-join()

2.4.4 Related discussions

The WG has noted that project expressions:

are an important way to access the results of aggregate functions,

can be used to resolve [ISSUE 4] concerning the variable scope between main queries and subqueries.

2.4.5 Status

This feature is considered as Required by the WG.

2.5 Query language syntax

2.5.1 Motivation

Certain limitations of the SPARQL/Query 1.0 language syntax cause unnecessary barriers for learning and using SPARQL.

2.5.2 Description

Time-permitting, the SPARQL Working Group will consider extending SPARQL/Query's syntax to include:

Commas between variables and expressions within a SELECT list

IN and BETWEEN operators to abbreviate disjunction and comparisons within FILTER expressions
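As an illustrative sketch only (no syntax has been agreed by the WG), an IN operator could abbreviate a FILTER disjunction such as FILTER (?type = :Book || ?type = :Journal) to:

```sparql
SELECT ?item
WHERE { ?item a ?type .
        FILTER ( ?type IN (:Book, :Journal) ) }
```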

2.5.5 Status

This feature is considered as time-permitting only by the WG.

2.6 Property paths

2.6.1 Motivation

Many classes of query over RDF graphs require searching data structures that are hierarchical and involve arbitrary-length paths through the graphs. Examples include:

Retrieving all the elements of an RDF collection (structured as a linked list)

Retrieving all the names of people linked to me transitively via the ex:mother and ex:father relationships (i.e. all my known ancestors)

Finding all of the direct and indirect superclasses of a given owl:Class

2.6.2 Description

SPARQL/Query 1.0 can express queries over fixed-length paths within RDF graphs. SPARQL/Query 1.0 can also express queries over arbitrary but bounded-length paths via repeated UNION constructs. SPARQL/Query 1.0 cannot express queries that require traversing hierarchical structures via unbounded, arbitrary-length paths.

Time-permitting, the SPARQL Working Group will define the syntax and semantics of property paths, a mechanism for expressing arbitrary-length paths of predicates within SPARQL triple patterns.
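As an illustration of the style under consideration (the syntax is not decided; the + operator for one-or-more repetition follows ARQ's property path extension, and :MyClass is a hypothetical class), the superclass use case above might be written:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?super
WHERE { :MyClass rdfs:subClassOf+ ?super }
```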

2.6.5 Status

This feature is considered as time-permitting only by the WG.

2.7 Commonly Used SPARQL Functions

2.7.1 Motivation

Many SPARQL implementations support functions beyond those required by the SPARQL/Query 1.0 specification. There is little to no interoperability between the names and semantics of these functions for common tasks such as string manipulation.

2.7.2 Description

Time-permitting, the SPARQL WG will define URIs and semantics for a set of functions commonly supported by existing SPARQL implementations.

See Working Group issue: ISSUE-2 - http://www.w3.org/2009/sparql/tracker/issues/2

2.7.5 Status

This feature is considered as time-permitting only by the WG.

2.8 Basic Federated Query

2.8.1 Motivation

SPARQL is a concise query language to retrieve and join information from multiple RDF graphs via a single query. In many cases, the different RDF graphs are stored behind distinct SPARQL endpoints.

2.8.2 Description

Federated query is the ability to take a query and provide solutions based on information from many different sources. It is a hard problem in its most general form and is the subject of continuing (and continuous) research. A building block is the ability to have one query be able to issue a query on another SPARQL endpoint during query execution.

Time-permitting, the SPARQL Working Group will define the syntax and semantics for handling a basic class of federated queries in which the SPARQL endpoints to use in executing portions of the query are explicitly given by the query author.
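One existing approach in this space, ARQ's SERVICE extension, lets the query author name the remote endpoint explicitly. The endpoint URI below is illustrative, and the final syntax has not been decided by the WG:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  ?person foaf:name ?name .
  SERVICE <http://example.org/sparql> {
    ?person foaf:interest ?interest
  }
}
```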

2.8.5 Status

This feature is considered as time-permitting only by the WG.

3: Service description

3.1 Motivations

Given the variety of SPARQL implementations, and differences in datasets and extension functions, a method of discovering a SPARQL endpoint's capabilities and summary information of its data in a machine-readable way is needed.

3.2 Description

Many SPARQL implementations support a variety of SPARQL extensions (many proposed here for standardization), extension functions (for use in FILTERs), and different entailment regimes. Moreover, the differences in the datasets provided by SPARQL endpoints are often hard to grasp without some existing knowledge of the underlying data. This proposal suggests that these differences may be described by the endpoints themselves, detailing both (1) the capabilities of the endpoint and (2) the data contained in the endpoint.

The Service description features can be used in the following uses-cases:

Check the entailment regime supported by a SPARQL endpoint before running a query

Check if an RDF store contains instances of a particular class, to see if it is worth running a query over it

Check if an RDF store supports a particular extension function

3.3 Existing implementation(s)

The following services are known by the WG at the time of publication to support the Service description feature:

RDF::Query provides service descriptions (based primarily on the DARQ and SADDLE vocabularies) referenced in the HTTP response headers of a query

Virtuoso has support for DBPedia VoiD data using a special default-graph of <http://dbpedia.org/stats/void#>, which can be queried using the SPARQL endpoint.

Garlik's JXT implements something similar to this feature; the differences from the description above are that the header used is "X-Endpoint-Description:" and that the URI given is relative to the endpoint: /description .

The following service description is an example of what is provided by an endpoint powered by RDF::Query when queried with the about=1 HTTP parameter, e.g. http://example.org/sparql?about=1 .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sd: <http://darq.sf.net/dose/0.1#> .
@prefix saddle: <http://www.w3.org/2005/03/saddle/#> .
@prefix sparql: <http://kasei.example/2008/04/sparql#> .
@prefix void: <http://rdfs.org/ns/void#> .

[] a sd:Service ;
    rdfs:label "SPARQL Endpoint for example.org" ;
    sd:url <http://example.org/sparql> ;
    sd:totalTriples 12729 ;
    saddle:queryLanguage [
        rdfs:label "SPARQL" ;
        saddle:spec <http://www.w3.org/TR/rdf-sparql-query/>
    ] ;
    saddle:queryLanguage [
        rdfs:label "RDQL" ;
        saddle:spec <http://www.w3.org/Submission/RDQL/>
    ] ;
    saddle:resultFormat [
        rdfs:label "SPARQL Query Results XML" ;
        saddle:mediaType "application/sparql-results+xml" ;
        saddle:spec <http://www.w3.org/TR/rdf-sparql-XMLres/>
    ] ;
    saddle:resultFormat [
        rdfs:label "RDF/XML" ;
        saddle:mediaType "application/rdf+xml" ;
        saddle:spec <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    ] ;
    saddle:resultFormat [
        rdfs:label "SPARQL Query Results JSON" ;
        saddle:mediaType "application/sparql-results+json" ;
        saddle:spec <http://www.w3.org/TR/rdf-sparql-json-res/>
    ] ;
    sparql:extensionFunction <java:com.hp.hpl.jena.query.function.library.sha1sum> ;
    sparql:extensionFunction <java:com.ldodds.sparql.Distance> ;
    sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/service> ;
    sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/unsaid> ;
    sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/federate_bindings> .

3.4 Related discussions

The serviceDescription issue was previously postponed by the DAWG.

3.5 Status

This feature is considered as Required by the WG.

4 SPARQL/Update 1.0

The Working Group has resolved to specify a SPARQL/Update language, but may also pursue an HTTP-based graph update via the protocol. This issue is orthogonal to the SPARQL/Update language. Whether or not there will be a concrete mapping between SPARQL/Update and HTTP-based graph update is currently under discussion in the working group.

To change an RDF graph (either adding, updating or removing statements as well as adding statements from one graph to another or to the default graph of a triple store) one would currently have to use a programming language and one of several APIs. In other query languages, notably SQL, there are mechanisms to change the data in the database. To allow RDF graphs to be manipulated the same way and avoid using third-party APIs, a language extension is needed.

This feature is a language extension for expressing updates to an RDF graph or to an RDF store. It follows SPARQL in both style and detail, which reduces the learning curve for developers and reduces implementation costs.

The following facilities are expected to be provided by the SPARQL/Update 1.0 language:

Insert new triples to an RDF graph.

Delete triples from an RDF graph.

Perform a group of update operations as a single action.

Create a new RDF graph in a Graph Store.

Delete an RDF graph from a Graph Store.

The [SPARUL] Member Submission, which contains several examples, has been widely implemented and is considered a starting point for the present work.

The two following examples illustrate some of the features:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA
{ <http://example/book3> dc:title "A new book" ;
                         dc:creator "A.N.Other" . }

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE { ?book ?p ?v }
WHERE {
  ?book dc:date ?date .
  FILTER ( ?date < "2000-01-01T00:00:00"^^xsd:dateTime )
  ?book ?p ?v
}
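The graph management facilities listed above are expressed in the [SPARUL] submission with CREATE and DROP operations, for example:

```sparql
CREATE GRAPH <http://example/bookStore>
DROP GRAPH <http://example/bookStore>
```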

The following systems are known by the WG to support the [SPARUL] Member Submission at the time of publication:

ARQ

Virtuoso

Related issues raised by the WG:

This feature is considered as Required by the WG.

Making it possible to update an RDF graph using RESTful HTTP methods allows either a SPARQL endpoint or a plain Web server to be used to update RDF data.

It should be possible to manipulate RDF graphs using HTTP verbs, notably PUT, POST and DELETE. This way, clients do not need to know the SPARQL language to update graphs when it is not needed.

The following systems are known by the WG at the time of publication to support a RESTful update protocol.

Garlik's JXT supports HTTP PUT and DELETE.

IBM's Jazz Foundation supports graph update via a RESTful protocol.

This feature is under discussion in the WG.

5 BGP extensions for entailment regimes

5.1 Motivation

Many software systems that support entailment regimes such as OWL dialects and RDF Schema extend the semantics of SPARQL Basic Graph Pattern matching to apply to entailments other than simple entailment. The formal semantics of these SPARQL/Query extensions are not standardized, and query writers cannot currently be guaranteed interoperable behavior when working with multiple query engines that extend SPARQL with the same entailment regime.

5.2 Description

SPARQL/Query 1.0 defines a mechanism to adapt SPARQL to entailment regimes beyond simple entailment by providing necessary conditions on re-defining the meaning of SPARQL Basic Graph Pattern matching. Time-permitting, the SPARQL WG will use the existing framework to define the semantics of SPARQL queries for one or more of these entailment frameworks:

OWL 2 with both Direct and RDF Based Semantics, including OWL 2 profiles

RDF Schema

Some dialects of RIF

5.5 Status

This feature is considered as time-permitting only by the WG.

6 Acknowledgments

The editors would like to thank the SPARQL Working Group for their valuable input for this document.

7 References

[SPARQL/Query 1.0] SPARQL Query Language for RDF, E. Prud'hommeaux and A. Seaborne, Editors. W3C Recommendation, 15 January 2008, http://www.w3.org/TR/rdf-sparql-query/ .

[SeRQL] The SeRQL query language. Chapter in the Sesame User Guide, Aduna.

[SPARUL] SPARQL Update: A language for updating RDF graphs, A. Seaborne and G. Manjunath. W3C Member Submission, 15 July 2008, http://www.w3.org/Submission/SPARQL-Update/ .