Now that we’ve published nearly 10,000 of our tags as Linked Open Data, you’re probably wondering what kind of cool applications you can build with this data. To help you get started (and since linked data applications are a little different from your average Web application), we thought we’d provide a sample application and detailed information about how we built it.

Our sample application, “Who Went Where,” lets you explore recent Times coverage of the alumni of a specified college or university.

You can find the application here and beautified source code here.

Before we dive into the source, let’s take a high-level look at the application’s control flow (which is fairly straightforward).

The application starts by initializing an auto-complete field with the names of all the colleges and universities in DBpedia. When the user selects the name of an institution from the auto-complete field, the application queries DBpedia for the NYT identifiers of all the alumni of that institution. These identifiers are then used to query the New York Times Article Search API for the ten most recent articles about each alumnus. Then we use a little jQuery magic to display and format these articles.

Wait! Linked data? DBpedia? Perhaps some definitions are in order.

Linked Data:

The idea behind linked data is super simple. Databases power nearly every site on the Web. And even though these sites can link to each other’s pages, their databases remain almost entirely ignorant of one another. Linked data uses W3C Semantic Web Standards to cure sites of their mutual data-based ignorance by making it easy to specify the relationships between previously isolated data silos. On a more technical level, linked data provides a mechanism for representing databases (called RDF) and a mechanism for querying those databases (called SPARQL). For a great introduction to RDF and SPARQL, check out SPARQL By Example.

DBpedia:

Have you ever noticed those handy little info boxes on certain Wikipedia articles? Well, it turns out that those boxes contain lots of useful information, ranging from people’s birthdays to the heights of mountains. And since the info boxes represent this information in a reasonably structured format, it’s possible to build a database out of these humble little boxes. And that’s exactly what teams from The Free University of Berlin and The University of Leipzig decided to do. Their database is called DBpedia. Because it’s both really useful and constructed from Semantic Web building blocks, DBpedia has become one of the central hubs in the evolving Linked Data Cloud. Better yet, DBpedia provides a mechanism for handling SPARQL queries (known as a SPARQL end point in linked-data-geek-speak), so it’s easy to build Web applications on top of DBpedia.

Step-by-Step to Your Own NYT Linked Data Application

Now that we have the definitions out of the way, let’s move on to the guts of our linked data application. For the sake of clarity, this description focuses on the highlights of the code. You’ll probably find it helpful to follow along in the application’s source code while reading the rest of this post.

Step 1: Initializing the Auto-Complete Field

Our linked data application is built with jQuery, and like all good jQuery applications, it starts with a $(document).ready() function:

$(document).ready(function () {
  setupAutoComplete();
});

The first thing the setupAutoComplete() function does is figure out how many colleges and universities there are, by querying DBpedia’s SPARQL end point (provided by OpenLink Software) with the following query:

SELECT COUNT(?uri) AS ?count
WHERE {
  ?uri rdf:type dbpedia-owl:University .
  ?uri foaf:name ?name .
}

Results of the preceding query can be found in HTML here and in JSON here.
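As a rough sketch of how a query like this might be sent and read, the helpers below turn a SPARQL string into a request URL and pull the count out of a response in the standard SPARQL JSON results layout. The endpoint address, the format parameter, and the function names buildSparqlUrl and readCount are assumptions made for illustration; the application's own source may do this differently.

```javascript
// Assumed endpoint address; not taken from the application's source.
var DBPEDIA_ENDPOINT = 'http://dbpedia.org/sparql';

// Hypothetical helper: build a GET URL that asks the endpoint for JSON results.
function buildSparqlUrl(query) {
  return DBPEDIA_ENDPOINT +
    '?query=' + encodeURIComponent(query) +
    '&format=' + encodeURIComponent('application/sparql-results+json');
}

// Hypothetical helper: read the ?count value from a SPARQL JSON result.
// Values arrive as strings, so convert to a number before using it.
function readCount(json) {
  var bindings = json.results.bindings;
  return bindings.length > 0 ? parseInt(bindings[0].count.value, 10) : 0;
}
```

The resulting URL could then be handed to whatever request mechanism you prefer, such as a jQuery JSONP call.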

Once we have a count of the number of colleges and universities known to DBpedia, the code moves on to the loadAutoCompleteData() function. This repeatedly calls the loadDBPediaUniversities() function until we’ve loaded every last one of DBpedia’s university and college identifiers into a dictionary keyed by their names. The SPARQL query used to get this information is as follows:

SELECT ?uri, ?name
WHERE {
  ?uri rdf:type dbpedia-owl:University .
  ?uri foaf:name ?name
}
LIMIT 1000 OFFSET offset

The offset is used to access records beyond the first 1,000, as DBpedia’s SPARQL end point is limited to 1,000 rows per query.

Results of the preceding query can be found in HTML here and in JSON here.
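The paging described above could look roughly like the following sketch. It computes the offsets needed to cover a known count, builds one query per page, and folds each page of results into a name array and a name-to-identifier map. The function names and the PAGE_SIZE constant are invented for illustration; only the query text and the 1,000-row limit come from this post.

```javascript
var PAGE_SIZE = 1000; // the endpoint's row limit, per this post

// Hypothetical helper: the offsets needed to page through `total` rows.
function pageOffsets(total) {
  var offsets = [];
  for (var offset = 0; offset < total; offset += PAGE_SIZE) {
    offsets.push(offset);
  }
  return offsets;
}

// Hypothetical helper: the university query for a single page.
function universityQuery(offset) {
  return 'SELECT ?uri, ?name WHERE { ' +
    '?uri rdf:type dbpedia-owl:University . ' +
    '?uri foaf:name ?name } ' +
    'LIMIT ' + PAGE_SIZE + ' OFFSET ' + offset;
}

// Hypothetical helper: fold one page of SPARQL JSON bindings into the
// array of names and the map of identifiers keyed by name.
function addPage(bindings, names, idsByName) {
  bindings.forEach(function (row) {
    var name = row.name.value;
    // Only add a name to the array the first time we see it.
    if (!Object.prototype.hasOwnProperty.call(idsByName, name)) {
      names.push(name);
    }
    idsByName[name] = row.uri.value;
  });
}
```

With a count of 2,500, for instance, pageOffsets yields three offsets, so three queries would be issued.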

Once we have created our array of DBpedia names, _autoCompleteArray , and our dictionary of DBpedia identifiers keyed by their names, _autoCompleteMap , the auto-complete is initialized as follows.

$("#school_input").autocomplete(_autoCompleteArray, {
  width: 235,
  selectFirst: true,
  scroll: true,
  matchContains: true,
  matchCase: false
});

$("#school_input").result(function (event, data, formatted) {
  loadAlumniForSchool(_autoCompleteMap[data[0]], data[0]);
});

There’s a lot going on in that code fragment, but the important thing to see is that whenever an item is selected from the auto-complete field, the loadAlumniForSchool() function is invoked with the identifier and name of the selected college or university. This brings us to the next step.

Step 2: Displaying the Alumni of a Specified School or University

Let’s assume that our user searches for “Williams College,” causing the loadAlumniForSchool() function to be invoked with the parameters http://dbpedia.org/resource/Williams_College and Williams College. The application must now find all the alumni of Williams College that DBpedia knows about. To do this, the application makes the following query (simplified here for clarity):

SELECT * WHERE {
  ?alumnus dbpprop:almaMater <http://dbpedia.org/resource/Williams_College> .
  ?alumnus owl:sameAs ?nytId .
  ?alumnus dbpprop:name ?name .
  OPTIONAL { ?alumnus dbpedia-owl:birthDate ?birthDate } .
  OPTIONAL { ?alumnus dbpedia-owl:deathDate ?deathDate } .
  OPTIONAL {
    ?alumnus owl:sameAs ?freebaseUri .
    FILTER regex(?freebaseUri, '//rdf\\.freebase\\.com/.*')
  } .
  FILTER regex(?nytId, '//data\\.nytimes\\.com/.*') .
}

Results of the preceding query can be found in HTML here and in JSON here.

In plain English, the preceding query might sound something like this:

Find an alumnus (?alumnus) who attended (dbpprop:almaMater) Williams College; the name (dbpprop:name) of that alumnus (?name); and the New York Times identifier (?nytId) corresponding (owl:sameAs) to the alumnus. Optionally find the birth date, death date, and Freebase identifier (?birthDate, ?deathDate, and ?freebaseUri respectively) for the alumnus.

In response to this query, DBpedia gives us back a big ol’ JSON object containing all the alumni info we asked for. Contained within this object are the New York Times identifiers for each alumnus. In the case of Williams College, these identifiers are:

http://data.nytimes.com/38832438934068808203
http://data.nytimes.com/N77681930120720110803
http://data.nytimes.com/73439517545719071583
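Pulling those identifiers out of the response might look like the sketch below. It assumes the response follows the standard SPARQL JSON results layout, in which each row of results.bindings maps a variable name to an object with a value field. The function name extractNytIds is hypothetical. A query like this can return more than one row per alumnus (for example, when a person has several name values), so the sketch skips identifiers it has already seen.

```javascript
// Hypothetical helper: collect the distinct ?nytId values from a
// SPARQL JSON result, preserving the order in which they first appear.
function extractNytIds(json) {
  var seen = Object.create(null); // plain lookup table with no inherited keys
  var ids = [];
  json.results.bindings.forEach(function (row) {
    var id = row.nytId.value;
    if (!seen[id]) {
      seen[id] = true;
      ids.push(id);
    }
  });
  return ids;
}
```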

Now we transform these New York Times identifiers into a list of articles about each alumnus. Doing this is a two-step process. First, for each identifier, we obtain a JSON representation of the resource corresponding to it. For example, given the identifier http://data.nytimes.com/38832438934068808203, we execute the following AJAX request:

$.ajax({
  dataType: 'jsonp',
  jsonp: 'callback',
  url: 'http://data.nytimes.com/38832438934068808203.json',
  success: function (json) {
    if (json.stat == 'ok') {
      loadAlumnusDetails(json);
    }
  }
});

Results of this request can be found here.

The JSON object returned by this AJAX request contains the Times Tag that corresponds to the New York Times identifier. In the case of the identifier http://data.nytimes.com/38832438934068808203, this tag is “Bennett, William J” and can be accessed from the JSON object as follows:

var timesTag = nytJson["http://data.nytimes.com/38832438934068808203"]["skos:prefLabel"];

Now that we have the Times Tag corresponding to the New York Times Identifier, we use the New York Times Article Search API to obtain a list of articles matching the tag. Such a query can be constructed as follows:

http://api.nytimes.com/svc/search/v1/article
  ?query=+nytd_per_facet[TimesTag]
  &rank=newest
  &fields=abstract,body,byline,date,small_image_url,title,url
  &api-key=xxxxxxxxxxxxxxxxxxx

Please note that the New York Times Article Search API does not support client-side requests, so this request must be made on the server side.
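Since a Times Tag such as “Bennett, William J” contains spaces and a comma, it should be URL-encoded before it goes into the query string. The sketch below assembles the request URL from the pieces shown above. The function name articleSearchUrl is hypothetical, the http scheme is an assumption, and the API key is a placeholder you would replace with your own. It builds only the URL and leaves the server-side request itself to whatever platform you use.

```javascript
// Hypothetical helper: build an Article Search URL for one Times Tag.
// Mirrors the URL layout shown in this post; only the tag and key are encoded.
function articleSearchUrl(timesTag, apiKey) {
  var fields = ['abstract', 'body', 'byline', 'date',
                'small_image_url', 'title', 'url'];
  return 'http://api.nytimes.com/svc/search/v1/article' + // scheme assumed
    '?query=+nytd_per_facet[' + encodeURIComponent(timesTag) + ']' +
    '&rank=newest' +
    '&fields=' + fields.join(',') +
    '&api-key=' + encodeURIComponent(apiKey);
}
```

Keeping the URL construction in one small function makes it easy to call once per alumnus from your server-side code.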

Now, at long last, our application formats and displays our hard-won list of articles about the alumni of the specified school.



A Note About Sources and Data Quality

The images in this application are provided by the Freebase Image Thumbnailing API. Many thanks to the great team at Freebase for this service. Also, a general disclaimer: given that our sample application is built upon community-generated data from third-party sources, there may be some errors in the data. The good news is that both Freebase and DBpedia allow you to fix such errors, so if you see something that needs fixing, you know what to do.

That’s It?

So there you have it — all it takes to build a simple linked data application with New York Times Linked Open Data. But remember: this post just focuses on the highlights. We encourage you to take a closer look at the code and dig into some of the more advanced features we didn’t discuss. We hope that you share our excitement about the possibilities of linked data, and we look forward to seeing what you create!