Ontologies have been the topic of several recent posts, and rereading them, I realize that someone unfamiliar with ontologies will most likely still find the subject somewhat elusive. For some readers, a high-level understanding is enough, and hopefully it is interesting to them how an ontology was used to create a knowledge base that Google turned into something called the Knowledge Graph. For others of us, however, the devil is in the details, and a vague definition simply does not suffice. Wikipedia’s definition of an ontology actually raises more questions than it answers: “formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse.” To further complicate matters, one distinguishing characteristic of ontologies is the ability to present this domain knowledge in a machine-readable format. What does this mean? How do a bunch of definitions and relationships get turned into something a machine can read, and then converted into knowledge?

The Semantic Web:

The “Semantic Web” refers to W3C’s vision of the Web of linked data, and ontologies are at the core of this concept. There are numerous open source and proprietary ontologies. Some of the more popular are FOAF (Friend of a Friend), SUMO (Suggested Upper Merged Ontology), Cyc (oddly enough, not an acronym), and DBpedia (Wikipedia infobox data), and many more are domain specific, like BPMN (Business Process Modeling Notation), WSDOM (Web Service Description Ontological Model, used by the Federal Aviation Administration), and NASA ATM (Air Traffic Management). To make these ontologies machine readable, W3C has developed a technology stack that allows you to read, query, and interpret them. Rather than describe each, I have provided links for those interested in digging deeper: RDF, RDFS, SPARQL, JSON-LD, and SKOS.

In addition to the technologies listed above, another in W3C’s stack is OWL:

“Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be exploited by computer programs, e.g., to verify the consistency of that knowledge or to make implicit knowledge explicit. OWL documents, known as ontologies, can be published in the World Wide Web and may refer to or be referred from other OWL ontologies.”

This is all ok, but so what? Why do we need this, and what problem does it solve?

To put this as succinctly as possible, managing data and keeping it synchronized is always a problem. Nowhere is this more evident than on the web. The concept of the semantic web is to make the web our content management system, and its users the content managers, or editors. As you recall, I discussed the common problem of siloed data in large organizations. This problem is magnified on the web, where the volume of data is growing exponentially: government data, general information (e.g., how to develop ontologies), medical and health-related data, stock prices, aviation, recipes, and on and on it goes. Applications depend on this data, and as the data changes, inconsistencies arise and information quality is compromised.

One interesting site is https://unilexicon.com/, which is a “visual online thesaurus server, classification and taxonomy management software coupled with a web-site tagging tool and record tagging system.”

Unilexicon is an online tool for building vocabularies, and it is usable from Python. It is an interactive site and very interesting to explore. The figure above shows the vocabulary that describes Unilexicon itself: what it is, and what it is not. Tools like this help us build vocabularies related to products, software, animals, and more. As these vocabularies are developed within domains, we have the beginning of an ontology, but having data and a vocabulary on the web is not enough on its own; we need an infrastructure for managing it.

Infrastructure of the Semantic Web:

Conceptually, the infrastructure has three jobs: map data to abstract definitions, merge like representations of these mappings, and query the whole. This process is described in detail in Ivan Herman’s W3C “Tutorial on the Semantic Web” (last updated 2012-04-09). For my examples, I will use Stanford University’s “wine.rdf” ontology from the Protégé Ontology Library and the “FOAF.owl” ontology.

Data is linked through Uniform Resource Identifiers (URIs): each piece of data is named, then “connected” to other data via its URI. Each named connection is called a Resource Description Framework (RDF) triple. An RDF triple consists of three components: 1) the subject, 2) the property, and 3) the object; in short, (s,p,o). The “s” and “p” are URIs (actual resources on the Web), and the “o” is either a URI or a literal.
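Before reaching for a library, it may help to see how little is needed to model the (s,p,o) idea. The following is a plain-Python sketch, not how rdflib is implemented: "Triple", "store", and "match" are names I made up, and the label triple is invented for illustration. None plays the role of a wildcard in a pattern.

```python
from typing import Iterator, NamedTuple


class Triple(NamedTuple):
    s: str  # subject: a URI
    p: str  # property: a URI
    o: str  # object: a URI or a literal value


WINE = "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The first triple mirrors the wine example; the second is invented here.
store = [
    Triple(WINE + "Port", RDFS + "subClassOf", WINE + "RedWine"),
    Triple(WINE + "Port", RDFS + "label", "Port"),
]


def match(triples, s=None, p=None, o=None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None matches anything."""
    for t in triples:
        if s in (None, t.s) and p in (None, t.p) and o in (None, t.o):
            yield t


# Fix the subject and property, leave the object open.
found = list(match(store, s=WINE + "Port", p=RDFS + "subClassOf"))
print(found[0].o)
```

Fixing some positions of the triple and leaving others open is the same pattern-matching idea that a real triple store exposes, just without parsing, indexing, or typed literals.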

from rdflib import Graph, URIRef

# create a graph from a file
graph = Graph()
graph.parse("file:///wine/wine.rdf", format="xml")

# take a subject and a property with known URIs
subject = URIRef("http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Port")
prop = URIRef("http://www.w3.org/2000/01/rdf-schema#subClassOf")

# process every object linked to this subject by this property
for (s, p, o) in graph.triples((subject, prop, None)):
    print(s, p, o)

# Output (I split this out for readability):
# s = http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Port
# p = http://www.w3.org/2000/01/rdf-schema#subClassOf
# o = http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#RedWine

Using Python’s “rdflib” package, I imported the “wine.rdf” ontology in XML format. XML is only one of several machine-readable formats for RDF.

Notice it returned: s = Port; p = subClassOf; o = RedWine.

In the next and last example, notice that you can define each element of the triple with a URIRef or a Literal, and use any combination of them to extract information about the data.

import pprint

from rdflib import Graph, Literal, URIRef

FOAF = Graph()
FOAF.parse("file:///FOAF/FOAF.owl", format="xml")
print(len(FOAF))  # prints 2

# print every triple in the graph
for ln in FOAF:
    pprint.pprint(ln)

######################################################
# create a graph from a file
graph = Graph()
graph.parse("file:///FOAF/FOAF.owl", format="xml")

# take a subject and a property with known URIs
subject = URIRef("http://xmlns.com/foaf/0.1/")
prop = URIRef("http://purl.org/dc/elements/1.1/description")
# and an object given as a literal
obj = Literal("The last name of a person.")

# process every object linked to this subject by this property
for (s, p, o) in graph.triples((subject, prop, None)):
    print(s, p, o)

# process every subject and property that point at this literal
for (s, p, o) in graph.triples((None, None, obj)):
    print(s, p, o)

# Output (I manually split these up for readability):
# s = http://xmlns.com/foaf/0.1/
# p = http://purl.org/dc/elements/1.1/description
# o = The Friend of a Friend (FOAF) RDF vocabulary, described using W3C RDF Schema and the Web Ontology Language.
#
# s = http://xmlns.com/foaf/0.1/lastName
# p = http://www.w3.org/2000/01/rdf-schema#comment
# o = The last name of a person.

These are very simplistic examples, but hopefully they give you some idea of how an ontology can be made machine readable and manipulated to manage knowledge. As someone famous said, it’s only AI when you don’t know what it’s doing.