About Us

Founded in 2005; offices in Washington DC & Boston

Application development & consulting

Customers in US Gov't, banking/financial, energy, health/bio, retail

Strong academic partnerships in US, UK, Europe, and Mexico

Experts in all things semantic

OWL/RDF/SPARQL/SWRL and any other acronym



Information Integration, Expertise Location, Policy Management, Enterprise Decision Support, Application Development

Overview

Use Cases of Semantic Technology in the Financial Industry

Stardog Overview



Stardog Web demo

Customer 360

Unify customer information: integrate all data about a customer as it's discovered

Past, Present, and Future



Pull from a variety of sources, including unstructured, many of which are non-relational



Take advantage of flexible nature of semantic technology

Data Provenance

Capture the provenance of data throughout its lifecycle

Utilize this information to enable data governance and regulatory compliance



Annotate data as it comes in; continuously updated

W3C spec dedicated to this: PROV

Reference Data

Create a 'gold standard' for names, labels and identities

Represent core industry terms and concepts



Ties into Data Provenance

Modeling complex relationships between entities can be trivial using semantic technology

FIBO is a great example

Compliance

Reduce compliance efforts to query answering and graph analytics

Legal regulations are complex

And tracking related policies is a time-consuming job



Cost of implementation high, cost of a failure catastrophic

Utilize reasoning & rules

Express regulations and policies as complex relationships



Workflows and compliance checking can be performed by a reasoner



Automated compliance analysis with explanations

Analytics & Decision Support

Empower human decision making with contextualized, relevant information

There is a lot of value in unstructured information

But it is hard to query



And even harder to extract

But what you can extract is very valuable

As you build up, you create actionable information



Sift through the data to find the facts so a human can make decisions more quickly and easily

Examples

We've built some applications for our customers based on some of these concepts

Cross Matcher





Policy Management



POPS

What's the Common Thread?

All information integration problems

i.e. not really financial services problems

So how do you solve them?

Specifically, what's the best way to perform information integration?

Semantic Graphs



Semantic Graphs

Create graphs with meaning

Encoded within the graph



By giving formal, declarative definitions of the nodes and edges





Using a high-level language



Specifically, to create computer understandable meaning

So the computer can help

This lets us use the appropriate abstractions

And is the obvious choice for information integration problems

Some Programming Required?

It's a fact of life: non-programmers exist

You might be sitting next to one right now!



They can make valuable contributions to a codebase



Except, they can't write code





And we can't teach them





But we tend to lock everything up in the code





Like business logic

No Programming Required!

So we encode our business logic using a formal semantics

We're encoding it in the graph

Using a high-level language





No programming required





Frees it from the codebase; frees it from programmers

Non-programmers Rejoice

Let non-programmers perform complex information processing tasks without writing code

More directly capture expertise

By letting the actual experts author the business logic

Easier and more maintainable

Using the appropriate abstractions



Inference rules & queries





So the computer can do the work











Stardog

Leading RDF graph database

Pure Java

Community & Enterprise Editions

Great developer experience

Rich feature set

Currently version 2.1.3

Performance

Query

Loading lots of data is not useful if you cannot query it



Query 100M triples with a throughput of 3M+ queries/hour, 1B with nearly 500k queries/hour, and 10B with nearly 20k queries/hour



This is BSBM with 64 concurrent clients



Fastest SP2B benchmark results at 5M; only known implementation to complete 25M; close to completing 100M

Scale

Up to 50B triples/quads on modest hardware

Load rates up to 500k triples/second

That's 100M triples in 3 minutes, 1B in 30, and 20B in 20 hours.
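
As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines. The only inputs are the numbers on this slide; the function names are made up for illustration. At a sustained 500k triples/second, 100M triples take about 3.3 minutes, so the quoted durations appear to be rounded, and the 20B figure implies a lower average rate than the peak.

```python
# Back-of-the-envelope check of the load figures quoted on the slide.
PEAK_RATE = 500_000  # triples per second, the "up to" figure


def seconds_at_peak(triples: int, rate: int = PEAK_RATE) -> float:
    """Time to load `triples` if the peak rate were sustained throughout."""
    return triples / rate


def implied_rate(triples: int, seconds: float) -> float:
    """Average load rate implied by a quoted (size, duration) pair."""
    return triples / seconds


# Quoted (size -> duration in seconds) pairs from the slide.
quoted = {
    100_000_000: 3 * 60,        # 100M in 3 minutes
    1_000_000_000: 30 * 60,     # 1B in 30 minutes
    20_000_000_000: 20 * 3600,  # 20B in 20 hours
}

for size, secs in quoted.items():
    print(
        f"{size:>14,} triples: {seconds_at_peak(size) / 60:8.1f} min at peak, "
        f"implied average {implied_rate(size, secs):,.0f} triples/s"
    )
```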

ICV

Integrity Constraint Validation keeps data safe and consistent

Prevent modifications that violate your integrity constraints

'Guard mode'



Constraint violations abort transactions

Also support 'oracle' mode, aka 'middleware' mode

Outside of a transaction



Check whether data is valid w.r.t. some constraints

Violations can be explained

Inferences can satisfy or violate a constraint

Constraints expressed in SPARQL, OWL, SWRL, or Stardog Rules

High-level declarative languages make it easy to write simple constraints, possible to write complex ones

ICV Example

Every supervisor should supervise at least one employee

Supervisor subClassOf supervises some Employee

IF { ?x a Supervisor } THEN { ?x supervises ?y . ?y a Employee }

select * {
  ?x a Supervisor .
  FILTER NOT EXISTS { ?x supervises ?y . ?y a Employee }
}
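
To make the supervisor constraint concrete, here is a minimal sketch in plain Python rather than Stardog itself. It mirrors the FILTER NOT EXISTS form of the constraint over a small in-memory set of triples, and adds a toy "guard mode" that rejects a change when a violation would result. The data and the function names (`supervisor_violations`, `guarded_add`) are invented for illustration, and no inference is applied here.

```python
# Triples as (subject, predicate, object) tuples; "a" stands for rdf:type.
data = {
    ("Alice", "a", "Supervisor"),
    ("Bob", "a", "Supervisor"),
    ("Bob", "supervises", "Carol"),
    ("Carol", "a", "Employee"),
}


def supervisor_violations(triples):
    """Return supervisors who supervise no Employee (the FILTER NOT EXISTS query)."""
    supervisors = {s for (s, p, o) in triples if p == "a" and o == "Supervisor"}
    employees = {s for (s, p, o) in triples if p == "a" and o == "Employee"}
    satisfied = {s for (s, p, o) in triples if p == "supervises" and o in employees}
    return sorted(supervisors - satisfied)


def guarded_add(triples, new_triples):
    """Guard-mode sketch: apply the change only if it causes no violation."""
    candidate = set(triples) | set(new_triples)
    violations = supervisor_violations(candidate)
    if violations:
        raise ValueError(f"constraint violated by: {violations}")
    return candidate


print(supervisor_violations(data))  # Alice has no supervised Employee
```

In the same spirit, calling `supervisor_violations` outside of any update corresponds to checking data against constraints without a transaction.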

Another ICV Example

If a project is funded by only internal funding sources, then it should be approved by the internal budget office

Project and (fundedBy only InternalFundingSource) subClassOf approvedBy value InternalBudgetOffice

select * where {
  ?x a Project .
  FILTER NOT EXISTS {
    ?x fundedBy ?y .
    FILTER NOT EXISTS { ?y a InternalFundingSource }
  } .
  FILTER NOT EXISTS { ?x approvedBy InternalBudgetOffice }
}

ICV Explanations

If you are using ICV

You may not understand why a violation occurred



Or want to communicate it to the user

Explanations

Tells you why the violation occurred

Shows exactly the data that caused the violation





Gives you the proof used to derive the violation

ICV Explanation Example

Every Supervisor should supervise at least one Employee

Supervisor subClassOf supervises some Employee

Alice a Supervisor

VIOLATED Supervisor subClassOf (supervises some Employee)
    ASSERTED Alice a Supervisor
    NOT_INFERRED x a Employee
                 Alice supervises x



What is reasoning?

Make implicit information explicit



Implicit in the schema, or data, or both



Represent domain knowledge in a formal declarative model



Called an ontology





Like UML, but with formal semantics





W3C specification called OWL, Web Ontology Language

Reasoners consume ontologies to derive new information

Answer queries, find inconsistencies

Complex, but manageable

OWL divided into profiles with less expressivity, but better computational properties
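
"Making implicit information explicit" can be illustrated with a very small sketch. This is plain Python, not a real OWL reasoner, and it covers only two simple rules: subClassOf is transitive, and an individual belongs to every superclass of its class. The class hierarchy below is a made-up example and is not taken from the deck.

```python
def infer_types(schema, data):
    """Forward-chain two rules until nothing new is derived:
    (1) subClassOf is transitive; (2) instances inherit superclasses."""
    sub = set(schema)    # (subclass, superclass) pairs
    types = set(data)    # (individual, class) pairs
    changed = True
    while changed:
        changed = False
        new_sub = {(a, d) for (a, b) in sub for (c, d) in sub if b == c} - sub
        new_types = {(x, sup) for (x, c) in types for (s, sup) in sub if c == s} - types
        if new_sub or new_types:
            sub |= new_sub
            types |= new_types
            changed = True
    return types


# Hypothetical schema: every Supervisor is an Employee, every Employee a Person.
schema = {("Supervisor", "Employee"), ("Employee", "Person")}
asserted = {("Alice", "Supervisor")}

inferred = infer_types(schema, asserted) - asserted
print(sorted(inferred))  # facts that were implicit in schema + data
```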

Reasoning

Unmatched OWL support

All OWL2 profiles (RL, EL, QL, DL) and Stardog profile (SL)



Caveats: no equality reasoning, no datatype reasoning, no DL reasoning over your ABox

Query time reasoning

No write performance penalty



Pay for what you use

Explanations

Inference you don't understand?



Reasoner will give you the proof used to derive it!

Reasoning Services

Consistency checking, satisfiability

Stardog Rules

Stardog supports SWRL

Part of the SL profile





You cannot write it by hand, SWRL/RDF is unusable



Much easier to use Stardog Rules



If-Then style rules based on SPARQL syntax:

PREFIX :
PREFIX math:

IF {
  ?c a :Circle ;
     :radius ?r
  BIND (math:pi() * math:pow(?r, 2) AS ?area)
}
THEN {
  ?c :area ?area
}
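
For readers who do not know SPARQL, the effect of the circle rule can be sketched in plain Python. This is only an illustration of what the rule derives, not how Stardog evaluates rules; the function name and the sample data are assumptions.

```python
import math


def apply_circle_area_rule(graph):
    """IF ?c a :Circle ; :radius ?r  THEN ?c :area (pi * r^2)."""
    inferred = set()
    circles = {s for (s, p, o) in graph if p == "a" and o == ":Circle"}
    for (s, p, o) in graph:
        if p == ":radius" and s in circles:
            inferred.add((s, ":area", math.pi * math.pow(o, 2)))
    return inferred


graph = {
    (":c1", "a", ":Circle"),
    (":c1", ":radius", 2.0),
    (":sq", ":radius", 5.0),  # not typed as :Circle, so the rule does not fire
}

result = apply_circle_area_rule(graph)
print(result)
```

The inferred `:area` triple is never stored in `graph`; it is computed on demand, which matches the query-time reasoning approach described earlier.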

Query

SPARQL 1.1

Update, query, graph protocol

Custom query planner, optimized for complex queries

Targets BI/analytic queries



And also reasoning



But does not sacrifice performance at low scales or with simple queries

Scalable query answering

Intermediate results can get big, and fast



Runtime will automatically flow results off-heap, and then to disk as needed

Query management



Full Text Search

Embeds Lucene

Automatically managed by database as if another RDF index

Enables full-text searches over your RDF



Literals are indexed by Lucene



Uses the Lucene query language

Seamless integration via SPARQL

Join results of full-text searches with regular SPARQL query

Also available via SNARL Java API

Enterprise Features

JMX server monitoring

Hot Backup & Restore



Access/Audit logging

Web console built on Stardog Web Framework

PROV and SKOS support

ACID Transactions

Rich Security model

Archetypes

Named bundle of data and functionality that can be applied when a database is created

Intended to support data standards and/or toolchains in a simple way



Mix and match these when the database is created

PROV and SKOS support are built in

FIBO is next



Can also be user defined!









Graph Versioning

Version control is insanely useful

Sometimes I wonder how people live without it



So why not for an RDF database?

Stardog adds commit management features similar to many popular version control systems

Add metadata, like comments, to commits



Create tags



Revert to a previous version



Get diffs between versions

Oh, all of this is stored as RDF

So you can query your version history
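
The commit, tag, diff, and revert operations can be sketched over plain sets of triples. This is a toy model with a hypothetical `VersionedGraph` class, not Stardog's API, and unlike Stardog it keeps the history in Python lists rather than storing it as RDF.

```python
class VersionedGraph:
    """Toy commit log over a set of triples (hypothetical API)."""

    def __init__(self):
        self.commits = []  # list of (message, added, removed)
        self.tags = {}     # tag name -> commit index

    def commit(self, message, added=(), removed=()):
        """Record a change set with its metadata and return the version number."""
        self.commits.append((message, frozenset(added), frozenset(removed)))
        return len(self.commits) - 1

    def tag(self, name, version):
        self.tags[name] = version

    def snapshot(self, version):
        """Replay commits 0..version to rebuild the graph at that version."""
        triples = set()
        for _, added, removed in self.commits[: version + 1]:
            triples -= removed
            triples |= added
        return triples

    def diff(self, old, new):
        a, b = self.snapshot(old), self.snapshot(new)
        return {"added": b - a, "removed": a - b}

    def revert(self, version):
        """Revert by committing the inverse change, keeping history intact."""
        head = len(self.commits) - 1
        change = self.diff(head, version)
        return self.commit(f"revert to {version}", change["added"], change["removed"])


g = VersionedGraph()
v0 = g.commit("initial load", added={("Alice", "a", "Supervisor")})
v1 = g.commit("add Bob", added={("Bob", "a", "Employee")})
g.tag("release-1", v0)
v2 = g.revert(g.tags["release-1"])
print(g.diff(v0, v1))
```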

Admin Console

In Stardog 2.0 we added the Web Console

Expose the features of the stardog CLI in an easy-to-use web interface





Add/Remove data, execute queries, etc.





Or simply browse your data

Coming in 2.2, we're adding an administrative web console

Create and drop databases, manage security, etc.



Everything you can do via the stardog-admin CLI

Sneak Peek









The Fixer

We talked about ICV

Finding and explaining violations is nice

But what do you do with these?



How about we fix them?

Semi-automated repair plans

Use the reasoner, constraints, and a planner to find ways to fix violations



When the solution is unambiguous, it can be applied automatically



And when it's not, Stardog can present multiple plans





So a human can pick which one to apply

Stardog Cluster

HA Cluster

Active Replication

2PC-based commit protocol for strong consistency



Writes processed by coordinator to determine order of operations



Reads are distributed evenly over all nodes
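
The general shape of a two-phase commit across replicas can be sketched as follows. This shows the textbook protocol only, under the assumption of in-process nodes with no network or crash recovery; it says nothing about how Stardog Cluster is actually implemented, and the `Node` interface is hypothetical.

```python
class Node:
    """A replica that can vote on and apply a write (hypothetical interface)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = set()
        self.pending = None

    def prepare(self, triples):
        """Phase 1: stage the write and vote yes, or vote no if unhealthy."""
        if not self.healthy:
            return False
        self.pending = set(triples)
        return True

    def commit(self):
        """Phase 2: make the staged write visible."""
        self.data |= self.pending
        self.pending = None

    def abort(self):
        self.pending = None


def two_phase_commit(nodes, triples):
    """Coordinator: commit everywhere only if every node votes yes."""
    prepared = []
    for node in nodes:
        if node.prepare(triples):
            prepared.append(node)
        else:
            for p in prepared:
                p.abort()
            return False
    for node in nodes:
        node.commit()
    return True


cluster = [Node("n1"), Node("n2"), Node("n3")]
ok = two_phase_commit(cluster, {("Alice", "a", "Supervisor")})
print(ok, [sorted(n.data) for n in cluster])
```

Because every replica holds the same data after a successful commit, a read can be served by any node.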

Coming Soon!

Closed beta starting next month



Aiming for general release in Q3

What else?

Graph analytics

Named graph security



Stored Procedures

GeoSPARQL

Materialized views

Equality reasoning

R2RML support

And as always, faster & more scalable



Stardog Web

Focus on the Web part of Semantic Web

Organizations don't always have experts in semtech available

Provide a framework that abstracts away these details

Stick to well-known web technologies



HTML, CSS, Javascript, JSON as data





backbone.js as a model layer, SPARQL Routes middleware

Enable teams to start building an application right away

Without focusing on learning semtech or graphs



Because the value is in solving the problem

Stardog Web

Stardog Web Console is built on this technology

Good out-of-the-box capabilities

Search, CRUD, REST, faceted browsing



Templates, plugin mechanisms

Requiring minimal programming

JSON based configuration



Just Add Data

Provide basics for building web applications based on semtech quickly and easily

Soon to be open source







Demo









Questions?

Transactions & Security

Transactions

ACID



Guarded (optionally) by ICV



2 Phase Commit over all database components



RDF Index, Lucene, KB, etc.





Automatically managed by the database

Security

RBAC model



Based on Apache Shiro





R/W ACLs for access to individual databases





Administrative controls for actions against DBMS





Online/offline a database, modify security settings, etc.

Graph Analytics

Coming in Stardog 2.3

RDF graphs are still just graphs

Graph measures: in-degree, out-degree, PageRank, betweenness centrality

Clustering: weakly/strongly connected components, clique finding

Path finding: BFS and shortest path

Seamless SPARQL integration
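
Since RDF graphs are still just graphs, the simpler measures listed above can be sketched directly over triples. The following is plain Python, not the Stardog feature itself; the sample edges are invented, and predicates are ignored when searching for a path.

```python
from collections import Counter, deque


def degrees(triples):
    """Return (in-degree, out-degree) counters keyed by node."""
    out_deg = Counter(s for (s, _, _) in triples)
    in_deg = Counter(o for (_, _, o) in triples)
    return in_deg, out_deg


def shortest_path(triples, start, goal):
    """Breadth-first search over directed edges, ignoring predicates."""
    adjacency = {}
    for s, _, o in triples:
        adjacency.setdefault(s, []).append(o)
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal is not reachable from start


edges = {
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Carol"),
    ("Alice", "worksWith", "Dave"),
    ("Dave", "knows", "Carol"),
    ("Carol", "knows", "Erin"),
}

in_deg, out_deg = degrees(edges)
print(in_deg["Carol"], out_deg["Alice"])
print(shortest_path(edges, "Alice", "Erin"))
```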

Graph Analytics





Reasoning Example

For example, enforcing security (ACLs)

Can Bob access Resource1?

Bob is-a Admin OR Bob created Resource1 OR (Bob hasRole ?r AND ?r canAccess Resource1) OR ...

Hard to maintain: domain knowledge is encoded in the query

Can leverage reasoning to simplify

Bob canAccess Resource1

More concise and maintainable, and the reasoner handles the implementing logic transparently
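
The idea can be sketched with a tiny forward-chaining engine: the access policy lives in rules that are plain data, and the question asked is just "Bob canAccess Resource1". This is an illustration in plain Python, not Stardog's reasoner; the rule encoding, the `Resource` class, and the sample facts are assumptions made for the example.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; variables start with '?'."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def solve(body, facts, binding=None):
    """Yield every variable binding that satisfies all patterns in `body`."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for triple in facts:
        b = match(body[0], triple, binding)
        if b is not None:
            yield from solve(body[1:], facts, b)


def reason(rules, facts):
    """Forward-chain the rules to a fixpoint; return asserted + inferred facts."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for b in solve(body, facts):
                new.add(tuple(b.get(term, term) for term in head))
        if new <= facts:
            return facts
        facts |= new


# The access policy as data: (IF patterns, THEN triple).
RULES = [
    ([("?u", "a", "Admin"), ("?r", "a", "Resource")], ("?u", "canAccess", "?r")),
    ([("?u", "created", "?r")], ("?u", "canAccess", "?r")),
    ([("?u", "hasRole", "?role"), ("?role", "canAccess", "?r")], ("?u", "canAccess", "?r")),
]

FACTS = {
    ("Resource1", "a", "Resource"),
    ("Bob", "hasRole", "Analyst"),
    ("Analyst", "canAccess", "Resource1"),
    ("Carol", "a", "Admin"),
}

kb = reason(RULES, FACTS)
print(("Bob", "canAccess", "Resource1") in kb)
```

Adding a new way to gain access means adding one entry to `RULES`; the question being asked does not change.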

