In the last decade, JSON has become the dominant data-interchange format and the backbone of most NoSQL databases. [1] So it’s not surprising that the ISO SQL standard added JSON support to their specification in late 2016 [2], and most popular SQL databases, like Postgres, one of our favorites, have upped their JSON game over the past few years. I could take a moment now to weigh in on the NoSQL vs SQL debate, and these developments definitely make that choice more complex, but I’ll save that for another time.

Suffice it to say, teams are increasingly coming to us with complex JSON objects in their structured SQL databases. This trend makes the ability to quickly and easily extract and manipulate the information you need from JSON objects more important than ever. Enter JSONPath, a query language that is similar to XPath but used to find sub-sections or specific values within JSON. JSONPath has been around since 2007 [3] and has recently seen a peak in interest after being added to the latest PostgreSQL 12. [4]

How JSONPath works

JSONPath queries use a syntax somewhat similar to property accessor notation in Javascript, with a few special symbols:

To explain these and the goal of JSONPath in more detail, let’s walk through a few examples. The simplest example of a JSONPath is the root object/element symbol: $ . On its own, this provides a result that exactly matches the input JSON:

The . and [] operators can be used to get at more specific values:

To explain some of the more powerful JSONPath features, the rest of the examples will use the following JSON:

{

"id": "5e903f11-480c-45b9-b01a-f5622f355cfc",

"customer": {

"name": "Clark Kent",

"email": "ckent@dailyplanet.org",

"address": {

"line1": "344 Clinton St.",

"line2": "Apt. #3b",

"city": "Metropolis",

"state": "KA",

"zip": "66632",

"country": "USA"

}

},

"items": [{

"name": "Widget",

"price": 4.50,

"count": 30

},

{

"name": "Book",

"price": 19.95,

"count": 2

},

{

"name": "CD",

"price": 12.99,

"count": 1

}

]

}

The * and .. operators allow us to easily pull information from the JSON without having to specify full paths to each value we care about:

We can use script expressions in combination with the current object symbol (@) to perform some more complicated tasks like finding the last item in a list:

Finally, the filter expression is one of the most powerful tools available with JSONPaths, and allows us to do things like look up objects based on the value of their properties:

If you’d like to explore the capabilities of JSONPath more fully, several tools are available on the internet for running paths against JSON data, such as JSONPath Expression Tester[5].

JSONPath at Tonic

At Tonic, we recently added the ability to use JSONPaths to manipulate specific parts of JSON found in your database. This allows you to easily replace personally identifying information, such as names, email addresses, and phone numbers, while leaving the other data and the underlying structure of the JSON intact.

Each sub-generator can be configured to target a different JSONPath and to apply a different masking technique based on the data there. A preview of the full JSON as well as the results found by your JSONPath can be viewed in the configuration window:





A quick peek at the preview of the masked data shows the rest of the data, as well as the structure of the JSON, still intact with just the email masked:





Ok JSONPath is cool, but remind me why it matters?

As more and more relational databases improve their support for semi-structured data types like JSON, standards like JSONPath are going to become an integral tool in the data engineer’s quiver. At Tonic, we’re seeing a trend where a number of our customers use JSON columns to store complex data that doesn’t naturally fit into relational tables, like tax forms or event data, and we’re excited to be providing useful ways to work with this sensitive data in lower environments.

