oq

A performant and portable jq wrapper that facilitates the consumption and output of formats other than JSON, using jq filters to transform the data.

Background

I've been using jq for a while to transform a master JSON document into partner-dependent structures for their consumption. Until recently, however, all of the partner structures were also JSON. Since jq cannot output XML on its own, I began looking for libraries that would let me use jq filters to transform the data but output XML in addition to JSON. I ended up finding a Python library called yq that seemed to be perfect.

It supports outputting both XML and JSON while using the same jq filter for both. After playing around with it for a while, it became clear that, while quite speedy for smaller files, it really struggled with some of the larger documents I needed to process. Being written in Python also complicated things: Python must be installed to use it, unless you go through some extra process to package it as a single binary. Thus, the idea for a more performant and portable option began to take shape.

Introduction

Using the relatively new Crystal language, I created oq with three primary goals: portability, performance, and extending the formats that jq supports.

Usage

oq has three additional arguments: two set the input and output formats, and one sets the name of the root element when serializing to XML. All other arguments are passed on to jq.
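The pass-through behaviour described above can be pictured with a short Python sketch. This is not oq's actual implementation (oq is written in Crystal), and the helper name `split_args` is hypothetical. The flags `-i`, `-o`, and `--xml-root` are the ones used in the examples in this document; the defaults (JSON in, JSON out, a root element named `root`) are assumptions inferred from those examples.

```python
import argparse

def split_args(argv):
    """Separate wrapper-specific options from the arguments destined for jq."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Assumed defaults, inferred from the examples in this document.
    parser.add_argument("-i", dest="input_format", default="json")
    parser.add_argument("-o", dest="output_format", default="json")
    parser.add_argument("--xml-root", dest="xml_root", default="root")
    # parse_known_args returns everything it does not recognize, in order,
    # which is what would be handed over to jq unchanged.
    own, jq_args = parser.parse_known_args(argv)
    return own, jq_args

own, jq_args = split_args(
    ["-i", "yaml", "-o", "xml", "--xml-root", "items", ".", "data.yaml"]
)
print(own.input_format, own.output_format, own.xml_root)
print(jq_args)
```

The wrapper only needs to know its own three options; anything else, such as a filter, a file name, or `-f filter`, falls through to jq untouched.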

Examples

Consume JSON and output XML

    echo '{"name": "Jim"}' | oq -o xml .
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <name>Jim</name>
    </root>

Consume JSON and output YAML

    echo '{"name": "Jim"}' | oq -o yaml .
    ---
    name: Jim

Consume YAML from a file and output XML

data.yaml



    ---
    name: Jim
    numbers:
      - 1
      - 2
      - 3

    oq -i yaml -o xml . data.yaml
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <name>Jim</name>
      <numbers>1</numbers>
      <numbers>2</numbers>
      <numbers>3</numbers>
    </root>

Consume JSON, transform it, and output XML

data.json



    {
      "guests": [
        {
          "name": "Jim",
          "age": 17,
          "numbers": [1, 2, 3]
        },
        {
          "name": "Bob",
          "age": 51,
          "numbers": [4, 5, 6]
        },
        {
          "name": "Susan",
          "age": 85,
          "numbers": [7, 8, 9]
        }
      ]
    }

filter



    .guests | {
      "person": [
        .[] | {
          "age": {
            "@scale": .scale,
            "#text": .age
          },
          "name": .name,
          "favorite_numbers": {
            "number": .numbers
          }
        }
      ]
    }

    oq -o xml --xml-root people -f filter data.json
    <?xml version="1.0" encoding="UTF-8"?>
    <people>
      <person>
        <age scale="months">289</age>
        <name>Jim</name>
        <favorite_numbers>
          <number>1</number>
          <number>2</number>
          <number>3</number>
        </favorite_numbers>
      </person>
      <person>
        <age scale="years">51</age>
        <name>Bob</name>
        <favorite_numbers>
          <number>4</number>
          <number>5</number>
          <number>6</number>
        </favorite_numbers>
      </person>
      <person>
        <age scale="days">31025</age>
        <name>Susan</name>
        <favorite_numbers>
          <number>7</number>
          <number>8</number>
          <number>9</number>
        </favorite_numbers>
      </person>
    </people>

The approach to handling the JSON to XML transcoding is based on this article.
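The mapping visible in the example above (keys starting with `@` become attributes, `#text` becomes the element's text, and arrays become repeated sibling elements) can be sketched in a few lines of Python. This is only an illustration of the convention as inferred from the sample output, not oq's code; the function names `build` and `to_xml` are hypothetical, and scalars are simply stringified with `str()`.

```python
import xml.etree.ElementTree as ET

def build(parent, key, value):
    """Append `value` under `parent` as one or more elements named `key`."""
    if isinstance(value, list):
        # Arrays become repeated sibling elements with the same name.
        for item in value:
            build(parent, key, item)
        return
    elem = ET.SubElement(parent, key)
    if isinstance(value, dict):
        for k, v in value.items():
            if k.startswith("@"):
                elem.set(k[1:], str(v))   # "@scale" -> scale="..." attribute
            elif k == "#text":
                elem.text = str(v)        # "#text" -> text of this element
            else:
                build(elem, k, v)         # any other key -> child element
    elif value is not None:
        elem.text = str(value)

def to_xml(data, root_name="root"):
    """Serialize a dict to an XML string (no XML declaration)."""
    root = ET.Element(root_name)
    for key, value in data.items():
        build(root, key, value)
    return ET.tostring(root, encoding="unicode")

doc = {
    "person": [
        {
            "age": {"@scale": "years", "#text": 51},
            "name": "Bob",
            "favorite_numbers": {"number": [4, 5, 6]},
        }
    ]
}
print(to_xml(doc, "people"))
```

The output is a single unindented line, but it has the same element structure as the `<person>` entry for Bob shown above.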

Benchmarks

I also ran some benchmarks for jq, yq, and oq to show how they compare in various situations.

Setup

OS: #1 SMP Debian 4.9.168-1+deb9u3 (2019-06-16)

CPU: Intel i7-7700k

Memory: 32GB @ 3,000 MHz

SSD: Samsung 850 PRO - 512GB

Benchmarks are done via the /usr/bin/time -v command
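For readers who want to compare runs programmatically, the following Python sketch turns a `/usr/bin/time -v` report into a dictionary and converts the elapsed time to seconds. The helper names `parse_time_report` and `elapsed_seconds` are hypothetical, and the parsing assumes the `label: value` layout seen in the reports in this document.

```python
def parse_time_report(text):
    """Turn `/usr/bin/time -v` report lines into a {label: value} dict."""
    report = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Command being timed"):
            # The command itself may contain ": ", so split on the first one.
            label, sep, value = line.partition(": ")
        else:
            # Labels such as "Elapsed (wall clock) time (h:mm:ss or m:ss)"
            # contain colons, so split on the last ": " instead.
            label, sep, value = line.rpartition(": ")
        if sep:
            report[label] = value
    return report

def elapsed_seconds(value):
    """Convert "m:ss.cc" or "h:mm:ss" into seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

sample = """
    Command being timed: "jq length jeopardy.json"
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.76
    Maximum resident set size (kbytes): 230080
"""
report = parse_time_report(sample)
print(elapsed_seconds(report["Elapsed (wall clock) time (h:mm:ss or m:ss)"]))
print(int(report["Maximum resident set size (kbytes)"]))
```

Wall-clock time and maximum resident set size are the two figures the comparisons in this section rely on.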

Simple

First, I used the data.json file to see how each tool performs when simply parsing the file and outputting it unchanged via the . filter.

jq

    jq . data.json | wc -l
    Command being timed: "jq . data.json"
    User time (seconds): 0.02
    System time (seconds): 0.01
    Percent of CPU this job got: 68%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.06
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 16236
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 3860
    Voluntary context switches: 224
    Involuntary context switches: 8
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
    31

yq

    yq . data.json | wc -l
    Command being timed: "yq . data.json"
    User time (seconds): 0.08
    System time (seconds): 0.01
    Percent of CPU this job got: 77%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.11
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 16252
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 1
    Minor (reclaiming a frame) page faults: 7179
    Voluntary context switches: 189
    Involuntary context switches: 10
    Swaps: 0
    File system inputs: 1672
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
    31

oq

    oq . data.json | wc -l
    Command being timed: "oq . data.json"
    User time (seconds): 0.02
    System time (seconds): 0.04
    Percent of CPU this job got: 74%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.10
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 16140
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 4499
    Voluntary context switches: 306
    Involuntary context switches: 13
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
    31

For this first test, all three are pretty much equal, with only a negligible difference in wallclock/memory used.

The next benchmark uses the ~56 MB jeopardy.json file referenced on jq's benchmark wiki page.

First up, a simple length jeopardy.json command.

jq

    jq length jeopardy.json
    216930
    Command being timed: "jq length jeopardy.json"
    User time (seconds): 0.64
    System time (seconds): 0.10
    Percent of CPU this job got: 97%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.76
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 230080
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 63213
    Voluntary context switches: 240
    Involuntary context switches: 13
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

yq

    yq length jeopardy.json
    216930
    Command being timed: "yq length jeopardy.json"
    User time (seconds): 152.45
    System time (seconds): 1.27
    Percent of CPU this job got: 100%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:33.04
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3853532
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1117041
    Voluntary context switches: 13708
    Involuntary context switches: 3189
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

oq

    oq length jeopardy.json
    216930
    Command being timed: "oq length jeopardy.json"
    User time (seconds): 0.67
    System time (seconds): 0.17
    Percent of CPU this job got: 105%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.80
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 230224
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 63839
    Voluntary context switches: 13832
    Involuntary context switches: 12
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Large files do not bode well for yq: it took roughly 190x to 200x longer than oq or jq (2:33.04 versus 0.80 s and 0.76 s), while also using almost 17x more memory.
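The ratios quoted above follow directly from the elapsed times and maximum resident set sizes in the three reports. A small Python check, using only the figures printed in this section (the variable names are just for illustration):

```python
# Elapsed wall-clock times from the jeopardy.json reports, in seconds.
yq_seconds = 2 * 60 + 33.04   # 2:33.04
oq_seconds = 0.80             # 0:00.80
jq_seconds = 0.76             # 0:00.76

# Maximum resident set sizes from the same reports, in kbytes.
yq_rss_kb = 3853532
oq_rss_kb = 230224
jq_rss_kb = 230080

time_vs_oq = yq_seconds / oq_seconds
time_vs_jq = yq_seconds / jq_seconds
mem_vs_oq = yq_rss_kb / oq_rss_kb
mem_vs_jq = yq_rss_kb / jq_rss_kb

print(round(time_vs_oq), round(time_vs_jq))        # 191 201
print(round(mem_vs_oq, 1), round(mem_vs_jq, 1))    # 16.7 16.7
```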

YAML => XML

For the last benchmark, I gave both yq and oq a large YAML file (~57 MB) and had them convert it to XML. Since jq can't consume YAML, I excluded it.

The file used: invItems.yaml from the EVE Online SDE Export.

Example Input:



    - flagID: 0
      itemID: 0
      locationID: 0
      ownerID: 0
      quantity: -1
      typeID: 0
    - flagID: 0
      itemID: 1
      locationID: 0
      ownerID: 0
      quantity: -1
      typeID: 0
    ...

yq

For yq, I had to give it a filter and some extra arguments for it to output correctly.



    yq -s -x --xml-root items --xml-dtd '{"item": .[] | .}' invItems.yaml > invItems.yq.xml
    Command being timed: "yq -s -x --xml-root items --xml-dtd {"item": .[] | .} invItems.yaml"
    User time (seconds): 309.21
    System time (seconds): 2.76
    Percent of CPU this job got: 100%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 5:11.90
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 7817608
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 2262904
    Voluntary context switches: 32918
    Involuntary context switches: 2504
    Swaps: 0
    File system inputs: 0
    File system outputs: 195072
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Example Output



    <?xml version="1.0" encoding="utf-8"?>
    <items>
      <item>
        <flagID>0</flagID>
        <itemID>0</itemID>
        <locationID>0</locationID>
        <ownerID>0</ownerID>
        <quantity>-1</quantity>
        <typeID>0</typeID>
      </item>
      <item>
        <flagID>0</flagID>
        <itemID>1</itemID>
        <locationID>0</locationID>
        <ownerID>0</ownerID>
        <quantity>-1</quantity>
        <typeID>0</typeID>
      </item>
      ...
    </items>

oq

    oq -i yaml -o xml --xml-root items . invItems.yaml > invItems.oq.xml
    Command being timed: "oq -i yaml -o xml --xml-root items . invItems.yaml"
    User time (seconds): 20.08
    System time (seconds): 0.48
    Percent of CPU this job got: 107%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:19.13
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1332328
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 522235
    Voluntary context switches: 30478
    Involuntary context switches: 974
    Swaps: 0
    File system inputs: 0
    File system outputs: 195072
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Example Output



    <?xml version="1.0" encoding="UTF-8"?>
    <items>
      <item>
        <flagID>0</flagID>
        <itemID>0</itemID>
        <locationID>0</locationID>
        <ownerID>0</ownerID>
        <quantity>-1</quantity>
        <typeID>0</typeID>
      </item>
      <item>
        <flagID>0</flagID>
        <itemID>1</itemID>
        <locationID>0</locationID>
        <ownerID>0</ownerID>
        <quantity>-1</quantity>
        <typeID>0</typeID>
      </item>
      ...
    </items>

As with the jeopardy.json benchmark, yq has a hard time dealing with larger inputs: this test case took ~16x longer (5:11.90 versus 0:19.13) and used almost 6x the memory of oq.

Road to 1.0.0

Since this project is still early in its development, I put together a roadmap of what I would like to get done before calling it 1.0.0:

Support XML input format

Address bugs/issues that arise

Small feature requests

Possibly additional formats

Feel free to submit issues/PRs.