Tests

There are five Site objects to test: 1k, 2k, 4k, 8k and 64k. The names are mnemonic: 1k is the object whose JSON representation is approximately 1 kilobyte, 2k is about 2 kilobytes of JSON, and so on.

Events are produced from these five objects: the 1k events are the events from which the 1k object can be reconstructed, and so on.
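To make the relationship between an object and its events concrete, here is a minimal sketch of rebuilding an object by folding over its events in order. The `Site` fields and the event types (`SiteCreated`, `NameChanged`, `TagAdded`) are hypothetical; the real test classes are richer and are not reproduced here.

```scala
// Hypothetical, heavily simplified domain model; the real Site DTO has many more fields.
final case class Site(id: Long, name: String = "", tags: Vector[String] = Vector.empty)

// Hypothetical event types: each one describes a single change to a Site.
sealed trait SiteEvent
final case class SiteCreated(id: Long) extends SiteEvent
final case class NameChanged(name: String) extends SiteEvent
final case class TagAdded(tag: String) extends SiteEvent

object SiteReplay {
  // Rebuilds a Site by applying the events in order.
  // Events that arrive before SiteCreated, or that don't apply to the current state, are ignored.
  def replay(events: Seq[SiteEvent]): Option[Site] =
    events.foldLeft(Option.empty[Site]) {
      case (None, SiteCreated(id))       => Some(Site(id))
      case (Some(site), NameChanged(n))  => Some(site.copy(name = n))
      case (Some(site), TagAdded(t))     => Some(site.copy(tags = site.tags :+ t))
      case (state, _)                    => state
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(SiteCreated(1L), NameChanged("example"), TagAdded("news"))
    println(replay(events))
  }
}
```

In this sketch the total size of "the 1k events" would be the sum of the serialized sizes of each event in such a list.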

Data size

Sizes (in bytes) for Site (rich DTO):

Converter         1k      2k      4k      8k     64k
JSON            1060    2076    4043    8173   65835
BooPickle        544    1130    1855    2882   16290
Protobuf         554    1175    1930    3058   27111
Thrift           712    1441    2499    4315   38289
Chill            908    1695    2507    3643   26261
Java            2207    3311    4549    6615   43168
Pickling        1628    2883    5576   11762   97997

BooPickle is the leader, which is understandable: the library doesn’t support backward compatibility, so it doesn’t need to store field tags. Chill does relatively better on the very big object, where it is more compact than Protobuf and Thrift. Thrift is not so good (maybe because of how it implements optional fields).
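To illustrate what storing field tags costs, the sketch below encodes the same four integers twice: once with a Protobuf-style key (field number and wire type) before each value, and once positionally, relying on writer and reader agreeing on the field order. This is not BooPickle’s actual wire format; it only shows the kind of bytes a format without schema evolution can leave out. The `TagOverhead` object and its methods are made up for the example.

```scala
import java.io.ByteArrayOutputStream

object TagOverhead {
  // Writes an unsigned base-128 varint, the integer encoding used by the Protobuf wire format.
  def writeVarint(out: ByteArrayOutputStream, value: Long): Unit = {
    var v = value
    while ((v & ~0x7FL) != 0L) {
      out.write(((v & 0x7F) | 0x80).toInt) // low 7 bits plus the "more bytes follow" flag
      v >>>= 7
    }
    out.write(v.toInt)
  }

  // Protobuf-style: every value is preceded by a key = (fieldNumber << 3) | wireType.
  def tagged(fields: Seq[Long]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    fields.zipWithIndex.foreach { case (value, index) =>
      val fieldNumber = index + 1
      writeVarint(out, (fieldNumber.toLong << 3) | 0L) // wire type 0 = varint
      writeVarint(out, value)
    }
    out.toByteArray
  }

  // Positional: writer and reader agree on the field order, so no keys are written.
  def positional(fields: Seq[Long]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    fields.foreach(value => writeVarint(out, value))
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq(1L, 300L, 42L, 7L)
    println(s"tagged: ${tagged(fields).length} bytes, positional: ${positional(fields).length} bytes")
  }
}
```

With these four values the keyed encoding takes 9 bytes and the positional one 5: each field number below 16 costs exactly one key byte, which adds up quickly for objects with many small fields.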

Sizes (in bytes) for events (sum over all events in the list):

Converter         1k      2k      4k      8k     64k
JSON            1277    2499    5119   10961  109539
BooPickle        593    1220    2117    3655   42150
Protobuf         578    1192    2076    3604   42455
Thrift           700    1430    2639    4911   57029
Chill            588    1260    2397    3981   47048
Java            2716    5078   11538   26228  240267
Pickling        1565    3023    6284   13462  128797

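Here Protobuf looks even better: it produces the smallest output for the 1k to 8k events, and for the 64k events it is only slightly larger than BooPickle.

Averaged over the five sizes, Protobuf and BooPickle events are almost 2.5 times more compact than JSON, Chill about 2.3 times and Thrift about 1.9 times. For the Site object, Java Serialization produces smaller output than Pickling from 4k upward, while Pickling is smaller for 1k and 2k; for events, Pickling is smaller at every size.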
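The size ratios between JSON and the binary formats can be checked directly against the events table. This small sketch (the object name is made up) divides each JSON size by the corresponding converter size and averages the five ratios; the numbers are copied from the table.

```scala
object EventSizeRatios {
  // Event sizes in bytes, copied from the events table (columns 1k, 2k, 4k, 8k, 64k).
  val json: Vector[Int] = Vector(1277, 2499, 5119, 10961, 109539)
  val others: Map[String, Vector[Int]] = Map(
    "BooPickle" -> Vector(593, 1220, 2117, 3655, 42150),
    "Protobuf"  -> Vector(578, 1192, 2076, 3604, 42455),
    "Thrift"    -> Vector(700, 1430, 2639, 4911, 57029),
    "Chill"     -> Vector(588, 1260, 2397, 3981, 47048)
  )

  // How many times smaller than JSON a converter's output is, for each object size.
  def ratios(sizes: Vector[Int]): Vector[Double] =
    json.zip(sizes).map { case (j, s) => j.toDouble / s }

  // Arithmetic mean of the per-size ratios.
  def meanRatio(sizes: Vector[Int]): Double = {
    val r = ratios(sizes)
    r.sum / r.size
  }

  def main(args: Array[String]): Unit =
    others.foreach { case (name, sizes) =>
      val perSize = ratios(sizes).map(r => f"$r%.2f").mkString(" ")
      println(f"$name%-10s $perSize  mean ${meanRatio(sizes)}%.2f")
    }
}
```

The per-size ratios also show that the advantage grows with object size up to 8k (for Protobuf, from about 2.2 at 1k to about 3.0 at 8k).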

Data size and compression

Another interesting topic related to data size is compression. Compression is widely used in modern systems, from databases (e.g. the COMPRESSED row format in MySQL [18]) to networks (gzip over HTTP [19]), so raw data size may matter less than it seems. The full comparison table for gzipped and raw objects is quite big, so I will show only a small part of it, with sizes in bytes (the whole table is available here [20]).

Converter           Site 2k  Events 2k    Site 8k  Events 8k
JSON (raw)             2076       2499       8173      10961
JSON (gz)              1137       2565       2677      11784
Protobuf (raw)         1175       1192       3058       3604
Protobuf (gz)           898       1463       2175       5552
Thrift (raw)           1441       1430       4315       4911
Thrift (gz)             966       1669       2256       6673

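Important note: the events are gzipped one by one, not together. If they were gzipped together, the resulting size would of course be much smaller. Gzipped one by one, they end up larger than the raw data (for example, 2565 vs. 2499 bytes for the 2k JSON events), which shows the importance of choosing the right storage/transfer mechanism.

We can see that for small objects (up to about 2k), raw Protobuf is almost the same size as gzipped JSON (1175 vs. 1137 bytes for the 2k Site), so we can skip compression and save some CPU cycles.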
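The effect of gzipping events one by one can be reproduced with `java.util.zip` from the JDK. The sketch below uses made-up JSON events, not the ones from the tests, and the `GzipBatching` name is hypothetical. Each gzip stream carries its own header and trailer (18 bytes together) and starts with an empty compression dictionary, so many small streams lose most of the benefit that a single stream over the whole batch gets.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

object GzipBatching {
  // Compresses a byte array with gzip; close() finishes the deflate stream and writes the gzip trailer.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(buffer)
    try gz.write(bytes) finally gz.close()
    buffer.toByteArray
  }

  // Returns (raw total, sum of individually gzipped sizes, size of all events gzipped together).
  def sizes(events: Seq[String]): (Int, Int, Int) = {
    val raw = events.map(_.getBytes(StandardCharsets.UTF_8))
    val oneByOne = raw.map(bytes => gzip(bytes).length).sum
    val together = gzip(raw.foldLeft(Array.emptyByteArray)(_ ++ _)).length
    (raw.map(_.length).sum, oneByOne, together)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical small JSON events, not the ones used in the tests.
    val events = (1 to 50).map(i => s"""{"type":"TagAdded","siteId":42,"tag":"tag-$i"}""")
    val (raw, oneByOne, together) = sizes(events)
    println(s"raw=$raw oneByOne=$oneByOne together=$together")
  }
}
```

With events this small, the one-by-one total exceeds the raw size, just like the event columns in the table above, while the batch compresses well.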

Performance

The next important aspect is the performance of serialization and deserialization (parsing). The tests measure serializing Scala objects to raw data and deserializing raw data back into Scala objects. To simplify testing, I converted the generated classes to “domain” classes, so for Protobuf and Thrift the timings also include this object conversion (I don’t think its effect is significant).
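The conversion step might look like the following minimal sketch. `SiteMessage` stands in for a generated class and `Site` for the domain class; both, and their fields, are hypothetical, and real ScalaPB or Thrift generated code has more members. The mapping here (empty string to `None`, epoch milliseconds to `Instant`) is only an example of the kind of work such a conversion adds to each measurement.

```scala
import java.time.Instant

// Hypothetical stand-in for a class generated from a .proto or .thrift schema.
// As with proto3 scalar fields, an unset string arrives as "" and timestamps as epoch millis.
final case class SiteMessage(id: Long, name: String, description: String, createdAtMillis: Long)

// Hypothetical "domain" class that the rest of the application (and the JSON tests) work with.
final case class Site(id: Long, name: String, description: Option[String], createdAt: Instant)

object SiteConversions {
  // Generated class -> domain class (done after parsing).
  def toDomain(m: SiteMessage): Site =
    Site(
      id = m.id,
      name = m.name,
      description = Option(m.description).filter(_.nonEmpty), // "" means "not set"
      createdAt = Instant.ofEpochMilli(m.createdAtMillis)
    )

  // Domain class -> generated class (done before serializing).
  def toMessage(s: Site): SiteMessage =
    SiteMessage(
      id = s.id,
      name = s.name,
      description = s.description.getOrElse(""),
      createdAtMillis = s.createdAt.toEpochMilli
    )
}
```

Each conversion allocates one extra object per message, which is small compared to parsing but is part of the measured time.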

I excluded Java Serialization and Pickling from this chart (and from the other charts) because both of them are very slow. I will write about them later.

The code for the performance tests is in BasePerfTest.scala.
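BasePerfTest.scala itself is not reproduced here. As a rough illustration only, a System.nanoTime-based measurement could look like the sketch below; the warm-up count, the use of the median and the `NanoTimer` name are assumptions, not necessarily what BasePerfTest.scala does. For more rigorous numbers, a harness such as JMH takes care of warm-up, forking and dead-code elimination.

```scala
object NanoTimer {
  // Written on every iteration; a volatile write makes it harder for the JIT to discard the measured work.
  @volatile private var sink: Any = null

  // Runs `body` `warmup` times so the JIT can compile it, then times `runs` calls
  // with System.nanoTime and returns the median duration in nanoseconds
  // (the upper of the two middle samples when `runs` is even).
  def medianNanos[A](warmup: Int, runs: Int)(body: => A): Long = {
    require(runs > 0, "runs must be positive")
    var i = 0
    while (i < warmup) { sink = body; i += 1 }
    val samples = Array.fill(runs) {
      val start = System.nanoTime()
      sink = body
      System.nanoTime() - start
    }
    java.util.Arrays.sort(samples)
    samples(runs / 2)
  }

  def main(args: Array[String]): Unit = {
    val payload = "x" * 1000
    val ns = medianNanos(warmup = 10000, runs = 1001)(payload.getBytes("UTF-8"))
    println(s"median: $ns ns")
  }
}
```

The median is used here because single System.nanoTime samples are noisy (GC pauses, JIT recompilation); a mean would be pulled up by those outliers.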

Serialization performance

The following chart shows the serialization times for the Site object (measured in nanoseconds with the System.nanoTime method).

Serialization times for Site (rich DTO), nanoseconds

ScalaPB is the clear winner, followed by Java Protobuf and Thrift. BooPickle and Chill are slightly slower for “small” objects and a bit better for bigger objects.

The next chart shows serialization times for events.

Serialization times for events, nanoseconds

ScalaPB is still the winner. BooPickle is very slow in this contest; apparently, many small messages are not the right scenario for it.

In numbers [20], ScalaPB is more than 2 times faster than JSON for a single rich DTO and more than 4 times faster than JSON for a list of small events.

Deserialization (parsing) performance

The next chart shows deserialization times for the Site object:

Deserialization times for Site (rich DTO), nanoseconds

ScalaPB is the winner. BooPickle looks much better here than in serialization; apparently, it performs optimizations during serialization that cost a lot.

Deserialization times for events, nanoseconds

For events, the fastest library is Java Protobuf (I don’t know why, but this was confirmed over several runs).

ScalaPB parses about 3 times faster than JSON for the rich DTO and 3 to 4 times faster for a list of small events: 2 microseconds vs. 7 microseconds for the 1k Site, and 3 microseconds vs. 12 microseconds for the 1k events.

Java Serialization and Pickling

Pickling performance is surprisingly bad. Maybe I missed something and used it wrongly (although I followed all the tips from its manual), but Pickling is the slowest library in this test. Even if I did use it wrongly, that is not a good sign either.

For comparison, here are the serialization times for a rich DTO (nanoseconds):

Converter             1k        2k        4k        8k       64k
JSON                4365      8437     16771     35164    270175
Serializable       13156     21203     36457     79045    652942
Pickling           53991     83601    220440    589888   4162785

Deserialization times for a rich DTO (nanoseconds):

Converter             1k        2k        4k        8k       64k
JSON                7670     12964     24804     51578    384623
Serializable       61455     84196    102870    126839    575232
Pickling           40337     63840    165109    446043   3201348
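For reference, the Serializable converter relies on the JDK’s `ObjectOutputStream` and `ObjectInputStream`. The sketch below (the `Page` class and the object name are hypothetical) shows a round trip and one likely reason why Java Serialization output is large for small objects: the stream carries a class descriptor, including the class name and field names, next to the data itself.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical small DTO; Scala case classes implement java.io.Serializable.
final case class Page(url: String, title: String)

object JavaSerializationDemo {
  // Serializes any Serializable object graph to bytes.
  def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // Reads the object back; the cast is unchecked, so the caller must know the expected type.
  def deserialize[A](bytes: Array[Byte]): A = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(Page("https://example.com", "Example"))
    // Decoding as ISO-8859-1 maps every byte to a char, so ASCII names in the descriptor become searchable.
    val text = new String(bytes, StandardCharsets.ISO_8859_1)
    println(s"${bytes.length} bytes, class name present: ${text.contains("Page")}, field name present: ${text.contains("title")}")
    println(deserialize[Page](bytes))
  }
}
```

A schema-based format such as Protobuf sends neither the class name nor the field names, only small numeric field keys.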

Conclusion

As expected, binary serialization is faster and produces less data. ScalaPB (and the Protobuf format in general) showed very good results.

Nevertheless, performance and data size alone are not enough to justify moving from JSON to Protobuf. But it’s important to know how much using JSON costs.