Cpanel::JSON::XS::Type - 2018-12-03

How does Santa manage to deliver all the presents to all the children on one day? The answer is simple: He doesn't. In many countries, there are different creatures responsible for the job (if you're curious, consult the List of Christmas and winter gift bringers for details).

We live in a globalised world. People often work for foreign companies and sometimes move abroad. To keep even their children happy, the gift bringers had to start cooperating. At first, they exchanged data using ASN.1, switched to XML at the beginning of the current century, and entered the Twenty-tens with JSON.

Baby Jesus on Red Hat Enterprise Linux

Let's look at Baby Jesus, who is responsible for Central Europe and Latin America. The author of this article is interested in him for two reasons: Baby Jesus brings Christmas gifts to the author's children, and the Baby's business runs on Perl.

Until the end of the previous year, Baby Jesus had used Red Hat Enterprise Linux 6. It shipped Perl 5.10.1, and the web service used JSON::XS version 2.27 provided by the vendor. When Christmas 2017 was over, Kris Kringle decided to upgrade the system to RHEL 7, which meant upgrading Perl to 5.16.3 and JSON::XS to 3.01. Surprisingly, the upgrade wasn't smooth at all.

After the upgrade, the JSON data were different. Some numbers that had previously been unquoted started to appear in double quotes, while others that had been quoted sometimes lost their quotes. Perl-powered gift bringers didn't have to worry, as Perl doesn't care, but there were Java businesses that required the format to stay stable. BJ couldn't help but accept their argument that the data should round-trip: when you POST a JSON document, you expect to GET the same JSON back.

System Perl

The investigation revealed two causes of the problem: Perl's internal flags had changed between the versions, and so had the heuristics the JSON::XS module used to guess whether a number should be quoted.
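Perl stores a scalar's integer, floating-point and string forms side by side and records which of them are valid in internal flags such as IOK, NOK and POK. An encoder like JSON::XS looks at these flags to decide between 42 and "42". The flags can be inspected with the core B module. The following is a minimal sketch: `flags_of` is a hypothetical helper, not part of any JSON library. It shows that merely using a string in arithmetic gives it an integer flag as well:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report which public IOK/NOK/POK flags a scalar has.
sub flags_of {
    # $_[0] aliases the caller's variable, so we inspect it, not a copy
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @set;
    push @set, 'IOK' if $flags & B::SVf_IOK();
    push @set, 'NOK' if $flags & B::SVf_NOK();
    push @set, 'POK' if $flags & B::SVf_POK();
    return join ',', @set;
}

my $number = 42;
my $string = "42";
print "number: ", flags_of($number), "\n";   # IOK
print "string: ", flags_of($string), "\n";   # POK

my $sum = $string + 1;                        # use the string as a number once...
print "string: ", flags_of($string), "\n";   # ...and it now has IOK,POK
```

Once a value carries both flags, the encoder has to guess which form was meant. That guess is exactly what differs between the libraries compared below.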

"We told you not to use the system Perl!" shouted other programmers in the office.

"Good advice," replied Baby Jesus, "but it wouldn't have helped us. We'd have probably only discovered the problem earlier when upgrading Perl or the library."

Alternatives

There are several JSON encoding/decoding modules in the wild. Each of them has its own peculiarities. Baby Jesus compared the behaviour of JSON::PP, Cpanel::JSON::XS, and JSON::XS, the latter under Perls 5.10.1 and 5.16.3.

Bah Humbug!

encode_json([ $$, "" . $$ ])

JSON::XS in 5.10.1 returns ["213","213"] (213 being the process ID), but [213,"213"] in 5.16.3, which matches the behaviour of JSON::PP and Cpanel::JSON::XS.

Dag Gubmit!

$x = 12; utf8::decode($x); encode_json([ $x ])

JSON::XS returns [12] in 5.10.1, but ["12"] in 5.16.3, which matches the behaviour of JSON::PP and Cpanel::JSON::XS.

Oh Chestnuts!

$x = '19a'; $x += 0; encode_json([ $x ])

Here, Cpanel::JSON::XS is the one that differs: it returns [19.0], while all the others return [19].

Baubles!

use Data::Dumper; Dumper(decode_json('[1e4]'))

This time, it's JSON::PP that begs to differ, returning [10000] without quotes. All the other libraries return the number quoted.

Gift Wrapped!

$x = 1844674407370955161; encode_json([ $x, $x / 10 ])

Another example of JSON::PP being not only slow but also inconsistent with the other libraries: it returns [1844674407370955161,1.84467440737096e+17], whereas the others return [1.84467440737096e+18,1.84467440737096e+17].

Ah Tinsel!

Tied variables are handled differently in JSON::XS. Consider this snippet:

use warnings;
use strict;

{   package MyIncrementer;
    use Tie::Scalar;
    use parent -norequire => 'Tie::StdScalar';

    sub TIESCALAR { my ($class, $val) = @_; bless \$val, $class }
    sub FETCH     { my $s = shift; $$s++ }
}

use JSON::XS;
my $json = 'JSON::XS'->new->allow_nonref;

tie my $x, 'MyIncrementer', 'Xa';
print $json->encode($x) for 1 .. 4;

JSON::XS returns "Xb""Xd""Xf""Xh" regardless of the Perl version, while both other libraries return "Xa""Xb""Xc""Xd". So it seems JSON::XS calls the FETCH method twice when encoding a value to JSON.

For My Sake!

The behaviour of JSON::XS has also changed between its own versions, even under the same Perl.

$j = 'JSON::XS'->new->allow_nonref;
$x = 12;
print $j->decode($x), $j->encode($x);

In Perl 5.10.1, JSON::XS version 2.27 returns 12"12", while 3.01 returns 1212, which is consistent with the other libraries.

Quick Solution

"Let's just quickly fix the data before serialisation," was the initial idea of BJ's team. They wanted to call int on integers, concatenate strings to the empty string, and add floats to zero:

use warnings;
use strict;

use JSON::XS;

my $integer = "12";
my $string  = 42;
my $float   = "122e-1";

print encode_json([
    int $integer,
    "" . $string,
    0 + $float
]);

But when they started changing the code, they realised it wasn't so easy. Nested structures turned out to be hard to track, as they were usually built in steps in different parts of the code. Merely inspecting a value (for example, interpolating it into a debug message) could change how it was encoded, and there was no easy way to mark a value as "ready for serialisation". Moreover, the whole endeavour confused the non-Perl teams who occasionally needed to touch the code.

Furthermore, a formal API description already existed, so adding the same information to the code felt redundant.

Proper Solution

The final decision was to enforce the types in exactly one place, right before serialisation. Fortunately, a company located in Christkind's territory needed to solve the same problem, and they were able to convince Reini Urban, the maintainer of Cpanel::JSON::XS, to include their solution in his distribution.

Both the encode and decode methods now accepted an optional second argument describing the types of the encoded or decoded structure. When encoding, the programmer had to provide the types; when decoding, the second argument had to be writable and was populated with a structure describing the types.

For example:

use warnings;
use strict;

use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;

my $type = { count      => JSON_TYPE_INT,
             average    => JSON_TYPE_FLOAT,
             name       => JSON_TYPE_STRING,
             is_enabled => JSON_TYPE_BOOL,
             orders     => json_type_arrayof(JSON_TYPE_INT) };

print 'Cpanel::JSON::XS'->new->pretty->canonical
    ->encode({ count      => '12',
               average    => '11.2',
               name       => 100 / 3,
               is_enabled => 1,
               orders     => [ 1 .. 10 ]
             }, $type);

This prints:

{
   "average" : 11.2,
   "count" : 12,
   "is_enabled" : true,
   "name" : "33.3333333333333",
   "orders" : [
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      10
   ]
}

The module Cpanel::JSON::XS::Type exported the constants JSON_TYPE_INT, JSON_TYPE_STRING and similar, as well as functions like json_type_arrayof to declare the types of nested structures.
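To make the idea of "enforcing the types in one place" concrete without depending on the module, here is a minimal sketch in core Perl. The hypothetical `apply_types` function walks a type description and coerces every value with the same `int`, `0 +` and `"" .` tricks the team had first considered. The description uses made-up tags: 'int', 'float', 'string', an array reference for "array of", and a hash reference for objects. Only then does it hand the result to JSON::PP. It only illustrates the principle. It is not how Cpanel::JSON::XS implements typed encoding, and it does not handle booleans or null.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: coerce $value according to $type, where $type is one
# of the made-up tags 'int', 'float' or 'string', an array ref holding the
# element type ("array of"), or a hash ref mapping keys to types.
sub apply_types {
    my ($value, $type) = @_;
    if (ref $type eq 'HASH') {
        my %out;
        for my $key (keys %$type) {
            $out{$key} = apply_types($value->{$key}, $type->{$key});
        }
        return \%out;
    }
    if (ref $type eq 'ARRAY') {
        return [ map { apply_types($_, $type->[0]) } @$value ];
    }
    return int $value   if $type eq 'int';      # fresh integer: unquoted
    return 0 + $value   if $type eq 'float';    # fresh number: unquoted
    return "" . $value  if $type eq 'string';   # fresh string: quoted
    die "Unknown type '$type'\n";
}

my $type = { count => 'int',    average => 'float',
             name  => 'string', orders  => ['int'] };
my $data = { count => '12', average => '11.2',
             name  => 42,   orders  => [ '1', '2', '3' ] };

# canonical sorts the keys, so the output is stable
print JSON::PP->new->canonical->encode(apply_types($data, $type)), "\n";
# {"average":11.2,"count":12,"name":"42","orders":[1,2,3]}
```

Because every value is rebuilt from the type description, it no longer matters how the data were produced or inspected earlier. The same reasoning lies behind passing the type structure as an argument to encode.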

The decoder worked in the same fashion:

use warnings;
use strict;

use Cpanel::JSON::XS; use Cpanel::JSON::XS::Type;

my $struct = 'Cpanel::JSON::XS'->new
    ->decode('[null,1,1.1,"1",[0],true]', my $type);

The variables then contained:

$struct = [ undef, 1, '1.1', '1', [0], 1 ];
$type   = [ 256, 2, 3, 4, [2], 1 ];

where the meaning of the constants could be found in the source code of the XS file (ideally, they would be exported by an independent module so that other JSON libraries could use them, too):

#define JSON_TYPE_SCALAR 0x0000
#define JSON_TYPE_BOOL 0x0001
#define JSON_TYPE_INT 0x0002
#define JSON_TYPE_FLOAT 0x0003
#define JSON_TYPE_STRING 0x0004

#define JSON_TYPE_CAN_BE_NULL 0x0100

#define JSON_TYPE_NULL JSON_TYPE_CAN_BE_NULL
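The constants combine as bit flags. The low bits hold the base type and JSON_TYPE_CAN_BE_NULL (0x0100) is OR'ed on top, which is why the decoded null shows up as 256 in the $type structure. The following sketch redefines the values from the listing with the core constant pragma, so it runs without the module. It adds a hypothetical `describe_type` helper that turns a decoded type structure into readable names. In real code you would import the constants from Cpanel::JSON::XS::Type instead of copying them.

```perl
use strict;
use warnings;

# Values copied from the #define list (redefined so this runs standalone).
use constant {
    JSON_TYPE_SCALAR      => 0x0000,
    JSON_TYPE_BOOL        => 0x0001,
    JSON_TYPE_INT         => 0x0002,
    JSON_TYPE_FLOAT       => 0x0003,
    JSON_TYPE_STRING      => 0x0004,
    JSON_TYPE_CAN_BE_NULL => 0x0100,
};

my %name_of = (
    JSON_TYPE_SCALAR() => 'scalar',
    JSON_TYPE_BOOL()   => 'bool',
    JSON_TYPE_INT()    => 'int',
    JSON_TYPE_FLOAT()  => 'float',
    JSON_TYPE_STRING() => 'string',
);

# Hypothetical helper: turn a decoded type structure into readable names.
sub describe_type {
    my ($type) = @_;
    return [ map { describe_type($_) } @$type ] if ref $type eq 'ARRAY';
    my $nullable = $type & JSON_TYPE_CAN_BE_NULL;    # the "can be null" bit
    my $base     = $type & ~JSON_TYPE_CAN_BE_NULL;   # everything else
    return 'null' if $nullable && $base == JSON_TYPE_SCALAR;   # JSON_TYPE_NULL
    my $name = $name_of{$base} // die "Unknown type $type\n";
    return $nullable ? "$name or null" : $name;
}

# The $type structure from the decode example.
my $decoded = [ 256, 2, 3, 4, [2], 1 ];
print join(', ', map { ref $_ ? "[@$_]" : $_ } @{ describe_type($decoded) }), "\n";
# null, int, float, string, [int], bool
print describe_type(JSON_TYPE_INT | JSON_TYPE_CAN_BE_NULL), "\n";   # int or null
```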



When modelling family relations, Ježíšek used trees. The description of a tree structure is tricky, though, as it leads to a cyclic reference in its type specification (there are no cycles in a tree, but a child of a node is again a node), which causes a memory leak:

use Cpanel::JSON::XS; use Cpanel::JSON::XS::Type;

my $node = { value => JSON_TYPE_STRING };
$node->{children} = json_type_arrayof($node);

The proper way to describe a recursive structure is to use the json_type_weaken function:

$node->{children} = json_type_arrayof(
    json_type_weaken($node)
);
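Why does the cyclic specification leak? Perl frees data by reference counting, so a structure that refers to itself never drops to a count of zero. The sketch below shows the general mechanism on plain hashes with the core `Scalar::Util::weaken`. It does not use Cpanel::JSON::XS, and `Counted` is a hypothetical class that only counts destroyed objects. Presumably json_type_weaken breaks the cycle in a similar way, but its internals are not shown here.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical class whose destructor counts freed objects.
my $freed = 0;
{ package Counted; sub DESTROY { $freed++ } }

{   # A node whose child list contains the node itself: a strong cycle.
    my $node = bless {}, 'Counted';
    $node->{children} = [ $node ];
}
print "strong cycle: $freed freed\n";   # 0: the node is still alive (leaked)

{   # The same shape, but the inner reference is weakened.
    my $node = bless {}, 'Counted';
    $node->{children} = [ $node ];
    weaken($node->{children}[0]);       # this reference no longer counts
}
print "weak cycle:   $freed freed\n";   # 1: the node was released
```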



Similar Stories

El Niño Dios wasn't the only one to encounter the problem. See, for example, "Did the JSON module change?" on PerlMonks.

Moreover, the problem isn't specific to JSON. Whenever Perl needs to talk to a system with a different type system, you might fall into the same trap. See, for example, "Why does DBI implicitly change integers to strings?" on StackOverflow.

In Perl, the internal type of a value shouldn't matter. The only exception used to be the bitwise operators, but the bitwise feature introduced in 5.22 fixed that, so you can now always state the type explicitly by choosing the appropriate operator. Cpanel::JSON::XS::Type follows the same philosophy.
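With the bitwise feature enabled, `|` always performs a numeric OR and the new `|.` always performs a string OR, whatever flags the operands carry. Without the feature, `"12" | "3"` would be a string OR simply because both operands happen to be strings. The feature was experimental in 5.22 and became stable in 5.28, which is what the `use v5.28` line in this small sketch relies on:

```perl
use v5.28;   # implies "use strict" and enables the 'say' and 'bitwise' features
use warnings;

my $string = "12";
my $number = 3;

# The operator, not the operand's internal flags, picks the type:
my $numeric = $string |  $number;   # | treats both operands as numbers
my $stringy = $string |. $number;   # |. treats both operands as strings

say "numeric: $numeric";   # 15
say "string:  $stringy";   # 32
```

The string result follows from OR-ing the bytes "1" | "3" into "3" and copying the unmatched "2" from the longer operand.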

Thanks

Baby Jesus would like to thank Pali for implementing the features, Reini Urban for releasing them, and GoodData for supporting contributions to open source.

Notes

Based on the talk presented at The Perl Conference in Glasgow 2018 (video, slides).