Quick and Dirty JSON Speed Testing in Python

[See updated post for analysis using timeit]

At Poromenos’ request on Reddit, I decided to expand a bit on my cryptic comment about the major JSON packages in Python (simplejson, cjson, demjson):

My conclusion: use demjson if you really, really want to make sure everything is right and you don’t care at all about time. Use simplejson if you’re among the 99% of users who want reasonable performance over a broad range of objects. Use enhanced cjson 1.0.3x if you are in the camp with reasonable JSON inputs and you need much faster (10x) speed… that is, if the JSON step is the bottleneck.

More worrisome: demjson didn’t properly handle the unicode string I threw at it…

The Test Setup

Python 2.4.3 on 64-bit Linux

simplejson: 1.9.2, with c-extensions turned on.*

demjson: 1.3

cjson: enhanced cjson 1.0.3x6 from http://python.cx.hu/python-cjson/

(We need the enhanced version for the simplified “encode using utf-8” interface.)

* To test for simplejson c-extensions:

> assert getattr(simplejson, '_speedups', None), "no speedups enabled"
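
For readers on a newer interpreter, a similar check can be made against the standard-library json module (the descendant of simplejson), which also ships with an optional C accelerator. This is only a minimal sketch, assuming CPython 3: the attribute json.encoder.c_make_encoder is an implementation detail rather than documented API, and has_c_encoder is a hypothetical helper name.

```python
import json.encoder


def has_c_encoder():
    """Return True if the C-accelerated encoder could be imported.

    Hypothetical helper. It relies on a CPython implementation detail:
    json.encoder.c_make_encoder is None when the _json extension is missing.
    """
    return getattr(json.encoder, "c_make_encoder", None) is not None


print("C encoder available:", has_c_encoder())
```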

Test Code

    ## simple json testing
    import simplejson
    import cjson
    import demjson

    class A(object):
        def __init__(self):
            self.var1 = 1
            self.var2 = dict(a=1,b=2,c=3)

    ## TEST DATA
    set_ = set([1,2,3,4])
    nested_dict = dict(v1="a", v2="b", v3=dict(n1=1,n2=2,n3=3))
    ustring = u"a string with some unicod Andre\202"
    #In case anyone is wondering, unicod is a text-encoding used by Nova Scotian fishermen.
    class_= A()

    ## Dump and load methods
    dumps = {
        "simplejson": simplejson.dumps,
        "cjson": lambda x: cjson.encode(x,encoding="utf-8"),
        "demjson": demjson.encode
    }
    loads = {
        "simplejson": simplejson.loads,
        "cjson": lambda x: cjson.decode(x,encoding="utf-8"),
        "demjson": demjson.decode
    }

    ## Can the functions handle different data types
    for thing_name in ("set_", "nested_dict", "ustring", "class_", ):
        thing = eval(thing_name)
        for k,fun in dumps.iteritems():
            try:
                out = fun(thing)
                print "SUCCESS: %s enocdes %s" % (k,thing_name)
                print out
            except Exception, e:
                print "ERROR: %s failed to enocde %s" % (k,thing_name)
                print "ERROR:", e

    ## Profiling code
    from profile import run
    for thing_name in ("nested_dict", "ustring", ):
        thing = eval(thing_name)
        for k,fun in dumps.iteritems():
            print k, thing_name
            run("for ii in xrange(10000): fun(thing)")

Capability Results

All three handled the nested dict fine; demjson’s output was the most compact, as advertised.

demjson improperly encoded the unicode string: its output shows no \u0082 escape where the other two have one (any ideas why, anyone?).

demjson encoded the set as a list; the other two failed to encode it.

All three failed to encode the class instance, as expected.
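
If sets or simple class instances do need to be encoded, one common workaround is a fallback hook. The sketch below uses the standard-library json module (Python 3) and its default= parameter; simplejson documents a parameter of the same name, though that was not checked against the versions tested here. The helper name encode_extra and the choice to turn sets into sorted lists are my own assumptions, not something any of the three libraries does for you.

```python
import json


class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1, b=2, c=3)


def encode_extra(obj):
    """Fallback for default=: called only for objects json cannot encode itself."""
    if isinstance(obj, (set, frozenset)):
        # Assumes the elements are mutually orderable.
        return sorted(obj)
    if hasattr(obj, "__dict__"):
        # Plain instances are reduced to their attribute dictionary.
        return vars(obj)
    raise TypeError("%r is not JSON serializable" % (obj,))


set_json = json.dumps(set([1, 2, 3, 4]), default=encode_extra)
class_json = json.dumps(A(), default=encode_extra, sort_keys=True)
print(set_json)
print(class_json)
```

The hook keeps the encoder strict by default: anything it does not recognise still raises TypeError, just as in the results above.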

    ERROR: cjson failed to enocde set_
    ERROR: object is not JSON encodable
    SUCCESS: demjson enocdes set_
    [1,2,3,4]
    ERROR: simplejson failed to enocde set_
    ERROR: set([1, 2, 3, 4]) is not JSON serializable
    SUCCESS: cjson enocdes nested_dict
    {"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}
    SUCCESS: demjson enocdes nested_dict
    {"v1":"a","v2":"b","v3":{"n1":1,"n2":2,"n3":3}}
    SUCCESS: simplejson enocdes nested_dict
    {"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}
    SUCCESS: cjson enocdes ustring
    "a string with some unicod Andre\u0082"
    SUCCESS: demjson enocdes ustring
    "a string with some unicod Andre"
    SUCCESS: simplejson enocdes ustring
    "a string with some unicod Andre\u0082"
    ERROR: cjson failed to enocde class_
    ERROR: object is not JSON encodable
    ERROR: demjson failed to enocde class_
    ERROR: ('can not encode object into a JSON representation', <__main__.A object at 0x2aaaae9e2b90>)
    ERROR: simplejson failed to enocde class_
    ERROR: <__main__.A object at 0x2aaaae9e2b90> is not JSON serializable
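
One possible explanation for the demjson line, offered as a guess and not checked against demjson 1.3: JSON allows a character such as U+0082 to be written either as a \u escape or literally, and a literal U+0082 is an invisible control character that would seem to vanish when printed. The sketch below shows the two forms with the standard-library json module (Python 3), using ensure_ascii to switch between them; it says nothing definite about what demjson actually emitted.

```python
import json

# \202 is an octal escape for U+0082, as in the test data above.
ustring = u"a string with some unicod Andre\202"

escaped = json.dumps(ustring)                   # non-ASCII written as \uXXXX
raw = json.dumps(ustring, ensure_ascii=False)   # non-ASCII written literally

print(repr(escaped))
print(repr(raw))

# Both forms decode back to the same original string.
assert json.loads(escaped) == ustring
assert json.loads(raw) == ustring
```

Checking repr() of the encoder output, or round-tripping it through a decoder, would distinguish a dropped character from an unescaped one.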

Timing Results

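cjson is really, really fast, especially for the nested dict: 10,000 encodes took 0.19 CPU seconds under the profiler, against 9.16 for simplejson and 30.77 for demjson. Bear in mind that the profile module adds overhead to every Python-level function call, so it exaggerates the gap for code that runs mostly in Python. In other (unpublished) experiments, cjson even comes pretty close to the load/dump speeds of Python’s binary pickling with cPickle.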

    cjson nested_dict
    10003 function calls in 0.190 CPU seconds
    Ordered by: standard name
           ncalls tottime percall cumtime percall  filename:lineno(function)
                1   0.000   0.000   0.000   0.000  :0(setprofile)
            10000   0.150   0.000   0.150   0.000  :3()
                1   0.030   0.030   0.180   0.180  :1(?)
                1   0.010   0.010   0.190   0.190  profile:0(for ii in xrange(10000): fun(thing))
                0   0.000           0.000          profile:0(profiler)

    demjson nested_dict
    4290003 function calls (4160003 primitive calls) in 30.770 CPU seconds
    Ordered by: standard name
           ncalls tottime percall cumtime percall  filename:lineno(function)
           550000   2.060   0.000   2.060   0.000  :0(append)
            10000   0.050   0.000   0.050   0.000  :0(callable)
           960000   2.870   0.000   2.870   0.000  :0(chr)
            60000   0.180   0.000   0.180   0.000  :0(extend)
           960000   2.640   0.000   2.640   0.000  :0(has_key)
           170000   0.740   0.000   0.740   0.000  :0(hasattr)
           820000   2.790   0.000   2.790   0.000  :0(isinstance)
            20000   0.080   0.000   0.080   0.000  :0(iterkeys)
            90000   0.550   0.000   0.550   0.000  :0(join)
            80000   0.230   0.000   0.230   0.000  :0(len)
           140000   0.540   0.000   0.540   0.000  :0(ord)
            10000   0.090   0.000   0.090   0.000  :0(range)
                1   0.000   0.000   0.000   0.000  :0(setprofile)
            20000   0.120   0.000   0.120   0.000  :0(sort)
                1   0.130   0.130  30.770  30.770  :1(?)
            30000   0.350   0.000   0.470   0.000  demjson.py:1220(encode_number)
            80000   3.500   0.000   6.630   0.000  demjson.py:1378(encode_string)
            10000   0.070   0.000  17.860   0.002  demjson.py:1714(encode)
     130000/10000   3.210   0.000  17.750   0.002  demjson.py:1737(encode_helper)
      20000/10000   2.390   0.000  16.910   0.002  demjson.py:1761(encode_composite)
            10000   0.310   0.000  30.640   0.003  demjson.py:1896(encode)
            20000   0.300   0.000   0.590   0.000  demjson.py:521(extend_and_flatten_list_with_sep)
            80000   0.750   0.000   1.130   0.000  demjson.py:730(isstringtype)
            10000   6.720   0.001  12.420   0.001  demjson.py:863(__init__)
            10000   0.100   0.000   0.100   0.000  demjson.py:910(_set_strictness)
                1   0.000   0.000  30.770  30.770  profile:0(for ii in xrange(10000): fun(thing))
                0   0.000           0.000          profile:0(profiler)

    simplejson nested_dict
    1330003 function calls (950003 primitive calls) in 9.160 CPU seconds
    Ordered by: standard name
           ncalls tottime percall cumtime percall  filename:lineno(function)
            80000   0.370   0.000   0.370   0.000  :0(encode_basestring_ascii)
            20000   0.100   0.000   0.100   0.000  :0(id)
           270000   0.850   0.000   0.850   0.000  :0(isinstance)
            20000   0.070   0.000   0.070   0.000  :0(iteritems)
            10000   0.020   0.000   0.020   0.000  :0(join)
                1   0.000   0.000   0.000   0.000  :0(setprofile)
                1   0.030   0.030   9.160   9.160  :1(?)
            10000   0.110   0.000   9.130   0.001  __init__.py:190(dumps)
    400000/260000   3.050   0.000   6.220   0.000  encoder.py:212(_iterencode_dict)
    500000/260000   3.690   0.000   8.100   0.000  encoder.py:283(_iterencode)
            10000   0.860   0.000   9.020   0.001  encoder.py:345(encode)
            10000   0.010   0.000   0.010   0.000  encoder.py:369(iterencode)
                1   0.000   0.000   9.160   9.160  profile:0(for ii in xrange(10000): fun(thing))
                0   0.000           0.000          profile:0(profiler)

    cjson ustring
    10003 function calls in 0.090 CPU seconds
    Ordered by: standard name
           ncalls tottime percall cumtime percall  filename:lineno(function)
                1   0.000   0.000   0.000   0.000  :0(setprofile)
            10000   0.030   0.000   0.030   0.000  :3()
                1   0.060   0.060   0.090   0.090  :1(?)
                1   0.000   0.000   0.090   0.090  profile:0(for ii in xrange(10000): fun(thing))
                0   0.000           0.000          profile:0(profiler)

    demjson ustring
    2500003 function calls in 16.090 CPU seconds
    Ordered by: standard name
           ncalls tottime percall cumtime percall  filename:lineno(function)
            50000   0.130   0.000   0.130   0.000  :0(append)
            10000   0.020   0.000   0.020   0.000  :0(callable)
           960000   2.650   0.000   2.650   0.000  :0(chr)
           970000   2.390   0.000   2.390   0.000  :0(has_key)
            10000   0.010   0.000   0.010   0.000  :0(hasattr)
            70000   0.300   0.000   0.300   0.000  :0(isinstance)
            20000   0.120   0.000   0.120   0.000  :0(join)
            10000   0.040   0.000   0.040   0.000  :0(len)
           330000   0.980   0.000   0.980   0.000  :0(ord)
            10000   0.080   0.000   0.080   0.000  :0(range)
                1   0.000   0.000   0.000   0.000  :0(setprofile)
                1   0.090   0.090  16.090  16.090  :1(?)
            10000   2.060   0.000   3.490   0.000  demjson.py:1378(encode_string)
            10000   0.070   0.000   3.980   0.000  demjson.py:1714(encode)
            10000   0.250   0.000   3.870   0.000  demjson.py:1737(encode_helper)
            10000   0.220   0.000  16.000   0.002  demjson.py:1896(encode)
            10000   6.610   0.001  11.780   0.001  demjson.py:863(__init__)
            10000   0.070   0.000   0.070   0.000  demjson.py:910(_set_strictness)
                1   0.000   0.000  16.090  16.090  profile:0(for ii in xrange(10000): fun(thing))
                0   0.000           0.000          profile:0(profiler)

    simplejson ustring
    50003 function calls in 0.310 CPU seconds
    Ordered by: standard name
           ncalls tottime percall cumtime percall  filename:lineno(function)
            10000   0.050   0.000   0.050   0.000  :0(encode_basestring_ascii)
            20000   0.070   0.000   0.070   0.000  :0(isinstance)
                1   0.000   0.000   0.000   0.000  :0(setprofile)
                1   0.040   0.040   0.310   0.310  :1(?)
            10000   0.090   0.000   0.270   0.000  __init__.py:190(dumps)
            10000   0.060   0.000   0.180   0.000  encoder.py:345(encode)
                1   0.000   0.000   0.310   0.310  profile:0(for ii in xrange(10000): fun(thing))
                0   0.000           0.000          profile:0(profiler)
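
Because the profiler charges for every Python-level call, wall-clock timing with timeit gives a fairer comparison. Here is a minimal sketch of that approach in Python 3, using the same test data. The standard-library json and pickle modules stand in for the three packages above, which I am assuming are not installed; time_encoders is a hypothetical helper name, and other encoders can be added to the dumps dictionary in the same way as in the test code.

```python
import json
import pickle
import timeit

nested_dict = dict(v1="a", v2="b", v3=dict(n1=1, n2=2, n3=3))
ustring = u"a string with some unicod Andre\202"

# Encoders under test. These are stand-ins; add other libraries' dump
# functions here to compare them under the same conditions.
dumps = {
    "json": json.dumps,
    "pickle": lambda x: pickle.dumps(x, pickle.HIGHEST_PROTOCOL),
}


def time_encoders(things, number=10000, repeat=3):
    """Return {(encoder_name, thing_name): best seconds for `number` calls}."""
    results = {}
    for thing_name, thing in things.items():
        for name, fun in dumps.items():
            timer = timeit.Timer(lambda: fun(thing))
            # Take the minimum over repeats to reduce noise from other processes.
            results[(name, thing_name)] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    things = {"nested_dict": nested_dict, "ustring": ustring}
    for key, seconds in sorted(time_encoders(things).items()):
        print("%-8s %-12s %.4f s" % (key[0], key[1], seconds))
```

The numbers this prints are not comparable with the profiler figures above, since the interpreter, libraries, and measurement method all differ.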

Final Conclusion

I also think simplejson is pretty rad… But in the applications I use it for, JSON conversion is one of the bottlenecks (rather than, for example, network timing), and I have very well-defined, trusted input and output, so I use cjson-enhanced.