(Sometimes our host is wrong; nanoseconds matter ;)

I have a Python Twisted server that talks to some Java servers, and profiling shows it spends ~30% of its runtime in the JSON encoder/decoder; its job is handling thousands of messages per second.

This talk by YouTube engineers raises interesting, applicable points:

Serialization formats - no matter which one you use, they are all expensive. Measure. Don't use pickle; it's not a good choice. They found protocol buffers slow. They wrote their own BSON implementation, which is 10-15 times faster than the one you can download.

You have to measure. Vitess swapped out one of its protocols for an HTTP implementation. Even though it was in C it was slow, so they ripped out HTTP and did a direct socket call in Python, and that was 8% cheaper in global CPU. The enveloping for HTTP is really expensive.

Measurement. In Python, measurement is like reading tea leaves. There are a lot of things in Python that are counter-intuitive, like the cost of garbage collection. Most chunks of their apps spend their time serializing. Profiling serialization depends heavily on what you are putting in: serializing ints is very different from serializing big blobs.
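One concrete instance of the tea-leaves effect: CPython's cyclic collector can fire in the middle of a timed region and bill its cost to whatever you happened to be measuring. A minimal sketch of my own (the `timed` helper is not from the talk) that keeps collection out of the timing:

```python
import gc
import json
import time

def timed(fn, reps=1000):
    # Disable the cyclic collector so a collection pass triggered by
    # someone else's garbage isn't charged to the function under test.
    gc.disable()
    try:
        start = time.time()
        for _ in range(reps):
            fn()
        return time.time() - start
    finally:
        gc.enable()

# Example: timing JSON encoding of a small payload.
payload = {"longs": list(range(100)), "str1": "x" * 100}
elapsed = timed(lambda: json.dumps(payload))
```

The stdlib `timeit` module does the same gc.disable() dance for you, which is one reason its numbers are steadier than hand-rolled time.time() loops.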

Anyway, I control both the Python and Java ends of my message-passing API and can pick a different serialisation than JSON.

My messages look like:

a variable number of longs; anywhere between 1 and 10K of them

and two already-UTF8 text strings; both between 1 and 3KB

Because I am reading them from a socket, I want libraries that can cope gracefully with streams; it's irritating if a library doesn't tell me how much of a buffer it consumed, for example.
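For a fixed header the property I want is easy to build by hand: a decoder that consumes as many complete frames as the buffer holds and reports exactly how many bytes it used, so the caller keeps the tail for the next recv(). A sketch, assuming the 12-byte count/length/length header from my test code below (`decode_frames` is my own name):

```python
import struct

HEADER = struct.Struct(">iii")  # number of longs, len(str1), len(str2)

def decode_frames(buf):
    """Decode as many complete frames as buf holds.

    Returns (frames, bytes_consumed) so the caller knows exactly how
    much of its buffer to keep around for the next socket read.
    """
    frames = []
    pos = 0
    while len(buf) - pos >= HEADER.size:
        n, j, k = HEADER.unpack_from(buf, pos)
        total = HEADER.size + 8 * n + j + k
        if len(buf) - pos < total:
            break  # incomplete frame; wait for more data
        body = pos + HEADER.size
        longs = struct.unpack_from(">%dq" % n, buf, body)
        str1 = buf[body + 8 * n : body + 8 * n + j]
        str2 = buf[body + 8 * n + j : pos + total]
        frames.append((longs, str1, str2))
        pos += total
    return frames, pos

# Usage: one whole frame plus a partial one; only the whole frame decodes.
frame = HEADER.pack(2, 1, 1) + struct.pack(">2q", 7, 8) + b"a" + b"b"
frames, used = decode_frames(frame + frame[:5])
# frames == [((7, 8), b"a", b"b")], used == len(frame)
```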

The other end of this stream is a Java server, of course; I don't want to pick something that is great for the Python end but moves the problems to the Java end, e.g. poor performance or a tortuous or flaky API.

I will obviously be doing my own profiling. I am asking here in the hope that you describe approaches I wouldn't think of, e.g. using struct, or what the fastest kinds of strings/buffers are.
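On the struct front, two tricks worth trying: precompile the format with struct.Struct (as struct_2 below does for the header), and slice a memoryview instead of the string so the big text fields are not copied during decoding. A hedged sketch of the decode side (`decode_noslicecopy` is my own name; callers that only forward the bytes never pay for a copy):

```python
import struct

header = struct.Struct(">iii")  # number of longs, len(str1), len(str2)

def decode_noslicecopy(s):
    # memoryview slices share the underlying buffer instead of copying it,
    # so the two string fields are returned without any byte copies.
    mv = memoryview(s)
    n, j, k = header.unpack_from(mv, 0)
    body = header.size
    longs = struct.unpack_from(">%dq" % n, mv, body)
    str1 = mv[body + 8 * n : body + 8 * n + j]
    str2 = mv[body + 8 * n + j : body + 8 * n + j + k]
    return longs, str1, str2

# Usage: one long (5) followed by the strings "hi" and "yo".
packed = header.pack(1, 2, 2) + struct.pack(">q", 5) + b"hiyo"
longs, s1, s2 = decode_noslicecopy(packed)
```

Whether the memoryview pays off depends on what happens next: if the strings are immediately decoded or concatenated, the copy happens anyway and plain slicing is simpler.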

Some simple test code gives surprising results:

    import time, random, struct, json, sys, pickle, cPickle, marshal, array

    def encode_json_1(*args):
        return json.dumps(args)

    def encode_json_2(longs,str1,str2):
        return json.dumps({"longs":longs,"str1":str1,"str2":str2})

    def encode_pickle(*args):
        return pickle.dumps(args)

    def encode_cPickle(*args):
        return cPickle.dumps(args)

    def encode_marshal(*args):
        return marshal.dumps(args)

    def encode_struct_1(longs,str1,str2):
        return struct.pack(">iii%dq"%len(longs),len(longs),len(str1),len(str2),*longs)+str1+str2

    def decode_struct_1(s):
        i, j, k = struct.unpack(">iii",s[:12])
        assert len(s) == 3*4 + 8*i + j + k, (len(s),3*4 + 8*i + j + k)
        longs = struct.unpack(">%dq"%i,s[12:12+i*8])
        str1 = s[12+i*8:12+i*8+j]
        str2 = s[12+i*8+j:]
        return (longs,str1,str2)

    struct_header_2 = struct.Struct(">iii")

    def encode_struct_2(longs,str1,str2):
        return "".join((
            struct_header_2.pack(len(longs),len(str1),len(str2)),
            # NB: "L" is native byte order and platform-dependent size,
            # unlike the big-endian ">q" used by struct_1; it round-trips
            # here only because the same machine decodes it
            array.array("L",longs).tostring(),
            str1,
            str2))

    def decode_struct_2(s):
        i, j, k = struct_header_2.unpack(s[:12])
        assert len(s) == 3*4 + 8*i + j + k, (len(s),3*4 + 8*i + j + k)
        longs = array.array("L")
        longs.fromstring(s[12:12+i*8])
        str1 = s[12+i*8:12+i*8+j]
        str2 = s[12+i*8+j:]
        return (longs,str1,str2)

    def encode_ujson(*args):
        return ujson.dumps(args)

    def encode_msgpack(*args):
        return msgpacker.pack(args)

    def decode_msgpack(s):
        msgunpacker.feed(s)
        return msgunpacker.unpack()

    def encode_bson(longs,str1,str2):
        return bson.dumps({"longs":longs,"str1":str1,"str2":str2})

    def from_dict(d):
        return [d["longs"],d["str1"],d["str2"]]

    tests = [ # (encode, decode, massage_for_check)
        (encode_struct_1,decode_struct_1,None),
        (encode_struct_2,decode_struct_2,None),
        (encode_json_1,json.loads,None),
        (encode_json_2,json.loads,from_dict),
        (encode_pickle,pickle.loads,None),
        (encode_cPickle,cPickle.loads,None),
        (encode_marshal,marshal.loads,None)]

    try:
        import ujson
        tests.append((encode_ujson,ujson.loads,None))
    except ImportError:
        print "no ujson support installed"

    try:
        import msgpack
        msgpacker = msgpack.Packer()
        msgunpacker = msgpack.Unpacker()
        tests.append((encode_msgpack,decode_msgpack,None))
    except ImportError:
        print "no msgpack support installed"

    try:
        import bson
        tests.append((encode_bson,bson.loads,from_dict))
    except ImportError:
        print "no BSON support installed"

    longs = [i for i in xrange(10000)]
    str1 = "1"*5000
    str2 = "2"*5000

    random.seed(1)
    encode_data = [[
            longs[:random.randint(2,len(longs))],
            str1[:random.randint(2,len(str1))],
            str2[:random.randint(2,len(str2))]] for i in xrange(1000)]

    for encoder,decoder,massage_before_check in tests:
        # do the encoding
        start = time.time()
        encoded = [encoder(i,j,k) for i,j,k in encode_data]
        encoding = time.time()
        print encoder.__name__, "encoding took %0.4f,"%(encoding-start),
        sys.stdout.flush()
        # do the decoding
        decoded = [decoder(e) for e in encoded]
        decoding = time.time()
        print "decoding %0.4f"%(decoding-encoding)
        sys.stdout.flush()
        # check it
        if massage_before_check:
            decoded = [massage_before_check(d) for d in decoded]
        for i,((longs_a,str1_a,str2_a),(longs_b,str1_b,str2_b)) in enumerate(zip(encode_data,decoded)):
            assert longs_a == list(longs_b), (i,longs_a,longs_b)
            assert str1_a == str1_b, (i,str1_a,str1_b)
            assert str2_a == str2_b, (i,str2_a,str2_b)

gives:

    encode_struct_1 encoding took 0.4486, decoding 0.3313
    encode_struct_2 encoding took 0.3202, decoding 0.1082
    encode_json_1 encoding took 0.6333, decoding 0.6718
    encode_json_2 encoding took 0.5740, decoding 0.8362
    encode_pickle encoding took 8.1587, decoding 9.5980
    encode_cPickle encoding took 1.1246, decoding 1.4436
    encode_marshal encoding took 0.1144, decoding 0.3541
    encode_ujson encoding took 0.2768, decoding 0.4773
    encode_msgpack encoding took 0.1386, decoding 0.2374
    encode_bson encoding took 55.5861, decoding 29.3953

bson, msgpack and ujson were all installed via easy_install.
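Since msgpack came out near the top, it is worth noting that its Unpacker is also a genuine stream decoder: you feed() it arbitrary chunks off the socket and iterate over whatever complete messages have accumulated, which sidesteps the buffer-accounting problem entirely. A sketch of that usage:

```python
import msgpack

unpacker = msgpack.Unpacker()
packed = msgpack.packb([1, 2, 3]) + msgpack.packb(b"hello")

# Feed in arbitrary chunks, as if from recv(); partial data is buffered
# inside the Unpacker until a whole message is available.
unpacker.feed(packed[:3])
unpacker.feed(packed[3:])

messages = list(unpacker)  # yields only complete messages
# messages == [[1, 2, 3], b"hello"]
```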

I would love to be shown that I'm doing it wrong and that I should be using cStringIO interfaces, or however else you speed it all up!

Surely there must be a way to serialise this data that is an order of magnitude faster?