Lately I’ve been doing a lot of research that deals with dissecting and analyzing network packets. Granted, we all know there’s one true answer for that – Wireshark. However, in my case, I seriously want to consider alternatives:

- The majority of protocols I work with (beyond Ethernet, IPv4 and TCP/UDP) are binary and proprietary – obviously, Wireshark knows nothing about them. Writing your own dissectors for Wireshark is an option, but it’s not for the faint of heart. Your best bet would be coding them in Lua, but even there you’ll end up writing pretty cryptic code that deals with lots of Wireshark internals.
- Wireshark’s functionality is somewhat lacking for me. Sometimes I just want to access protocol fields programmatically from a normal, popular programming language – ideally Python, as 99% of our software is in Python.
- I need to process tons of traffic (think gigabytes), so I’d like it to be as fast as possible. Wireshark dissectors written in Lua are slow and, what’s even worse, very memory-hungry.

So, it boils down to the wonderful world of packet dissector frameworks for Python. Thankfully, Python’s vibrant ecosystem offers quite a few of them. So, let’s try them out – it’s not like anyone wants to re-implement and maintain all that stuff.

Before choosing a tool for the job, I decided to run a few benchmarks to test their raw speed.



My benchmark consists of parsing Ethernet frames (and all inner layers – IPv4, TCP, etc). For the sake of simplicity and consistency, I load a single Ethernet frame into memory from a file, parse it a huge number of times, and measure the packets-per-second parse rate. To keep it fair (in case some parser uses lazy parsing), I access one critical field – the source IPv4 address – once per iteration. This way the benchmark isn’t bound by I/O and measures raw packet processing speed only.
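For reference, here’s roughly what the “parse the frame and grab the source IPv4 address” step boils down to if done by hand with nothing but the standard library – a minimal sketch assuming a plain untagged Ethernet II frame carrying IPv4 with no options (the function name and the fake test frame are mine, for illustration only):

```python
import socket
import struct

def src_ipv4_addr(frame):
    # Ethernet II header is 14 bytes: dst MAC (6) + src MAC (6) + EtherType (2)
    (ether_type,) = struct.unpack_from("!H", frame, 12)
    if ether_type != 0x0800:
        raise ValueError("not an IPv4 packet")
    # the source IPv4 address lives at bytes 12..15 of the IPv4 header
    return socket.inet_ntoa(frame[14 + 12:14 + 16])

# build a fake frame for a quick check: zeroed MACs + minimal IPv4 header
# with source address 10.0.0.1 and destination 10.0.0.2
ip_header = struct.pack("!BBHHHBBH4s4s",
                        0x45, 0, 20, 0, 0, 64, 6, 0,
                        socket.inet_aton("10.0.0.1"),
                        socket.inet_aton("10.0.0.2"))
frame = b"\x00" * 12 + b"\x08\x00" + ip_header
print(src_ipv4_addr(frame))  # -> 10.0.0.1
```

This is exactly the kind of brittle offset arithmetic the frameworks below are supposed to hide from us.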

Thus, the core benchmark code looks like this:

```python
from timeit import default_timer as timer

# Load sample Ethernet frame to be used for parsing
with open("ethernet_frame.bin", "rb") as fh:
    buf = fh.read()

TIMES = 10000

t1 = timer()
for _ in range(TIMES):
    # parse Ethernet frame here
    # access source IPv4 address field here
    pass
t2 = timer()

pps = TIMES / (t2 - t1)
print("pps = %f" % (pps))
```

I’ve deliberately chosen to test parser code written by the framework maintainers only, as I trust them to write more optimal, better-tuned code for their particular framework than I could hope to achieve in the foreseeable future.

All tests were done on the same hardware and OS, so it generally doesn’t matter what they are, but I’ll mention them anyway: a ThinkPad T460 laptop sporting an i5-6200U and 16 GB of RAM, running Ubuntu Linux 16.04 LTS. My production environment is close to this one, mostly consisting of Amazon EC2 C3/C4 large/xlarge instances running the same 16.04 LTS.

Let’s go 🙂

Scapy

Scapy is one of the oldest and best-known network packet libraries for Python (developed since ~2002). Its functionality stretches a bit beyond what I need: it can also create packets, then send, receive and capture them over the ‘net, but right now I’m interested in one particular part: packet dissection.

Installing Scapy is a breeze: pip install scapy does the trick. If you want the command-line tools, you’ll need a little extra, but in my case I’m totally ok with the libraries only.

The Scapy parsing code we’re going to benchmark is very simple:

```python
from scapy.all import Ether

# ...

pkt = Ether(buf)
dummy = pkt.getlayer(1).src
```

Running it on my notebook yields:

2,763 pps on average in Python 2.7

3,337 pps on average in Python 3.5.1 (kudos, that’s 20% increase!)

Construct

Construct is also a well-known and mature Pythonic framework. Most people think of it as “struct-on-steroids”, and that’s partially true. However, Construct offers tons more features, like conditional parsing, repeated fields, tunneling, lazy parsing, etc, etc.

Once you’ve got ipstack.py, the code to benchmark is simple:

```python
from ipstack import *

# ...

pkt = ip_stack.parse(buf)
dummy = pkt.next.header.source
```

Benchmark results are kinda disappointing:

1,486 pps on average in Python 2.7

1,420 pps on average in Python 3.5.1

The Python 3 performance drop is really odd. When upgrading 2 → 3 I mostly expect a 10–20% performance gain, but in this case it’s even a slight loss. I’ve triple-checked my benchmarking process and re-ran it a dozen times, but that’s it: I consistently get slightly lower results on Python 3.

On the upbeat side, I’d like to praise Construct’s documentation. It’s concise, well written, and overall a good example of what decent documentation should look like.

Hachoir

Hachoir is the French word for a meat mincer, and it was written by French Red Hat CPython engineers. It offers an ambitious introduction and boasts a huge library of ready-made file formats and network packet formats. There’s a large set of tools around Hachoir: hachoir-metadata, hachoir-urwid, hachoir-grep, hachoir-strip, etc. Also, given that it’s written by CPython engineers, I expected top-notch performance.

However, reality turns out to be much crueler than the introduction suggests. Documentation is, well, lacking, to put it mildly. The readthedocs.io page might seem like a huge user manual, but in reality it’s about a dozen paragraphs of text, and that’s it. Most of the docs are related to the command-line end-user tools, not the framework for fellow developers.

Also, Hachoir makes a hard distinction between “parsers” and “fields”, so you can’t just easily call an inner-layer parser inside a file format parser. Hachoir developers supply a .pcap file parser (called “hachoir.parser.network.tcpdump”), so I had to modify the core benchmark to accommodate that:

```python
from hachoir.stream import FileInputStream
from hachoir.parser.network.tcpdump import TcpdumpFile

stream = FileInputStream("%s/pcap_http.dat" % DATA_DIR)
r = TcpdumpFile(stream)
for i in range(TIMES):
    pkt = r['packet[%d]' % (i)]
    dummy = pkt['ipv4/src']
```

The API is pretty ugly and text-based. You have to use the [] operator with some internal Hachoir path-addressing language to reach particular fields in the tree of objects. I haven’t found a way to do that without messy string construction.

Surprisingly, performance is pretty good, on par with Scapy: I get 2,794 packets per second on average using Python 3. Looks like those Red Hat CPython engineers totally know their trade 🙂

Unfortunately (or fortunately?), Hachoir 3a doesn’t seem to work on Python 2, failing with a Unicode error:

```
File "lib/python2.7/site-packages/hachoir/stream/__init__.py", line 7, in <module>
    from hachoir.stream.input_helper import FileInputStream, guessStreamCharset  # noqa
File "lib/python2.7/site-packages/hachoir/stream/input_helper.py", line 1, in <module>
    from hachoir.core.i18n import guessBytesCharset
File "lib/python2.7/site-packages/hachoir/core/i18n.py", line 88, in <module>
    (set("©®éêè\xE0ç".encode("ISO-8859-1")), "ISO-8859-1"),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
```

Kaitai Struct

Kaitai Struct is the new kid on the block that I keep hearing about lately. It’s a mysterious new project built by some Russian and Hungarian developers that aims to be a universal binary parsing framework for everything. Here’s my chance to test it.

From the outside, it is similar to the frameworks I’ve tested so far, but it sports one huge difference. In fact, it’s not a Python framework (although it includes a bit of Python runtime code, that part is relatively small), but a distinct domain-specific language with its own compiler that takes a packet specification as input and gives you Python parser code as output. Actually, it can output tons of other languages that one might find helpful: C++, Java, JavaScript, Ruby, Perl, PHP, C#. This gives you a unique edge: you can develop a parser using the Kaitai Struct language and build a prototype with, say, Python, and if you find yourself in need of better performance, you can switch to C++ relatively easily, retaining the same protocol specifications you’ve spent so much time developing. That’s neat!
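To give a feel for the DSL, here’s a minimal sketch of what a .ksy spec looks like – a hypothetical, heavily simplified Ethernet header of my own, not the actual spec from the Kaitai format gallery:

```yaml
meta:
  id: ethernet_header
  endian: be
seq:
  - id: dst_mac      # destination MAC address, 6 raw bytes
    size: 6
  - id: src_mac      # source MAC address, 6 raw bytes
    size: 6
  - id: ether_type   # 0x0800 = IPv4, 0x86dd = IPv6, etc.
    type: u2
```

It’s plain YAML: you declare fields in sequence, and the compiler turns the declaration into imperative parser code in your target language.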

The learning curve is somewhat steep: first of all, you need to download and install the compiler to actually do any .ksy → .py compilation. The project is still new, so there’s no chance you’ll get it with a simple apt-get install from the Ubuntu repos; you’ll have to plug in Kaitai Struct’s own repository and install the .deb from there. That, in turn, requires you to pull their auth key first. Not exactly rocket science – just stick to the installation instructions in their “Download” section and you’ll be fine.

Then, one needs to download the relevant .ksy files and run the compiler. That’s easy:

```
ksc -t python pcap.ksy
```

Voila, you’ve got pcap.py. But then, to be able to run it, you’ll also need the Kaitai Struct runtime. That one is installed as a regular Python library:

```
pip install kaitaistruct
```

Fortunately, if you’re smart, you can opt to skip all that nuisance (well, except for the runtime) and just go to the [Kaitai Web IDE](https://kt.pe/kaitai_struct_webide/) project. That’s yet another piece of awesomeness that I’ve encountered:

It’s a web-based application (and it runs purely on the client, so no server needed), which sports a compiler and a visualizer that applies the compilation result to the binary dump you put into it, right away! It’s very similar to what advanced commercial hex editors like 010 Editor offer you, but:

- it’s free
- it’s web-based
- it can generate parsers in the language of one’s choice, i.e. Python!

Using Kaitai Struct’s generated library is also not super straightforward, and it reminded me slightly of Java’s IO libraries:

```python
from io import BytesIO
from kaitaistruct import KaitaiStream
from ethernet_frame import *

# first, we wrap our byte array into an IO stream using BytesIO
io = BytesIO(buf)

# next, we wrap regular Pythonic IO into a special KaitaiStream IO
ksio = KaitaiStream(io)

# finally, we run the parser and get the parsed packet
pkt = EthernetFrame(ksio)

# accessing the field is more or less the same as with competing frameworks
dummy = pkt.ipv4_body.src_ip_addr
```

Kaitai Struct’s authors declare that their product is “fast” because “it’s a compiler”, but it totally blew my mind when I saw the actual results:

32,816 pps on average in Python 2.7

31,925 pps on average in Python 3.5.1

Wow. Just wow. That’s more than a 10x improvement over the fastest competitor so far, i.e. Scapy. And it’s still 100% pure Python code: no native modules, no C pieces to compile, no other tricks up the sleeve. Also note that, again, Python 3 performance is slightly lower, to my disappointment (as I’m a huge proponent of moving to Python 3 everywhere I can).

To be fair, I’d like to highlight a few downsides of KS:

Relatively complex multi-step installation (install ksc + run ksc + install runtime).

It’s a very new product, still at version 0.x.

Documentation is not as mature as Scapy’s or the excellent Construct manual (but still better than Hachoir’s).

There is no packet generation support at all (although it seems to be planned), so it’s parsing-only so far.

Conclusion

TL;DR: Kaitai Struct beats every other Python packet dissection framework by roughly an order of magnitude. Second place goes to good ol’ Scapy and the mysterious Hachoir à la française. However, while Hachoir’s performance is not the worst, using it wasn’t a really pleasant experience: I got tangled up in poor documentation and a weird API. Construct was the slowest of them all, and, unfortunately, I can’t recommend it.

My choice is clear: Kaitai Struct is definitely the way to go.

A few ideas for future analysis: I definitely need to compile the same specs for C++ and try Kaitai Struct’s C++ output with the same network packet .ksy to see how much improvement I can get by switching to a closer-to-the-metal language.