Last week at Vimeo we had a challenge: build a better query string parameter sorter for a Varnish plugin. The gauntlet was thrown down at 5pm on a Friday, and by 2am that night we had our first contender. I arrived home to see the code and decided to throw my hat into the ring. I had made some attempts at C plugins for PHP but never really got anything off the ground. The K&R book had long been a fascination of mine, and now it was time to put it into practice.

While not in the Varnish module format yet, my still-evolving entry is skwurly.

The challenge:

Take an input const char* url:

/test.php?b=b&c=bb&a=3&a=4

and return it with the query string parameters sorted:

/test.php?a=3&a=4&b=b&c=bb

The goal here is to standardize a url so that it may be used as a key in the hash function of an HTTP cache, thereby preventing duplicate pages in the cache for the same url with differently ordered parameters. Sorting itself is optional, so long as every ordering of the same query string parameters yields the same canonical url.

It’s worth noting that since this was a friendly competition, much code sharing and talking over strategies and optimizations took place over the course of the week.

Initial strategy

My initial strategy was slow and inefficient. Scan the url and count the ampersands so that I can allocate an array of structs of sufficient size. Scan again to grab a pointer to the first character of each param, along with the param's length, and store them in the structs. Sort the array of structs alphabetically using qsort and strcmp. Allocate a string the same size as the input. Iterate over the sorted structs and copy each param's characters into the new string by pointer assignment.

With the exception of the return string, all of the memory used is stack allocated. This attempt produced a program that could sort ~400k urls/sec on my MacBook Pro. Not bad, but not nearly as good as it gets.
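The steps above can be sketched roughly as follows. This is a minimal illustration, not the skwurly source: the names `sort_query`, `param_cmp` and `MAX_PARAMS` are made up here, a fixed-size array stands in for the stack allocation, and `memcmp` on the recorded lengths replaces `strcmp` because the params are slices of the url rather than NUL-terminated strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 64 /* hypothetical cap so the array can live on the stack */

/* A view of one param inside the input url; nothing is copied. */
struct param {
    const char *start;
    size_t length;
};

/* qsort comparator: byte-wise order, a shorter prefix sorts first. */
static int param_cmp(const void *a, const void *b)
{
    const struct param *pa = a;
    const struct param *pb = b;
    size_t n = pa->length < pb->length ? pa->length : pb->length;
    int c = memcmp(pa->start, pb->start, n);
    if (c != 0)
        return c;
    return (pa->length > pb->length) - (pa->length < pb->length);
}

/* Returns a malloc'd url with sorted params (caller frees), or NULL. */
char *sort_query(const char *url)
{
    assert(url != NULL);
    size_t len = strlen(url);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, url, len + 1); /* start from an unchanged copy */

    const char *q = strchr(url, '?');
    if (q == NULL || q[1] == '\0')
        return out;

    /* Scan 1: count ampersands to learn how many params there are. */
    size_t count = 1;
    for (const char *p = q + 1; *p != '\0'; p++)
        if (*p == '&')
            count++;
    if (count > MAX_PARAMS)
        return out; /* too many for this sketch: leave unsorted */

    /* Scan 2: record where each param starts and how long it is. */
    struct param params[MAX_PARAMS];
    size_t i = 0;
    const char *start = q + 1;
    for (const char *p = q + 1;; p++) {
        if (*p == '&' || *p == '\0') {
            params[i].start = start;
            params[i].length = (size_t)(p - start);
            i++;
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }

    qsort(params, count, sizeof params[0], param_cmp);

    /* Write the params back in sorted order, character by character. */
    char *w = out + (q - url) + 1;
    for (i = 0; i < count; i++) {
        for (size_t k = 0; k < params[i].length; k++)
            *w++ = params[i].start[k];
        if (i + 1 < count)
            *w++ = '&';
    }
    *w = '\0';
    return out;
}
```

Calling `sort_query("/test.php?b=b&c=bb&a=3&a=4")` should return `/test.php?a=3&a=4&b=b&c=bb`. Note the three separate walks over the query string (count, record, copy) plus whatever qsort touches, which is where the time goes.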

After getting a code review from a few other C devs, I set out on a different tack. The goal was to touch each character of the input url only once. I very nearly achieved that, and it brought the performance up to ~1.8m urls/sec on my MBP.
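One way to get close to a single pass, sketched here under assumptions rather than taken from skwurly itself: scan the url once, and each time a param ends, insert it into an already sorted array with a hand-rolled comparison that stops at '&' or the end of the string. The names `sort_query_single_scan`, `param_compare` and `MAX_PARAMS` are hypothetical, and the comparisons do revisit some characters, so this only approximates the "once per character" goal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 32 /* hypothetical cap so the bookkeeping fits on the stack */

/* Compare two params that each end at '&' or '\0'. */
static int param_compare(const char *a, const char *b)
{
    while (*a && *a != '&' && *a == *b) {
        a++;
        b++;
    }
    int ca = (*a == '&') ? 0 : (unsigned char)*a;
    int cb = (*b == '&') ? 0 : (unsigned char)*b;
    return ca - cb;
}

/* Returns a malloc'd url with sorted params (caller frees), or NULL. */
char *sort_query_single_scan(const char *url)
{
    assert(url != NULL);
    const char *starts[MAX_PARAMS];
    size_t lengths[MAX_PARAMS];
    size_t count = 0;
    int overflow = 0;

    /* Skip the path; prefix covers everything up to and including '?'. */
    const char *p = url;
    while (*p && *p != '?')
        p++;
    size_t prefix = (size_t)(p - url) + (*p == '?');
    if (*p == '?')
        p++;

    /* Single scan: find each param and insertion-sort it immediately. */
    while (*p) {
        const char *start = p;
        while (*p && *p != '&')
            p++;
        if (count < MAX_PARAMS) {
            size_t i = count++;
            while (i > 0 && param_compare(starts[i - 1], start) > 0) {
                starts[i] = starts[i - 1];
                lengths[i] = lengths[i - 1];
                i--;
            }
            starts[i] = start;
            lengths[i] = (size_t)(p - start);
        } else {
            overflow = 1; /* too many for this sketch */
        }
        if (*p == '&')
            p++;
    }

    size_t len = (size_t)(p - url);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    if (overflow) {
        memcpy(out, url, len + 1); /* give up and return an unsorted copy */
        return out;
    }

    /* Emit the path, then the params in sorted order. */
    memcpy(out, url, prefix);
    char *w = out + prefix;
    for (size_t i = 0; i < count; i++) {
        memcpy(w, starts[i], lengths[i]);
        w += lengths[i];
        if (i + 1 < count)
            *w++ = '&';
    }
    *w = '\0';
    return out;
}
```

Because the insertion only shifts entries that compare strictly greater, params that compare equal keep their original relative order, and any two orderings of the same params produce the same output string.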

This is the breakdown of all the optimizations that I tried and kept/discarded. Many great devs assisted me in this process. They are listed at the bottom of the page and I am very grateful for the time they spent with me!

Memory Matters

const char*

unsigned short

struct param { const char* start; unsigned short length; } param* params[];

const char* params[]; unsigned short param_lengths[];

long

unsigned short

int

Variable Locality

const char*

char

Testing for zero

0

Pointer arithmetic

Hashing costs

    // djb hash (unsigned, so the shift wraps instead of overflowing a signed int)
    unsigned int hash = 5381;
    while (*url && *url != '&')
        hash = ((hash << 5) + hash) ^ *url++;

name=jason&job=dev&company=vimeo
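As a small illustration of the snippet above, the loop can be wrapped so that each param of the example string is hashed in one walk. `hash_param` and `hash_params` are hypothetical names chosen for this sketch, and unsigned arithmetic is an assumption made so that overflow wraps in a defined way.

```c
#include <assert.h>
#include <stddef.h>

/* Hash the param at *cursor, stopping at '&' or the end of the string.
 * On return, *cursor points at that terminator. */
static unsigned int hash_param(const char **cursor)
{
    const char *url = *cursor;
    unsigned int hash = 5381;
    while (*url && *url != '&')
        hash = ((hash << 5) + hash) ^ (unsigned char)*url++;
    *cursor = url;
    return hash;
}

/* Hash every param in query; stores up to max hashes, returns how many. */
size_t hash_params(const char *query, unsigned int *hashes, size_t max)
{
    assert(query != NULL && hashes != NULL);
    size_t count = 0;
    const char *p = query;
    while (*p != '\0' && count < max) {
        hashes[count++] = hash_param(&p);
        if (*p == '&')
            p++; /* step over the separator */
    }
    return count;
}
```

For `name=jason&job=dev&company=vimeo` this yields three hashes, one per param. The same param produces the same hash wherever it appears in the query string, but every character still has to be read and mixed in, which is the cost the heading refers to.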

strcmp vs hand rolled

strcmp

strcmp

strcmp

qsort vs insertion sort

qsort

double-wide insertion sort

Zero or one param

Optimizations that didn’t work:

Loop unrolling

    send(to, from, count)
    register short *to, *from;
    register count;
    {
        register n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                } while (--n > 0);
        }
    }

I built a while-less version that optimized for 6 params or fewer, where the switch cases fell through to a default of my original for loop. Initially I found a minor speedup. However, it turned out to be due to an inefficiency in my loop code! Once that was fixed, the two versions performed the same.

A fellow coder pointed out that loops have been pretty heavily optimized by compilers and unrolling almost never produces any benefit any more.

Single character comparison before a function call to str_compare

if ((c == 0 && str_compare(params[TAIL], p) < -1) || c < -1)

if (str_compare(params[TAIL], p) < -1)

memmove vs pointer assignment

memmove

Profiling and analysing

gprof

gcov

valgrind

gcov -b

Conclusion

At this point I am slightly behind the leader.

2.17m urls/sec

2.08m urls/sec (me)

1.83m urls/sec

Once we have a winner and we build the code into a proper Varnish module, we will open source it on Vimeo's GitHub.

Awesome people