blog | oilshell.org

Questions about Superoptimization

These questions are about this series of posts:

If you have any thoughts, leave a comment. I'm using Reddit for blog comments.

What's the shortest function that implements the spec? We found one with four instructions, but didn't rule out the possibility of one with three. In general, how does the minimum solution length relate to the dimensions of the problem (the distribution of 221 values into 21 groups, and 8 bits)? When does the problem become impossible?

This is a combinatorics problem. Fifteen years ago I would have asked on a Usenet math newsgroup.

A reason not to use this technique is that the numbers will change as the code evolves, and I don't necessarily want to do a new search every time.

Can we optimize the search? I noticed symmetries in the results — i.e. there were 12 functions that had the same output distribution as (id & 230) & (id + 13) . Most search spaces for superoptimization are so big that they need to use stochastic techniques, but we could probably do some math for this specific problem to prune the search space. What if the enum labeling was constrained by two or more lookup tables? That is, how would we find two expressions to replace two lookup tables on the same set of identifiers? This will probably happen in oil, and that's another reason I won't use this solution right away. Our problem was to map 221 arbitrary IDs to 21 arbitary IDs, which is an easier problem than mapping 221 integers to 21 integers. So easy that we can solve it with single-threaded Python.

What other kinds of functions can be easily superoptimized?

How does the solution compare with perfect hashing? I believe perfect hash functions would be longer because the problem is more constrained. And some algorithms generate hashes that require lookup tables, which defeats the purpose of this optimization. (The Python code in this blog post may be useful.) Do any compilers do this? They might do other kinds of superoptimization, but this problem appears unique because allowing the enum labels to vary makes the search exponentially easier.

C and C++ specify that unlabeled enums are assigned values from 0 to N-1, but the compiler should be able to detect if the real value is actually "observed" by any code. Strongly-typed enums in C++ 11 require an explicit static_cast<int> to be used as an integer.

What other techniques are there for replacing lookup tables with functions? Several google searches yielded nothing. Finally, what's the difference between this code generated by GCC versus Clang? I wanted to verify that we're actually removing a memory access, so I compiled the code with -O3 with both compilers.

The Clang code produces a sequence of four instructions as you might expect ( xor add and and ), but GCC produces a sequence with a movzx instruction in the middle.

I suspect there's no real difference: the movzx is a register-to-register operation, and it happens at the end of the Clang code too. So we have indeed removed a memory access with our algorithm.

This function:

kind = 175 & id & (( id ^ 173 ) + 11 )

compiles to this:

GCC 0000000000400580 <_Z10LookupKind2Id>: 400580: 89 f8 mov eax,edi 400582: 81 e7 af 00 00 00 and edi,0xaf # 0xaf is 175 400588: 83 f0 ad xor eax,0xffffffad <b>40058b: 0f b6 c0 movzx eax,al</b> 40058e: 83 c0 0b add eax,0xb # 0xb is 11 400591: 21 f8 and eax,edi 400593: c3 ret 400594: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0] 40059b: 00 00 00 40059e: 66 90 xchg ax,ax ... Clang 00000000004005f0 <_Z10LookupKind2Id>: 4005f0: 40 88 f8 mov al,dil 4005f3: 34 ad xor al,0xad # 0xad is 173 4005f5: 04 0b add al,0xb # 0xb is 11 4005f7: 40 20 f8 and al,dil 4005fa: 24 af and al,0xaf # 0xaf is 175 4005fc: 0f b6 c0 movzx eax,al 4005ff: c3 ret

Versions:

blog-code/id-kind-func$ ./run.sh show-versions c++ (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4 ... clang version 3.8.0 (tags/RELEASE_380/final) Target: x86_64-unknown-linux-gnu Thread model: posix ...

Is there any other difference between these two compilations? I'm not sure what the nop and xchg are for in the GCC listing -- they both look like no-op instructions.

I'm looking forward to any insight you have into these questions.