First thing first: we accidentally sabotaged Data.Text . Its variant used tuples for the outer loop state instead of the data type with strict fields — quite a disadvantage. It now performs a bit better than String . Please accept our apologies for screaming that that sky was falling down.

I’ve posted a prettied-up slice of our benchmarking suite on my university personal page. It’s the main algorithm from the data types post for ByteString , UArray Int Char , String , Seq Char , and Data.Text .

We also include a ByteString version that has some further optimizations we came up with after the fact. Please try to best it! Enjoy!

SPECIALISE pragma vs copy-and-paste

Turns out the slight overhead of SPECIALISE we were seeing was most likely incidental and just happened to look systematic. We had decided to use the pragma in this code release, since it’s just so much cleaner that way. That gave Text a significant performance boost and that’s how we realized we had copied from the wrong variant when writing the Text variant. This is exactly why we were so concerned that the SPECIALISE pragma somehow had some overhead: copy-and-pasting is just begging for bugs! … so ashamed …

Here’s what we get when we run this code as released (with SPECIALISE pragmas) and also with manual specialization via copy-and-paste. Note that use of the SPECIALISE pragma shows no overhead.

variant average time (s) slow down factor standard deviation SNR with the SPECIALISE pragma spec 0.857 4.47 0.00238 360.83 ByteString 0.237 1.23 0.00217 109.28 UArray Int Char 0.345 1.80 0.00124 278.67 String 0.505 2.63 0.00196 258.11 Seq Char 0.841 4.38 0.00247 340.12 Text 0.487 2.54 0.00220 221.25 optimized ByteString 0.192 1.00 0.00082 234.68 with manual specialization via copy-and-paste-and-annotate spec 0.857 4.46 0.00359 238.89 ByteString 0.236 1.23 0.00122 192.89 UArray Int Char 0.345 1.80 0.00129 266.63 String 0.505 2.63 0.00184 274.63 Seq Char 0.841 4.38 0.00240 350.24 Text 0.487 2.53 0.00206 236.75 optimized ByteString 0.192 1.00 0.00107 180.06

(I just ran the SPECIALISEd version again in the background while editing this post and String slightly outperformed Text . This nondeterminism is driving me nuts.)

edit-distance

Also, check out Max Bolingbroke’s edit-distance library. That’s the only “related work” we found on hackage, and from his source code comments it looks like he’s waaay better at optimizing than we are.

That last post we promised is still in the works… thanks for your patience. Due dates, due dates, due dates!