When looking up a string, what is the fastest strategy?

A hash map, a sorted list or an unsorted list?

Of course it depends on how many strings you have, but where are the cutoff points?

Here is a quick test, and an interesting tidbit is uncovered…



Without further ado, here is the result chart.

Time is the vertical axis (lower is better), horizontally is the number of items, note that it is logarithmic.

Those are results for random, 12 characters strings, but the curves where almost unchanged if the strings were “less random” (with 6 identical characters at the start f.i.)

“TDictionary” and “Hash FNV1a” are both hash-maps, but “Hash FNV1a” uses a fast hash, while “TDictionary” uses the slower Bob-Jenkins Hash (which is also more prone to collisions, but these did not affect that particular test)

Observations

avoid using an unsorted list 🙂

if you have a few dozen strings, a simple sorted list can be preferable

computing a hash code is hardly innocuous, with a slow hash you need to have hundreds of items just to get even on hash-code computations

Tidbit of note

Sorted & Unsorted lists are both using TStringList, with just an overloaded CompareStrings method hardcoded to a case-insensitive comparison, and the difference is just whether Sorted is set to true or not… so why the difference even for the case where you have 1 or 2 elements?

In both cases IndexOf() was called, but IndexOf merely reroutes the execution to TStrings.IndexOf or TStringList.Find depending on the value of Sorted, and that makes all the difference…

TStringList.Find accesses the FList private field directly, and so involves no reference counting, and no exception frames.

TStrings.IndexOf on the other hand goes through the Get method, which means a reference count and an implicit exception frame, which when you’re comparing a single string or two bites a lot.

When there are a couple elements, an unsorted search in TStringList is actually spent doing a fair share of Automatic Reference Counting tasks (there is also a bit of range checking, but it’s negligible in comparison).