Trie s (radix trees, not “try” plural) as the evolution of nothing

Let's say we have an array of decimal digits. There are 10 digits, and every one occurs only once.

8 4 3 5 2 0 1 9 7 6

What should we do to have it sorted?

Nothing! Nothing at all. The thing is, regardless of how the digits are placed in the unsorted array, the sorted array of unique digits is always like this:

0 1 2 3 4 5 6 7 8 9

Since we already know the answer, we don't have to do anything to compute it.

Of course, if you don't have a sorted digits array lying around, you can recreate it on top of the initial array.

8 4 3 5 2 0 1 9 7 6 0 4 3 5 2 0 1 9 7 6 0 1 3 5 2 0 1 9 7 6 0 1 2 5 2 0 1 9 7 6 0 1 2 3 2 0 1 9 7 6 0 1 2 3 4 0 1 9 7 6 0 1 2 3 4 5 1 9 7 6 0 1 2 3 4 5 6 9 7 6 0 1 2 3 4 5 6 7 7 6 0 1 2 3 4 5 6 7 8 6 0 1 2 3 4 5 6 7 8 9

Run demo

This is simple and super-effective since the complexity of the operation is only O(n) where n is the number of elements.

Now let's say some of the digits are missing.

8 3 5 2 1 9 7 6

What do we do now?

One answer could be, we just make an array of flags, traverse the initial array, and every digit will raise a flag when occurred. Then we'll traverse the flags and recreate only the elements that occurred.

8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F F F F F F F F F F 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F F F F F F F F T F 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F F F T F F F F T F 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F F F T F T F F T F 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F F T T F T F F T F 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T F F T F 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T F F T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T F T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 8 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 3 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 5 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 2 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 5 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 5 6 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 5 6 7 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 5 6 7 8 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 5 6 7 8 9 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T 1 2 3 5 6 7 8 9 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] F T T T F T T T T T

Run demo

This has two traversals with the respecting complexity of O(n) and O(k) where the k is the number of all the possible digits.

What if we have repeating elements?

8 3 5 8 2 1 1 9 7 6

Well, we can just make our flags into counters.

8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 0 0 0 0 0 0 0 0 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 0 0 0 0 0 0 0 1 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 0 0 1 0 0 0 0 1 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 0 0 1 0 1 0 0 1 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 0 0 1 0 1 0 0 2 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 0 1 1 0 1 0 0 2 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 1 1 1 0 1 0 0 2 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 0 0 2 0 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 0 0 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 0 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 8 3 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 5 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 8 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 2 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 5 1 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 5 6 1 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 5 6 7 9 7 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 5 6 7 8 8 6 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 5 6 7 8 8 9 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1 1 1 2 3 5 6 7 8 8 9 [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] 0 2 1 1 0 1 1 1 2 1

Run demo

It's still rather effective, it's still O(n) and O(k). It is still the sorting algorithm with linear complexity. The problem is, normally you want to sort something other than single-digit numbers, so your k grows quite a bit.

If you want to sort 16-bit integer numbers, you would need to allocate 65 536 counters. If it's 32-bit, then it's 4 294 967 296. And it's even impossible to count the counters if you want to sort something of dynamic size, like strings or arbitrary length numbers.

And that's when we need the trees. Radix trees, also known as tries as in retrieval.

The idea here is simple. Of course, we can't just allocate trillions of counters but the chances are, we wouldn't even need most of them. So instead of laying them in a linear way, why don't we organize them as a tree? Just split a number into digits, and every digit, when being addressed, will allocate another set of digits. The number 417 then will be then denoted like this:

allocated for 41: [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 1 0 0 : allocated for 4: [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : root: [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0

We stored a 3-digit number using 30 counters. And they are shared. It would cost nothing to store another 417 or 418, and 427 would only require 10 more counters. We're not really constrained by the size of a number either, we can do as many digits as fits in our memory by attaching the trie as a subtrie to a larger trie.

To put an array in a trie, we only need n*l operations, where l is the number of digits in a number including trailing zeroes.

Sorting in this structure is fairly trivial, you just traverse the trie from lower digit to higher entering the subtrees if there is something of interest there.

Let's sort this:

418 7 1 30 7 34 30 7 417

[0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 34 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 34 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 34 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 34 417 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 34 417 418 [0][1][2][3][4][5][6][7][8][9] 0 1 0 0 0 0 0 3 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 2 0 0 0 1 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 1 1 0 : : : [0][1][2][3][4][5][6][7][8][9] : 0 0 0 0 0 0 0 0 0 0 : : [0][1][2][3][4][5][6][7][8][9] 0 0 0 0 0 0 0 0 0 0 1 7 7 7 30 30 34 417 418

Run demo

This is a digital tree also known as radix tree, and this is how radix sort works. Of course, in practice, the digits are usually binary not decimal.

It's not that effective as no sorting at all or as a single-digit radix sort. It has O(n*m) complexity where m is the length of sorted numbers in digits multiplied by the size of a radix. For binary digits, it's basically the size of a number in bits.

Standard sorting algorithms have O(n log(n)) complexity, so the competition is really between m and log(n). As we established earlier, log(4 294 967 296) is 32, so if you want to sort some 32-bit numbers, you should have your array larger than 4B to make radix sort justified. Unless your data is huge, or the numbers are really small, the standard sort will do the job better.

The same goes for the data structure itself. It also has O(m) retrieval complexity, and it's also justified on big data or shorter keys.

Radix trees are used in the most unexpected application like in IP routing or as a pointer to index array in the Linux kernel. But the most common application for them is text processing, although, to enjoy all the benefits we have to modify our structure a bit.

So far we were talking about digital trees where every digit is a node. The problem with this system is that the digits are positional, and we want out tree to be ordered. With numbers, 2 < 12, so we have to store 2 as 02 to keep the order when traversing. But this is not so with strings in alphabetical order. Letter 'a' comes before the letter 'b', and the word 'add' still precedes 'be' while being longer.

This means that instead of a digital tree, we might use a prefix tree that conceptually looks like this:

cat cats cow cower [a...s...z] : 1 : [a...t...z] : 1 : [a...r...z] : : 1 : : : [a...e...z] : : : : : [a...w...z] : : 1 : : [a...o...z] : : [a...c...z]

This tree can have elements on every node not only the leaves.

The numbers

To see how the tries perform in reality, I've made a simple benchmark: https://github.com/akalenuk/wordsandbuttons/tree/master/exp/sort/radix.

My tries are prefix tries of standard strings. I try them with different radix size from 1 to 8 bits. The larger the radix, the shorter is the lookup but also the more memory it takes to build a trie. These things should be kept in balance. As the experiments show, extra data allocations may be more harmful to performance than extra lookups.

On my machine it goes like this. Sorting a million quasi-words 3 to 4 letters long:

std::sort - 240 radix 1 sort - 221 radix 2 sort - 180 radix 4 sort - 146 radix 8 sort - 410

These are the timings measured in milliseconds. As you can see, on these data radix 8 is way less effective than radix 4, and even worse than the standard sort.

Writing and reading 100 000 quasi-words of length 2 to 8 with a trie:

Trie::Map with 1-bits radix Writing: 602 Reading: 636 Size in bytes: 35 434 704 Trie::Map with 2-bits radix Writing: 411 Reading: 404 Size in bytes: 30 357 520 Trie::Map with 4-bits radix Writing: 714 Reading: 727 Size in bytes: 54 409 656 Trie::Map with 8-bits radix Writing: 1156 Reading: 599 Size in bytes: 466 866 200 std::map Writing: 1404 Reading: 1408 std::unordered_map Writing: 479 Reading: 466

Please note, my implementation is rather crude. Normally, it is not a good idea to allocate small bits of memory with the standard new and delete operators.

My experiments are aimed to show that even with ineffective implementation tries can outperform the standard containers and algorithms on the specific data.

I encourage you to conduct your own experiments on your own data to see if you're using the best structures possible.

Conclusion

The idea behind the tries evolves from doing nothing with unordered digits to the elaborate data structures that excel in big data storage. It is still the same idea, the same principle. Knowing this principle should help you understand the tries, and add them to your arsenal of data structures and algorithms.