Source code of final iteration here

Hash maps! With O(1) average insert and query performance they’re our go-to data structure for storing values with a key.

But how do they work? Let’s build one from the ground up in Kotlin to find out. We’ll start with a very basic implementation and then incrementally improve it.

Basic put(), get() and remove() methods

First let’s create the skeleton class with empty put(), get() and remove() methods with generics K and V to indicate the type of keys and values we want to use.

class BasicHashMap<K, V> {

    fun put(key: K, value: V) {
        // TODO put value V in the map with key K
    }

    fun get(key: K): V? {
        // TODO get value with key K, or null if it's not present
        return null
    }

    fun remove(key: K) {
        // TODO remove value for key K
    }
}

Note: We could implement the Map interface here or extend AbstractMap to get a lot of the boilerplate for free but it would overcomplicate the example. Please do take a look at the above links to see how things work in real life.

Next let’s define what we call an Entry in our map, a simple data class with a key K and value V.

data class Entry<K, V>(val key: K, val value: V)

After this we’ll need some place to store our entries so let’s use a basic array. We’ll start with creating a fixed-size array with 16 slots (same as the default Java HashMap).

We’ll worry about how we handle running out of space in later steps. This will be an array of nulls at first.

private val arraySize = 16



private val entries: Array<Entry<K,V>?> = arrayOfNulls(arraySize)

Why are we storing the key and value as an Entry object instead of just the value? We’ll get to that in the next section.

Pretty straightforward so far, now comes the magic of a hash map.

When we call put(key, value) on the hash map, the class needs to decide where in the array to store the value based on the key. To do this the map uses the hash code of the key. If you’re not familiar with the concept of hash codes then see the documentation here but, essentially, when we call the hashCode() method on the key we’re returned an integer. Equal keys always return the same integer, and a good hashCode() returns different integers for most unequal keys (though not all of them, so it isn’t a unique ID). We can use this integer to determine at which array index to put the value.
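Here’s a small sketch of what hashCode() gives us, assuming we’re running on the JVM (where a String’s hash code is computed from its characters). The specific strings are just examples chosen for illustration.

```kotlin
fun main() {
    // Equal objects must return equal hash codes.
    val first = "Alpha"
    val second = "Al" + "pha"
    println(first == second)                        // true
    println(first.hashCode() == second.hashCode())  // true

    // For a one-character String the hash code is the character's code.
    println("A".hashCode())                         // 65

    // Different objects may share a hash code, so it is not a unique ID.
    println("Aa".hashCode() == "BB".hashCode())     // true (both are 2112)
}
```

The last line is the important one for what follows: a map can never assume that two keys with the same hash code are the same key.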

In our map we’re going to use a simple modulus operation, which returns the remainder when one number is divided by another. For example, if the hash code is 19 and we have 16 slots then the remainder is 3, so the entry goes in slot 3.
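One thing to watch out for is that hash codes can be negative, and Kotlin’s % operator keeps the sign of the left-hand side. The sketch below compares % with Math.floorMod from the Java standard library; slotFor is a hypothetical helper name used only for this illustration.

```kotlin
// Hypothetical helper: maps any hash code to a slot in the range 0 until slots.
fun slotFor(hashCode: Int, slots: Int): Int = Math.floorMod(hashCode, slots)

fun main() {
    println(19 % 16)           // 3
    println(-19 % 16)          // -3, not a valid array index
    println(slotFor(19, 16))   // 3
    println(slotFor(-19, 16))  // 13
}
```

For non-negative hash codes both approaches agree, so the worked example of 19 and 16 slots still lands in slot 3.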

Now our put method looks like this:

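fun put(key: K, value: V) {
    val index = calculateHashCode(key)
    entries[index] = Entry(key, value)
}

private fun calculateHashCode(key: K): Int {
    // floorMod is a modulus that never returns a negative result, so keys
    // with a negative hash code still map to a valid array index
    return Math.floorMod(key.hashCode(), arraySize)
}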

Adding the method for get and remove gives us our first (very basic) iteration of a hash map!

class BasicHashMap<K, V> {

    data class Entry<K, V>(val key: K, val value: V)

    private var arraySize = 16

    private val entries: Array<Entry<K, V>?> = arrayOfNulls(arraySize)

    fun put(key: K, value: V) {
        val index = calculateHashCode(key)
        entries[index] = Entry(key, value)
    }

    fun get(key: K): V? {
        val index = calculateHashCode(key)
        return entries[index]?.value
    }

    fun remove(key: K) {
        val index = calculateHashCode(key)
        entries[index] = null
    }

    private fun calculateHashCode(key: K): Int {
        // floorMod is a modulus that never returns a negative result, so keys
        // with a negative hash code still map to a valid array index
        return Math.floorMod(key.hashCode(), arraySize)
    }
}

You can try this out with some tests but you’ll see we soon run into a problem.

Let’s add 26 entries for the NATO phonetic alphabet:

fun testHashMap() {
    val aHashMap = BasicHashMap<String, String>()

    aHashMap.put("A", "Alpha")
    aHashMap.put("B", "Bravo")
    aHashMap.put("C", "Charlie")
    aHashMap.put("D", "Delta")
    aHashMap.put("E", "Echo")
    aHashMap.put("F", "Foxtrot")
    aHashMap.put("G", "Golf")
    aHashMap.put("H", "Hotel")
    aHashMap.put("I", "India")
    aHashMap.put("J", "July")
    aHashMap.put("K", "Kilo")
    aHashMap.put("L", "Lima")
    aHashMap.put("M", "Mike")
    aHashMap.put("N", "November")
    aHashMap.put("O", "Oscar")
    aHashMap.put("P", "Papa")
    aHashMap.put("Q", "Quebec")
    aHashMap.put("R", "Romeo")
    aHashMap.put("S", "Sierra")
    aHashMap.put("T", "Tango")
    aHashMap.put("U", "Uniform")
    aHashMap.put("V", "Victor")
    aHashMap.put("W", "Whiskey")
    aHashMap.put("X", "X-Ray")
    aHashMap.put("Y", "Yankee")
    aHashMap.put("Z", "Zulu")

    for (key in 'A'..'Z') {
        println("Key = [$key] value = [${aHashMap.get(key.toString())}]")
    }
}

This produces:

Key = [A] value = [Quebec]

Key = [B] value = [Romeo]

Key = [C] value = [Sierra]

Key = [D] value = [Tango]

Key = [E] value = [Uniform]

Key = [F] value = [Victor]

Key = [G] value = [Whiskey]

Key = [H] value = [X-Ray]

Key = [I] value = [Yankee]

Key = [J] value = [Zulu]

Key = [K] value = [Kilo]

Key = [L] value = [Lima]

Key = [M] value = [Mike]

Key = [N] value = [November]

Key = [O] value = [Oscar]

Key = [P] value = [Papa]

Key = [Q] value = [Quebec]

Key = [R] value = [Romeo]

Key = [S] value = [Sierra]

Key = [T] value = [Tango]

Key = [U] value = [Uniform]

Key = [V] value = [Victor]

Key = [W] value = [Whiskey]

Key = [X] value = [X-Ray]

Key = [Y] value = [Yankee]

Key = [Z] value = [Zulu]

What happened? Well, we’ve got 16 slots in our array and we tried to add 26 items, so we’ve overwritten 10 of them! Worse, get() never checks the key it finds in the slot, so asking for A happily returns Quebec.
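To see which keys end up sharing a slot, we can group the 26 keys by their array index. This is a quick sketch, assuming JVM String hash codes; slotOf is a hypothetical helper name and not part of the map itself.

```kotlin
// Hypothetical helper: the array index a key maps to with the given number of slots.
// Single-letter keys have small positive hash codes, so plain % is safe here.
fun slotOf(key: String, slots: Int = 16): Int = key.hashCode() % slots

fun main() {
    val keysBySlot = ('A'..'Z').map { it.toString() }.groupBy { slotOf(it) }
    for ((slot, keys) in keysBySlot.toSortedMap()) {
        println("slot $slot: $keys")
    }
}
```

"A" has hash code 65 and "Q" has 81, and both leave a remainder of 1 when divided by 16, which is why Quebec replaced Alpha. Slots 1 to 10 each receive two keys, matching the ten overwritten entries.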

When two keys map to the same slot it’s called a collision. There are many strategies to mitigate collisions but here are two basic ones:

Store multiple items in each array slot

Grow the array to make more space

In a real Java HashMap both are used so let’s add them to our implementation.

Storing multiple items in each slot

For this we’ll replace our array of entries with an array of LinkedLists of entries i.e.

private val entries: Array<Entry<K, V>?> = arrayOfNulls(arraySize)

Becomes:

// Needs `import java.util.LinkedList` at the top of the file
private var entries: Array<LinkedList<Entry<K, V>>> =
    Array(arraySize) { LinkedList<Entry<K, V>>() }

We can put items in each array slot by appending them to the end of the LinkedList (or replacing an existing entry with the same key).

We can get items for key K by walking through the LinkedList to find the Entry with that key, which is why we store both key and value.

Array of LinkedLists

Now our put() method becomes a little more complicated because we need to first check if the key exists in the LinkedList before adding it but with Kotlin 😍 we can make it nice and succinct. Our get() and remove() methods are similarly idiomatic.

fun put(key: K, value: V) {
    val index = calculateHashCode(key)
    val listAtArraySlot = entries[index]

    val newEntry = Entry(key, value)

    // Check if the key already exists in the LinkedList entries
    val indexOfEntryInList = listAtArraySlot.indexOfFirst { it.key == key }

    if (indexOfEntryInList >= 0) {
        // Replace the existing entry for this key
        listAtArraySlot[indexOfEntryInList] = newEntry
    } else {
        // Append a new entry to the end of the list
        listAtArraySlot.offer(newEntry)
    }
}

fun get(key: K): V? {
    val index = calculateHashCode(key)
    val listAtArraySlot = entries[index]

    return listAtArraySlot.find { it.key == key }?.value
}

fun remove(key: K) {
    val index = calculateHashCode(key)
    // Remove only the entry with this key; other keys sharing the slot stay put
    entries[index].removeAll { it.key == key }
}

override fun toString(): String {
    val sb = StringBuilder()
    entries.forEach {
        if (it.isNotEmpty())
            sb.append(it).append('\n')
    }
    return sb.toString()
}

Now when we run our alphabet test example from above we get the following:

Key = [A] value = [Alpha]

Key = [B] value = [Bravo]

Key = [C] value = [Charlie]

Key = [D] value = [Delta]

Key = [E] value = [Echo]

Key = [F] value = [Foxtrot]

Key = [G] value = [Golf]

Key = [H] value = [Hotel]

Key = [I] value = [India]

Key = [J] value = [July]

Key = [K] value = [Kilo]

Key = [L] value = [Lima]

Key = [M] value = [Mike]

Key = [N] value = [November]

Key = [O] value = [Oscar]

Key = [P] value = [Papa]

Key = [Q] value = [Quebec]

Key = [R] value = [Romeo]

Key = [S] value = [Sierra]

Key = [T] value = [Tango]

Key = [U] value = [Uniform]

Key = [V] value = [Victor]

Key = [W] value = [Whiskey]

Key = [X] value = [X-Ray]

Key = [Y] value = [Yankee]

Key = [Z] value = [Zulu]

Excellent, problem solved! Except it’s not… Let’s see how the map looks internally.

[Entry(key=P, value=Papa)]

[Entry(key=A, value=Alpha), Entry(key=Q, value=Quebec)]

[Entry(key=B, value=Bravo), Entry(key=R, value=Romeo)]

[Entry(key=C, value=Charlie), Entry(key=S, value=Sierra)]

[Entry(key=D, value=Delta), Entry(key=T, value=Tango)]

[Entry(key=E, value=Echo), Entry(key=U, value=Uniform)]

[Entry(key=F, value=Foxtrot), Entry(key=V, value=Victor)]

[Entry(key=G, value=Golf), Entry(key=W, value=Whiskey)]

[Entry(key=H, value=Hotel), Entry(key=X, value=X-Ray)]

[Entry(key=I, value=India), Entry(key=Y, value=Yankee)]

[Entry(key=J, value=July), Entry(key=Z, value=Zulu)]

[Entry(key=K, value=Kilo)]

[Entry(key=L, value=Lima)]

[Entry(key=M, value=Mike)]

[Entry(key=N, value=November)]

[Entry(key=O, value=Oscar)]

As you can see the LinkedLists are growing and will continue to grow as we add more items, since we don’t have any more array slots. With n entries spread over a fixed 16 slots, each list holds roughly n/16 of them and every operation has to walk one of those lists. So our put(), get() and remove() methods will start exhibiting O(n) performance 😱, a far cry from the O(1) we were expecting.
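A rough way to see the lists growing is to count how many keys land in the busiest slot as we add more of them. This sketch uses sequential Int keys (whose hash code is the number itself) purely for illustration; longestChain is a hypothetical helper name.

```kotlin
// Hypothetical helper: length of the longest list when `keys` sequential
// Int keys (0, 1, 2, ...) are spread over a fixed number of slots.
fun longestChain(keys: Int, slots: Int = 16): Int =
    (0 until keys)
        .groupingBy { Math.floorMod(it.hashCode(), slots) }
        .eachCount()
        .values
        .maxOrNull() ?: 0

fun main() {
    for (n in listOf(16, 160, 1_600)) {
        println("$n keys -> longest chain ${longestChain(n)}")
    }
}
```

With 16 slots the longest list goes from 1 to 10 to 100 entries as the key count goes from 16 to 160 to 1,600, so the work per lookup grows in step with the number of entries.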

Clearly we need to add more array slots but when and how?

Growing the number of available slots, when and how?

We’ve seen that we need to grow the number of slots available so we need to decide when to do it.

For this we use the concept of the “load factor”. From the official documentation: The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

A value of 0.75 is used as the default in the Java HashMap so we’ll use the same. Once the number of entries exceeds 75% of the number of array slots, we double the number of slots and rehash the whole table. The rehash is needed because each entry’s index depends on the array size.
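To make the threshold concrete, here’s a small simulation of how the slot count grows as entries are added. It assumes every entry has a distinct key and is counted exactly once; capacityAfter is a hypothetical helper written just for this illustration.

```kotlin
// Hypothetical helper: the number of slots after inserting `entryCount` distinct
// keys, doubling whenever the entry count exceeds slots * loadFactor.
fun capacityAfter(entryCount: Int, initialSlots: Int = 16, loadFactor: Double = 0.75): Int {
    var slots = initialSlots
    for (entries in 1..entryCount) {
        if (entries > slots * loadFactor) {
            slots *= 2
        }
    }
    return slots
}

fun main() {
    println(capacityAfter(12))  // 16: exactly at the threshold of 12, no resize yet
    println(capacityAfter(13))  // 32: the 13th entry exceeds 16 * 0.75
    println(capacityAfter(26))  // 64: the 25th entry exceeds 32 * 0.75
}
```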

Here’s the code to enlarge the table and then rehash all the entries:

private fun increaseCapacity() {
    // Increase size
    arraySize *= 2

    // Create new array and add existing items to the bigger table
    val localEntries: Array<LinkedList<Entry<K, V>>> = Array(arraySize) { LinkedList<Entry<K, V>>() }

    // Rehashing moves entries without adding any, so numberOfEntries stays as it is
    entries.forEach {
        it.forEach { entry ->
            put(entry.key, entry.value, localEntries)
        }
    }

    // Make the local copy the new entry array
    entries = localEntries
}

Then we update put() to trigger this method once the threshold is reached. It keeps a running count in a numberOfEntries field and compares it against arraySize * loadFactor, i.e.

fun put(key: K, value: V) {
    numberOfEntries++

    if (numberOfEntries > arraySize * loadFactor) {
        increaseCapacity()
    }

    put(key, value, entries)
}

private fun put(key: K, value: V, localEntries: Array<LinkedList<Entry<K, V>>>) {
    // existing put code, using localEntries in place of entries
}

Now when we run our test we can see the structure of the table has changed and O(1) lookup is restored!

[Entry(key=A, value=Alpha)]

[Entry(key=B, value=Bravo)]

[Entry(key=C, value=Charlie)]

[Entry(key=D, value=Delta)]

[Entry(key=E, value=Echo)]

[Entry(key=F, value=Foxtrot)]

[Entry(key=G, value=Golf)]

[Entry(key=H, value=Hotel)]

[Entry(key=I, value=India)]

[Entry(key=J, value=July)]

[Entry(key=K, value=Kilo)]

[Entry(key=L, value=Lima)]

[Entry(key=M, value=Mike)]

[Entry(key=N, value=November)]

[Entry(key=O, value=Oscar)]

[Entry(key=P, value=Papa)]

[Entry(key=Q, value=Quebec)]

[Entry(key=R, value=Romeo)]

[Entry(key=S, value=Sierra)]

[Entry(key=T, value=Tango)]

[Entry(key=U, value=Uniform)]

[Entry(key=V, value=Victor)]

[Entry(key=W, value=Whiskey)]

[Entry(key=X, value=X-Ray)]

[Entry(key=Y, value=Yankee)]

[Entry(key=Z, value=Zulu)]
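For reference, here’s one way the pieces from this article could fit together in a single class. Treat it as a sketch rather than the final source: the names ResizingHashMap, putInto, indexFor, newTable and size are assumptions made for this example, the load factor is passed in as a constructor parameter, and the entry count only changes when a key is genuinely added or removed.

```kotlin
import java.util.LinkedList

class ResizingHashMap<K, V>(private val loadFactor: Double = 0.75) {

    data class Entry<K, V>(val key: K, val value: V)

    private var arraySize = 16
    private var numberOfEntries = 0
    private var entries: Array<LinkedList<Entry<K, V>>> = newTable(arraySize)

    val size: Int get() = numberOfEntries

    fun put(key: K, value: V) {
        // putInto returns true only when the key was not present before
        if (putInto(entries, key, value)) {
            numberOfEntries++
            if (numberOfEntries > arraySize * loadFactor) increaseCapacity()
        }
    }

    fun get(key: K): V? = entries[indexFor(key)].find { it.key == key }?.value

    fun remove(key: K) {
        // Only the matching entry is removed; other keys in the slot are kept
        if (entries[indexFor(key)].removeAll { it.key == key }) numberOfEntries--
    }

    private fun increaseCapacity() {
        arraySize *= 2
        val biggerTable = newTable(arraySize)
        // Every entry is rehashed because its index depends on the table size
        entries.forEach { bucket -> bucket.forEach { putInto(biggerTable, it.key, it.value) } }
        entries = biggerTable
    }

    private fun putInto(table: Array<LinkedList<Entry<K, V>>>, key: K, value: V): Boolean {
        val bucket = table[Math.floorMod(key.hashCode(), table.size)]
        val position = bucket.indexOfFirst { it.key == key }
        if (position >= 0) {
            bucket[position] = Entry(key, value)  // overwrite the existing key
            return false
        }
        bucket.add(Entry(key, value))             // append a new key
        return true
    }

    private fun indexFor(key: K): Int = Math.floorMod(key.hashCode(), arraySize)

    private fun newTable(size: Int): Array<LinkedList<Entry<K, V>>> =
        Array(size) { LinkedList<Entry<K, V>>() }
}

fun main() {
    val map = ResizingHashMap<String, String>()
    for (letter in 'A'..'Z') map.put(letter.toString(), "value for $letter")
    map.put("A", "Alpha")  // overwrites, so the size stays at 26
    map.remove("Z")
    println(map.get("A"))  // Alpha
    println(map.get("Z"))  // null
    println(map.size)      // 25
}
```

Having putInto report whether the key was new keeps the count accurate when a value is overwritten, which in turn keeps the load factor check honest.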

Naturally this subject is much more involved than this article and there are lots of improvements and optimisations that can be added but hopefully this is a nice introduction.

Thanks for reading. Questions, comments and scathing criticism welcome!