This article will help you solve a classical programming exercise in Solidity and teach you a few tricks for saving gas when developing Solidity smart contracts.

The most fun way to learn and improve your coding skills is to solve challenging exercises. The Ethereum community regularly organizes hackathons and competitions that bring smart contract developers together. So far, my personal favourites have been the contests organized by Ethereum’s Nick Johnson.

Last year there was an amazing Underhanded Solidity Coding Contest, where the goal was to hide hard-to-detect vulnerabilities in ICO contracts that could be exploited to enrich the project creators. Several fascinating solutions were submitted, presenting vulnerabilities I could never have thought of.

This year the goal is to solve classical programming tasks in Solidity using as little gas as possible. The contest contains 5 exercises: integer sorting, a BrainFuck interpreter, a hex decoder, a string search, and removing duplicate elements from a list. If you want to start with the integer sorting exercise, Tim Cotten has already written a superb write-up that I highly recommend. I would also like to guide you through my journey with the last exercise and show you what I’ve learnt along the way. Hopefully this will encourage you to have a look at the other exercises and get your hands dirty with some Solidity coding ;)

Setup

I’m a big fan of Truffle, so I use the Truffle project version of the contest, available here. It makes jumping into the contest pretty convenient: after writing some code, you can quickly check whether your solution passes all the test vectors with the command truffle test test/Unique.js.

Challenge #5: Remove duplicate elements

The problem is pretty simple: write a smart contract in Solidity that removes all but the first occurrence of each element from a list of integers, preserving the order of the original elements, and returns the resulting list. The input list may be of any length.

Naive solution

I wanted to start with a simple and dumb solution just to have something as a benchmark. A straightforward idea is the following.

Return every input array whose length is less than 2, since such arrays are already free of duplicates. For longer arrays, use a separate bool array to track which elements are duplicates. This relies on a nested loop, which is highly inefficient: for an input array of length n it gives an O(n²) algorithm. It may seem odd that we copy the unique[] array into a unique2[] array; the reason is that Solidity does not allow resizing arrays in memory. Testing this naive solution, we pass test vectors 0 and 1: an empty array and a one-element array… SO MUCH WIN! For most of the other test vectors, however, it simply runs out of gas. This initial failure suggests that we should aim for an O(n) algorithm. Can we find all the duplicates if we are allowed to iterate over the array only once? OFC!
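The Solidity code for this step is not reproduced here, so the following is a minimal Python model of the naive algorithm as described above, meant for readability rather than gas measurement. The function name uniquify_naive and the exactly sized result list standing in for unique2[] are illustrative assumptions, not the contest's interface.

```python
from typing import List


def uniquify_naive(values: List[int]) -> List[int]:
    """Hypothetical O(n^2) model of the naive approach (not the contest code)."""
    n = len(values)
    # Arrays with fewer than 2 elements cannot contain duplicates.
    if n < 2:
        return list(values)

    # is_duplicate[j] becomes True once values[j] is known to repeat an earlier element.
    is_duplicate = [False] * n
    unique_count = 0
    for i in range(n):
        if is_duplicate[i]:
            continue
        unique_count += 1
        # Nested loop: mark every later copy of values[i]. This is the O(n^2) part.
        for j in range(i + 1, n):
            if values[j] == values[i]:
                is_duplicate[j] = True

    # Memory arrays cannot be resized in Solidity, so the survivors are copied
    # into a second array of exactly unique_count elements (like unique2[]).
    result = [0] * unique_count
    k = 0
    for i in range(n):
        if not is_duplicate[i]:
            result[k] = values[i]
            k += 1
    return result


print(uniquify_naive([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

The inner loop can run up to n - 1 times for each of the n elements, which is why this approach runs out of gas on the larger test vectors.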

Hash tables

The natural idea is to compare not the integers themselves but a transformation of them. So let’s map the elements into slots such that different elements (ideally) land in different slots, while equal elements necessarily land in the same slot. Hash tables are a perfect match for this task. As we iterate over the array, we take an integer, hash it, and map the integer to a key computed from its hash. Detecting duplicates becomes easy: if an integer occurs several times, every occurrence is mapped to the same slot.

Open addressing

Even if a cryptographically secure hash function is applied, two different integers can easily be assigned to the same slot. This is called a collision. In our case it is clearly unwanted, since two different integers mapped to the same slot would lead us to believe they are duplicates although they are not. Such collisions therefore have to be handled somehow. One hash collision resolution technique is open addressing.

The intuition is that whenever a collision occurs, we simply step through the key space until we find an empty slot (“key”). The integer is inserted into the first unoccupied slot. There is an inherent time/space trade-off here: as the number of slots (the key space) increases, the number of collisions decreases; conversely, as the key space gets smaller, hash collisions become more likely. An unsophisticated and unoptimized implementation of an open addressing hash table passes all the test vectors and burns 1149096 gas.
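As a sketch of the idea (again a Python model, not the Solidity entry whose gas figure is quoted), here is open addressing with linear probing. hashlib.sha256 is an assumed stand-in for Solidity's sha3/keccak256, which Python's standard library does not provide; the function names and the default of 256 slots are also assumptions for illustration. Because 0 is a valid input, empty slots are marked with None rather than 0.

```python
import hashlib
from typing import List, Optional


def sha_slot(value: int, num_slots: int) -> int:
    # Stand-in cryptographic hash (assumption): SHA-256 of the 32-byte
    # big-endian encoding of a uint256 value, reduced to a slot index.
    digest = hashlib.sha256(value.to_bytes(32, "big")).digest()
    return int.from_bytes(digest, "big") % num_slots


def uniquify_open_addressing(values: List[int], num_slots: int = 256) -> List[int]:
    # Inputs are assumed to be uint256 values (0 <= v < 2**256).
    if len(values) < 2:
        return list(values)
    # Linear probing only terminates if a free slot always exists,
    # so keep the table strictly larger than the input.
    num_slots = max(num_slots, 1)
    while num_slots <= len(values):
        num_slots *= 2

    slots: List[Optional[int]] = [None] * num_slots  # None marks an empty slot
    result: List[int] = []
    for v in values:
        idx = sha_slot(v, num_slots)
        # Collision with a different value: step to the next slot (wrapping around).
        while slots[idx] is not None and slots[idx] != v:
            idx = (idx + 1) % num_slots
        if slots[idx] is None:
            # First occurrence: claim the free slot and keep the value.
            slots[idx] = v
            result.append(v)
        # Otherwise slots[idx] == v, i.e. v is a duplicate and is skipped.
    return result


print(uniquify_open_addressing([5, 0, 5, 261, 0]))  # [5, 0, 261]
```

The doubling loop is one way to guarantee a free slot for arbitrarily long inputs; with a fixed key space, the probing loop would never terminate once every slot is taken.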

How could we improve the efficiency of this solution? Let’s use a cheaper hash function. We do not need sha3, a cryptographic hash function; for our purposes mod is perfectly fine, as it distributes the numbers fairly well across the slots. So if we compute the key simply as uint hash = input[i] % lengthOfKeySpace, we save 113380 gas.

Instead of the % operator, we can calculate the remainder with a bitwise operator, which is cheaper: according to the Ethereum Yellow Paper, a MOD costs 5 gas, while a bitwise AND costs only 3 gas. If the modulus k is a power of 2, n mod k can also be computed as n & (k-1). Therefore let lengthOfKeySpace be 256, which is a power of 2. Calculating the modulo with a bitwise AND yields another 46903 gas of savings.
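The identity behind this trick is easy to check. The following Python sketch (function names are hypothetical) compares both ways of computing a slot; in Solidity the second would correspond to uint hash = input[i] & (lengthOfKeySpace - 1). It also shows why the modulus has to be a power of two.

```python
def slot_mod(x: int, k: int) -> int:
    # Remainder via the modulo operator (the MOD opcode in the EVM).
    return x % k


def slot_and(x: int, k: int) -> int:
    # Remainder via a bit mask (the AND opcode). Only correct when k is a
    # power of two, because then k - 1 masks exactly the low log2(k) bits.
    return x & (k - 1)


LENGTH_OF_KEY_SPACE = 256  # 2**8

for x in [0, 1, 255, 256, 257, 1000, 2**256 - 1]:
    assert slot_mod(x, LENGTH_OF_KEY_SPACE) == slot_and(x, LENGTH_OF_KEY_SPACE)

# With a modulus that is not a power of two, the trick breaks:
print(slot_mod(10, 6), slot_and(10, 6))  # 4 0
```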

Hmmm… 989623 gas??? This is still way too much! What if we apply a different collision resolution technique?

Separate chaining

Another common collision resolution technique is separate chaining. The idea is that each slot is independent and holds its own separate list. Whenever we encounter a collision at a certain slot, we just append the integer to be inserted to the end of that slot’s list. The list is often implemented as a linked list.

A visual explanation for separate chaining

The time for a hash table operation is the time to find the slot, which is constant, plus the time for the list operation. In a good hash table, each bucket has zero or one entries, sometimes two or three, but rarely more than that, so structures that are time- and space-efficient for these small cases are preferred. As the load factor (in our case, the number of integers in the input list divided by the number of slots) increases, separate chaining becomes faster than open addressing. The reason is that in a dense hash table collisions are more likely, so open addressing has to iterate further through the table to find an open address. Separate chaining, in contrast, never probes other slots: it only walks the short list at the element’s own slot and, if the element is new, appends it to the end of that list.
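A minimal Python model of separate chaining for this task might look like the following. Python lists stand in for the per-slot (linked) lists, the slot index uses the bit-mask trick with 256 slots, and all names are illustrative assumptions rather than the entry measured in this article.

```python
from typing import List


def uniquify_separate_chaining(values: List[int], num_slots: int = 256) -> List[int]:
    if len(values) < 2:
        return list(values)
    # A power-of-two slot count lets us use v & mask instead of v % num_slots.
    if num_slots <= 0 or num_slots & (num_slots - 1) != 0:
        raise ValueError("num_slots must be a power of two")
    mask = num_slots - 1

    buckets: List[List[int]] = [[] for _ in range(num_slots)]
    result: List[int] = []
    for v in values:
        bucket = buckets[v & mask]
        # Only the (usually very short) list at v's own slot is scanned;
        # neighbouring slots are never probed.
        if v not in bucket:
            bucket.append(v)  # append to the end of this slot's list
            result.append(v)
    return result


print(uniquify_separate_chaining([1, 257, 1, 513, 257]))  # [1, 257, 513]
```

Here 1, 257 and 513 all land in slot 1, yet none of them is mistaken for a duplicate of another: the collision only makes that slot's list a little longer.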

Separate chaining indeed proves to be more efficient than open addressing: it consumes 957787 gas in total on the Truffle test vectors, and 792960 gas on the contest’s secret test vectors. We could also omit the assignments that set hashTable or index to 0, since the EVM initializes all data structures with their default values, which is 0 for integer arrays. We could make the code a little cheaper still by using prefix increments instead of postfix increments. All in all, though, these tweaks save no more than 5136 gas. We are not there yet! One might say this is a storm in a teacup: we are still far from the leaderboard for this task. Currently the most efficient solution burns only 309063 gas. Ouch!

Can you beat it? Challenge accepted!

Notes

As you can see, this version is still far from the leaderboard; I intentionally left some room for improvement. I hope you enjoyed the ride and that you now have the courage and appetite to start coding some nifty smart contracts in Solidity.

I don’t want to spoil too much, but here are some hints on how you could make your entry more efficient.

Keep in mind that only the gas cost of the uniquify() function is taken into account. I’ll just leave this here: https://gastoken.io/, if you know what I mean ;) Always clean up after yourself! :)

Credit goes to Balázs Paulcsik for his ideas and support!

Keep it up!

Let me know how you solved this challenge, or if you have other, more efficient approaches to this task. I would love to hear about them ;)

Keep going and happy gas golfing!

If you want to know more about my upcoming blockchain-related projects, follow me on Medium or Twitter. Strive for more, keep #buidl!