Challenge: Condition-free Repetition

“green trees between intersection road” by Shane Rounce on Unsplash

Call a function N times without using any conditions and with only Log(N) Kotlin instructions. It should produce the same result as:

for (index in 0 until N) {

doSomething(index)

}

When I started to think about loop unrolling, I started by listing restrictions for the unrolled bits (the qualities the solution must have) in order to guide us to the solution.

Restrictions:

All types of conditions (if / when / bound checks / etc.) are not allowed. No loops are allowed (for-loop / while / do-while) since these use conditions to determine when to stop. Calling any functions which utilize loops (eg. forEach) is also not allowed. The number of Kotlin instructions must be bounded by O[log(N)] since excessive repetition is error-prone (especially for larger values of N). You can only create 1 utility function to minimize maintenance overhead. Using extra memory is not allowed so that it doesn’t negate any benefits from loop unrolling.

Essentially, the puzzle requires us to repeat an action N times without using any looping mechanisms in Log(N) Kotlin instructions. Take a moment to tackle this challenge before continuing.

Observations

“grey and red coin operated binoculars in front of seawaves under nimbus clouds during daytime” by Kyle Popineau on Unsplash

This challenge is especially difficult for developers that are new to Kotlin because it requires thinking in ways that are impossible with plain Java. Lets start with some observations:

The restrictions on avoiding conditions eliminates recursion (since we need to check for the base case) and all looping constructs. The Kotlin tailrec keyword is also eliminated since it converts it to a loop anyway.

Although the restriction on the number of Kotlin instructions seems arbitrary. Why log(N)? I usually set this type of goal as it forces me to list the kind of qualities the solution must have and then I can evaluate if such a solution is even possible.

The log(N) restriction means that we need to perform some sort of divide and conquer. This seems impossible at first until we reverse our thinking. Instead of splitting a large task into smaller sub-tasks, lets instead compose sub-tasks into larger ones until we reach our desired outcome.

Step 1: Handling powers of 2

“man and woman holding hand together” by The HK Photo Company on Unsplash

With our divide and conquer mindset reversed, let's compose sub-tasks into larger ones by duplicating:

// invokes the specified body twice

inline fun duplicate(body: () -> Unit) {

body()

body()

}

And now let's use composition to repeat an action 8 times (note that Kotlin allows passing the last lambda parameter outside of the parenthesis):

duplicate {

duplicate {

duplicate {

print("Repeated 8 times!")

}

}

}

Even though we’re only calling the function 3 times, let’s read the above from the innermost outwards to make sense of it. The print-statement is duplicated resulting in 2 statements which themselves are duplicated resulting in 4 and then those 4 are duplicated resulting in 8 print-statements. If we look at the generated bytecode and decompile to Java, we can see that the inline keyword results in everything being inlined so we get:

print("Repeated 8 times!")

print("Repeated 8 times!")

print("Repeated 8 times!")

print("Repeated 8 times!")

print("Repeated 8 times!")

print("Repeated 8 times!")

print("Repeated 8 times!")

print("Repeated 8 times!")

Jumping from 8 repetitions to 16 only requires 1 additional Kotlin instruction so N instructions leads to 2^N repetitions. In other words, N repetitions can be accomplished with O[Log(N)] Kotlin instructions.

Step 2: Handling all values of N

“people having toast” by Elevate on Unsplash

Our previous solution only works when N is a power of 2. If N is a power of any other number such as 3, we could create a triplicate function which invokes the specified body 3 times and that would solve the problem for powers of 3. However, we should work towards a general solution.

Since integers can be written as a sum of powers of 2, we need to determine which powers of 2 to use. Writing the number in binary solves this as the one-bits represent the powers of 2 to be summed. Suppose we wanted to repeat an action 11 times, 11 is 1011 in binary (1 x 8 + 0 x 4 + 1 x 2 + 1 x 1). In other words, if we regard each bit as controlling a layer, we need to start with the innermost layer and add the instruction at each layer where the bit is a 1:

// Create a structure mimicking binary 1011 to repeat 11 times // Top layer outside duplicate represents the last 1 in 101*1*

print("Repeated 11 times!")

duplicate {

// This layer is the 1 in 10*1*1

print("Repeated 11 times!") // repeat 2 times

duplicate {

// This layer is the 0 in 1*0*11 so don't add the action

duplicate {

// This layer is the first 1 in *1*011

print("Repeated 11 times!") // repeat 8 more times

}

}

}

Step 3: Passing the index

“person hand holding hand figurine” by Mathias Arlund on Unsplash

The puzzle requires us to call the doSomething function passing in the index of the current iteration so we haven’t tackled this requirement. Here is the solution:

inline fun duplicate(index: Int, body: (Int) -> Int): Int {

return body(body(index))

}

And finally, lets loop from 0 to 7 inclusive (8 iterations total):

val startIndex = 0

duplicate (startIndex) {

duplicate(it) {

duplicate(it) {

print("Iteration number $it")

it + 1

}

}

}

In the above, “it” is the implicit lambda parameter which specifies the index so we just increment and pass it along.

If you thought the above code looks strange or cryptic, you would be right. This type of code should definitely not be used in production. Now that we solved the challenge, lets make use of it to perform loop unrolling.

Loop Unrolling

“person holding rolled paper near purple petaled flower and filled white ceramic mug” by rawpixel on Unsplash

Consider the following code:

for(index in 1..1000) {

doSomething(index)

}

The above calls the function 1000 times but that’s not all. The CPU checks if we reached the end of the loop prior to executing each iteration. So we’re invoking the body 1000 times and also checking the bound 1000 times but there’s more still. CPUs evaluate conditions prior to actually having the values that are compared in the condition (to avoid waiting for the values) so a branch predictor / loop predictor guesses the result of the condition. If the guess is wrong, it discards all the work after the guess was made and starts over. CPUs typically have between 10 to 20 pipeline stages so a wrong prediction costs between 10 to 20 cycles and wrong predictions can happen multiple times.

The above scenario with 2000 operations (1000 repetitions + 1000 checks) can be reduced to 1125 operations (1000 repetitions + 125 bound checks) by unrolling the loop to perform the work of 8 iterations at a time:

// Loops from [from] to [to] inclusive

inline fun loopUnrolled8(from: Int, to: Int, body: (Int) -> Unit) {

var index = from

while (index + 7 <= to) {

index = duplicate(index) {

duplicate(it) {

duplicate(it) {

body(it)

it + 1

}

}

}

} // Take care of left over amount (at most 7 iterations)

for (i in index..to) {

body(i)

}

}

Using an unroll factor of 8 is still manageable but the benefit of using Log(N) Kotlin instructions is immediately obvious when experimenting with larger unroll factors.

Finally, let's try it out:

loopUnrolled8(from = 0, to = 101) { index ->

println("loop index $index")

}

You might be wondering about using larger unroll factors (eg. 64 / 128 / etc.) to avoid even more bound checks but large factors actually decrease performance as it increases the pressure on the CPU instruction cache. Performing some micro-benchmarks with a large number of iterations and minimal iteration overhead such as adding numbers (which is a best case scenario and not representative of real workloads), I found that using 8 to 16 as the unroll factor seemed to be optimal for this trivial use-case.