For the last few days I have been thinking about how to write a low-level program optimiser, based on the ideas from Supero. Supero works at the level of lazy Core expressions, but actual hardware works on a sequence of strict instructions. One possible idea is to translate the lazy expressions to strict sequences, then borrow the ideas from supercompilation once more. In particular I have been looking at the GRIN approach, which defines such a set of instructions.

The GRIN work is very clever, and has many ideas that I would like to reuse. However, the one aspect that gave me slight concern is the complexity. A GRIN program requires the use of several analysis passes, and many, many transformation rules. While this approach is perfectly acceptable, one of the goals of the Supero work is to make the optimisation process simpler -- comprising a few simple but powerful rules.

I will first explain how strictness works, then how my speculative approach works. Readers who already know about unboxing are encouraged to skip to the speculative section.

Strictness

When doing low-level compilation, one of the most important stages is strictness analysis, and the associated unboxing. To take the example of the factorial function in Haskell:

factorial :: Int -> Int
factorial n = if n > 0 then n * factorial (n-1) else 1

Here it is easy to see that the factorial function always evaluates n. We can also use our knowledge of the definition of Int:

data Int = I# Int#

Here Int# is an actual machine integer (possibly stored in a register), and I# is a lazy box surrounding it. Since we know that factorial will always unwrap our n, we can pass n around without the I# box. I have made all the conversions from Int# to Int explicit using an I# constructor, but have left all the unboxings implicit. The operators >#, *# etc. are simply unboxed and strict variants of the standard operators.

factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (I# n# - 1) else 1

Also, since we know factorial is strict in its first argument, we can evaluate the first argument to the recursive call strictly. Applying all these optimisations, we can now write:

factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial# (n# -# 1) else 1

We have removed the explicit boxing in the recursive call, and work entirely with unboxed integers. Now factorial is entirely strict. We can even write a wrapper around our strict version, to provide a lazy interface matching the original:

factorial :: Int -> Int
factorial n = factorial# n#

I have used n# to denote the unboxing of n. Now factorial looks like it did before, but operates much faster, on unboxed integers.

Speculative

I would like not to include a strictness analyser in my optimiser -- or, if one is included, to have it be the result of a series of transformations, without explicit "stop and analyse" then "use the results" stages. As part of my thoughts on this, I was trying to consider how to optimise factorial without invoking the strictness analyser.

The speculative transformation I have defined first generates factorial# - I have left out the details of how it decides to do so.

factorial :: Int -> Int
factorial n = if n > 0 then n * factorial (n-1) else 1

factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (I# n# - 1) else 1

This step is entirely safe - we have defined factorial#, but we have not written a wrapper that invokes it, even in the recursive case. The factorial# function is equivalent to factorial, provided the initial argument was evaluated. We have transformed factorial# using only local knowledge at this point.

We can also transform factorial, replacing any uses of n which are guaranteed to come after n is evaluated with (I# n#). This transformation is merely reusing the knowledge we have gained by unwrapping n:

factorial n = if n > 0 then I# n# * factorial (I# n# - 1) else 1

Now we promote any primitive operations whose arguments are all unboxed values. Given such an application of (-), it is cheaper to evaluate the subtraction than to store a lazy thunk for it:

factorial n = if n > 0 then I# n# * factorial (I# (n# -# 1)) else 1

factorial# n# = if n# ># 0 then n# *# factorial (I# (n# -# 1)) else 1

We can now use our knowledge that if we know an argument to a function is already evaluated, we can call the strict variant (this corresponds closely to constructor specialisation). We can also replace the * in factorial with *#, as we know we will have to evaluate the result of a function:

factorial n = if n > 0 then n# *# factorial# (n# -# 1) else 1

factorial# n# = if n# ># 0 then n# *# factorial# (n# -# 1) else 1

Now we have ended up with a fast inner loop, operating only on unboxed integers. We have not required strictness information to make any transformation.

One way of viewing the difference between strictness and this transformation is the flow of information. In strictness, the caller is informed that a particular argument will be evaluated. In speculative, the caller informs the callee that an argument has already been evaluated. These two concepts are not the same, and while they overlap, there are instances where they differ considerably. Consider the following example:

strict :: Int -> Int
strict x = x `seq` lazy x (x-1) (x+1)

lazy :: Int -> Int -> Int -> Int
lazy a b c = if a == 0 then b else c

Here the lazy function is strict in a, but not in either b or c. A strictness analyser would generate a variant of lazy with only the first argument unboxed. In contrast, the speculative variant will determine that x-1 and x+1 should be evaluated, and pass unboxed values in all arguments of lazy, even though lazy may not evaluate b or c.

To see this behaviour in GHC, it helps to make lazy recursive:

module Temp where

strict :: Int -> Int
strict x = x `seq` lazy x (x+1) (x-1)

lazy :: Int -> Int -> Int -> Int
lazy a b c = if a == 0 then lazy b b b else c

Now run ghc Temp.hs -c -O2 -ddump-simpl, and you will see that the lazy variant has type Int# -> Int -> Int -> Int.

These thoughts are still very preliminary, and there are a number of unanswered questions:

What is the overlap between strict and speculative?

Can both variants be combined? (almost certainly yes)

Is speculative really simpler?

Is speculative sufficient?

What are the performance benefits of speculative?
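As a closing aside for anyone who wants to experiment: the boxed/unboxed machinery discussed here is directly accessible in GHC through the MagicHash extension and GHC.Exts, so the hand-unboxed loop can be written and run for real. This is my own sketch, not part of the text above; fact and fact# are names I have chosen, and note that in modern GHC the primitive comparison ># returns an Int# rather than a Bool, hence the isTrue# conversion.

```haskell
{-# LANGUAGE MagicHash #-}
-- GHC really does define  data Int = I# Int#  (in GHC.Types);
-- the primops (>#), (*#) and (-#) below operate on raw Int# values.
import GHC.Exts

-- The hand-unboxed inner loop: takes and returns machine integers.
fact# :: Int# -> Int#
fact# n# = if isTrue# (n# ># 0#) then n# *# fact# (n# -# 1#) else 1#

-- Lazy wrapper: pattern matching on I# unboxes the argument,
-- and I# reboxes the result, giving the original Int -> Int interface.
fact :: Int -> Int
fact (I# n#) = I# (fact# n#)

main :: IO ()
main = print (fact 10)
```

This is essentially what GHC's own worker/wrapper transformation produces for the strict factorial, with fact# playing the role of the worker.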