No, but you can use better functions. I'm not talking about V.map (+64) , which would make things certainly a lot faster, but about nTimes . We have three candidates that already do what nTimes does:

{-# INLINE nTimesFoldr #-} nTimesFoldr :: Int -> (a -> a) -> a -> a nTimesFoldr n f x = foldr (.) id (replicate n f) $ x {-# INLINE nTimesIterate #-} nTimesIterate :: Int -> (a -> a) -> a -> a nTimesIterate n f x = iterate f x !! n {-# INLINE nTimesTail #-} nTimesTail :: Int -> (a -> a) -> a -> a nTimesTail n f = go n where {-# INLINE go #-} go n x | n <= 0 = x go n x = go (n - 1) (f x)

All versions take around 8 seconds, compared to the 40 seconds your versions take. Joachim's version also takes 8 seconds, by the way. Note that the iterate version takes more memory on my system. While there is an unroll plugin for GHC, it hasn't been updated in the last five years (it uses custom ANNotations).

No unroll at all?

However, before we despair, how well does GHC actually try to inline everything? Let's use nTimesTail and nTimes 1 :

module Main where import qualified Data.Vector.Unboxed as V {-# INLINE incAll #-} incAll :: V.Vector Int -> V.Vector Int incAll = V.map (+ 1) {-# INLINE nTimes #-} nTimes :: Int -> (a -> a) -> a -> a nTimes n f = go n where {-# INLINE go #-} go n x | n <= 0 = x go n x = go (n - 1) (f x) main :: IO () main = do let size = 100000000 :: Int let array = V.replicate size 0 :: V.Vector Int print $ V.sum (nTimes 1 incAll array)

$ stack ghc --package vector -- -O2 -ddump-simpl -dsuppress-all SO.hs

main2 = case (runSTRep main3) `cast` ... of _ { Vector ww1_s9vw ww2_s9vx ww3_s9vy -> case $wgo 1 ww1_s9vw ww2_s9vx ww3_s9vy of _ { (# ww5_s9w3, ww6_s9w4, ww7_s9w5 #) ->

We can stop right there. $wgo is the go defined above. Even with 1 GHC does not unroll the loop. It's disturbing since 1 is a constant.

Templates to the rescue

But alas, it's not all for naught. If C++ programmers are able to do the following for compile time constants, so should we, right?

template <int N> struct Call{ template <class F, class T> static T call(F f, T && t){ return f(Call<N-1>::call(f,std::forward<T>(t))); } }; template <> struct Call<0>{ template <class F, class T> static T call(F f, T && t){ return t; } };

And sure enough, we can, with TemplateHaskell*:

-- Times.sh {-# LANGUAGE TemplateHaskell #-} module Times where import Control.Monad (when) import Language.Haskell.TH nTimesTH :: Int -> Q Exp nTimesTH n = do f <- newName "f" x <- newName "x" when (n <= 0) (reportWarning "nTimesTH: argument non-positive") let go k | k <= 0 = VarE x go k = AppE (VarE f) (go (k - 1)) return $ LamE [VarP f,VarP x] (go n)

What does nTimesTH do? It creates a new function where the first name f is going to be applied to the second name x for a total of n times. n now needs to be a compile-time constant, which suits us, since loop-unrolling is only possible with compile-time constants:

$(nTimesTH 0) = \f x -> x $(nTimesTH 1) = \f x -> f x $(nTimesTH 2) = \f x -> f (f x) $(nTimesTH 3) = \f x -> f (f (f x)) ...

Does it work? And is it fast? How fast compared to nTimes ? Let's try another main for that:

-- SO.hs {-# LANGUAGE TemplateHaskell #-} module Main where import Times import qualified Data.Vector.Unboxed as V {-# INLINE incAll #-} incAll :: V.Vector Int -> V.Vector Int incAll = V.map (+ 1) {-# INLINE nTimes #-} nTimes :: Int -> (a -> a) -> a -> a nTimes n f = go n where {-# INLINE go #-} go n x | n <= 0 = x go n x = go (n - 1) (f x) main :: IO () main = do let size = 100000000 :: Int let array = V.replicate size 0 :: V.Vector Int let vTH = V.sum ($(nTimesTH 64) incAll array) let vNorm = V.sum (nTimes 64 incAll array) print $ vTH == vNorm

stack ghc --package vector -- -O2 SO.hs && SO.exe +RTS -t

True <<ghc: 52000056768 bytes, 66 GCs, 400034700/800026736 avg/max bytes residency (2 samples), 1527M in use, 0.000 INIT (0.000 elapsed), 8.875 MUT (9.119 elapsed), 0.000 GC (0.094 elapsed) :ghc>>

It yields the correct result. How fast is it? Let's use another main again:

main :: IO () main = do let size = 100000000 :: Int let array = V.replicate size 0 :: V.Vector Int print $ V.sum ($(nTimesTH 64) incAll array)

800,048,112 bytes allocated in the heap 4,352 bytes copied during GC 42,664 bytes maximum residency (1 sample(s)) 18,776 bytes maximum slop 764 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s Gen 1 1 colls, 0 par 0.000s 0.049s 0.0488s 0.0488s INIT time 0.000s ( 0.000s elapsed) MUT time 0.172s ( 0.221s elapsed) GC time 0.000s ( 0.049s elapsed) EXIT time 0.000s ( 0.049s elapsed) Total time 0.188s ( 0.319s elapsed) %GC time 0.0% (15.3% elapsed) Alloc rate 4,654,825,378 bytes per MUT second Productivity 100.0% of total user, 58.7% of total elapsed

Well, compare that to the 8s. So for a TL;DR: if you have compile-time constants, and you want to create and/or modify your code based on that constants, consider Template Haskell.