I'm in the middle of porting David Blei's original C implementation of Latent Dirichlet Allocation to Haskell, and I'm trying to decide whether to leave some of the low-level stuff in C. The following function is one example—it's an approximation of the second derivative of lgamma :

double trigamma(double x) { double p; int i; x=x+6; p=1/(x*x); p=(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x+0.5*p; for (i=0; i<6 ;i++) { x=x-1; p=1/(x*x)+p; } return(p); }

I've translated this into more or less idiomatic Haskell as follows:

trigamma :: Double -> Double trigamma x = snd $ last $ take 7 $ iterate next (x' - 1, p') where x' = x + 6 p = 1 / x' ^ 2 p' = p / 2 + c / x' c = foldr1 (\a b -> (a + b * p)) [1, 1/6, -1/30, 1/42, -1/30, 5/66] next (x, p) = (x - 1, 1 / x ^ 2 + p)

The problem is that when I run both through Criterion, my Haskell version is six or seven times slower (I'm compiling with -O2 on GHC 6.12.1). Some similar functions are even worse.

I know practically nothing about Haskell performance, and I'm not terribly interested in digging through Core or anything like that, since I can always just call the handful of math-intensive C functions through FFI.

But I'm curious about whether there's low-hanging fruit that I'm missing—some kind of extension or library or annotation that I could use to speed up this numeric stuff without making it too ugly.

UPDATE: Here are two better solutions, thanks to Don Stewart and Yitz. I've modified Yitz's answer slightly to use Data.Vector .

invSq x = 1 / (x * x) computeP x = (((((5/66*p-1/30)*p+1/42)*p-1/30)*p+1/6)*p+1)/x+0.5*p where p = invSq x trigamma_d :: Double -> Double trigamma_d x = go 0 (x + 5) $ computeP $ x + 6 where go :: Int -> Double -> Double -> Double go !i !x !p | i >= 6 = p | otherwise = go (i+1) (x-1) (1 / (x*x) + p) trigamma_y :: Double -> Double trigamma_y x = V.foldl' (+) (computeP $ x + 6) $ V.map invSq $ V.enumFromN x 6

The performance of the two seems to be almost exactly the same, with one or the other winning by a percentage point or two depending on the compiler flags.

As camccann said over at Reddit, the moral of the story is "For best results, use Don Stewart as your GHC backend code generator." Barring that solution, the safest bet seems to be just to translate the C control structures directly into Haskell, although loop fusion can give similar performance in a more idiomatic style.