Some Current Approaches

Before discussing current approaches to this problem, we should be clear about what we want to do. The purpose of this document is to explore the solution space for efficient numerical computation in Haskell, and to that end we need to consider three classes of hardware:

- Central Processing Units (CPUs)
- Graphics Processing Units (GPUs)
- Computing Clusters

Each platform listed above has its uses, and they can be used in tandem for greater effect, but the one which offers the highest throughput for operations such as the ones we're after is the GPU: hundreds or thousands of cores are hard to beat with eight or sixteen. That said, the ideal situation is one in which we stay agnostic of this choice, which is why I looked into domain-specific languages.

The first solution one comes across for numerical computing in Haskell is the vector library. Vector is, in its own words, an efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimization framework. The package is structured around a generic Vector class with instances for boxed, unboxed, storable (externally allocated), and primitive vectors. This library can be used for numerical computation and is a suitable solution for many simpler problems, especially since the statistics library offers a lot of functionality on top of it for computing various facts about your data. Here I find the Pearson correlation coefficient between two populations of samples:

    import qualified Data.Vector.Unboxed as U
    import Statistics.Correlation (pearson)
    import DataFetchingLib (getData)  -- stand-in library from the original example

    main :: IO ()
    main = do
      xs <- getData "Population1"  -- assumed to return a U.Vector Double
      ys <- getData "Population2"
      -- pearson takes a vector of (x, y) pairs, so the samples are zipped first
      print (pearson (U.zip xs ys))

Here is another example using the math-functions package, on which statistics depends.

    import GHC.Exts (fromList)
    import qualified Data.Vector.Unboxed as Unboxed
    import Numeric.RootFinding
    import Numeric.Polynomial
    import Numeric.MathFunctions.Constants (m_epsilon)

    -- Did the bracketing search actually converge on a root?
    wasFound :: Root a -> Bool
    wasFound (Root _) = True
    wasFound _        = False

    main :: IO ()
    main = do
      -- f(x) = 1 + 2x + 3x^2 + ... + 100x^99
      let coeffs :: Unboxed.Vector Double
          coeffs = fromList [1 .. 100]
      let poly x = evaluatePolynomial x coeffs
      -- scan small brackets across [-1000, 1000] with Ridders' method
      let roots = [ ridders (m_epsilon * 4) (x, x + 0.2) poly
                  | x <- [-1000.0, -999.8 .. 1000.0]
                  ]
      print (filter wasFound roots)

Another solution with some clear advantages is hmatrix, which exposes BLAS/LAPACK-backed operations on dense matrices. One great advantage of this library is the Numeric.LinearAlgebra.Static module, which offers statically sized vectors and matrices, so you can rule out a large class of dimension-mismatch bugs while working on these kinds of programs. This library is a clear win if your application involves a lot of dense linear algebra and you can afford to do it on a CPU.
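To make that concrete, here is a minimal sketch of the statically sized interface (the particular numbers are arbitrary): the dimensions live in the types via DataKinds, so a mismatched multiplication is a compile-time error rather than a runtime crash.

    {-# LANGUAGE DataKinds #-}

    import Numeric.LinearAlgebra.Static

    -- A 2x2 matrix and a 2-vector whose dimensions are carried in their types.
    a :: L 2 2
    a = matrix [1, 2, 3, 4]

    x :: R 2
    x = vec2 5 6

    main :: IO ()
    main = print (a #> x)  -- matrix-vector product; (a #> vec3 1 2 3) is rejected by the type checker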

Another solution I reviewed, much better suited to arbitrary data analysis, is repa, which was written as a result of [9]. The library, again in its own words, provides high performance, regular, multi-dimensional, shape polymorphic parallel arrays, ensuring that numeric data is stored unboxed. It features an essentially automatic parallelization framework and for that reason can produce incredibly efficient code. There is a GHC plugin, repa-plugin, which achieves data flow fusion; this is essentially what we liked about the Eigen library, the ability to avoid allocating intermediate memory while still using convenient notation. Repa alone achieves this as well by offering a delayed representation alongside the manifest array representation, which lets you defer the construction of an array until its consumer forces it, so intermediate arrays never need to be allocated. The library appears to have a specialization in image processing and has features which support that use case very well.
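As a small sketch of the delayed representation (the array size and pipeline here are chosen arbitrarily), the two maps below build a delayed array without allocating an intermediate result, and computeP then forces it in parallel into a manifest unboxed array.

    import Data.Array.Repa as Repa

    main :: IO ()
    main = do
      -- a manifest, unboxed array of one million doubles
      let xs = fromListUnboxed (Z :. (1000000 :: Int)) [1 .. 1000000]
                 :: Array U DIM1 Double
      -- Repa.map returns a delayed array: no allocation happens here, and
      -- the two traversals fuse when the result is finally forced
      let ys = Repa.map (* 2) (Repa.map (+ 1) xs)
      -- computeP forces the delayed array in parallel into a manifest one
      zs <- computeP ys :: IO (Array U DIM1 Double)
      print (sumAllS zs)

Compiled with -threaded and run with +RTS -N, the computeP step spreads across all available cores without any further annotation.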

The last solution I looked at was accelerate, a domain-specific language (DSL) for high performance computing in Haskell which aims to be well suited to GPU execution [10]. It uses many of the same techniques as repa, but it is suited to execution in a great many contexts, and accelerate-llvm offers multicore CPU as well as GPU backends. The clear advantage of accelerate is that it is a deep embedding where repa is a shallow one: whereas repa arrays are represented directly by their semantics, accelerate computations are kept in abstract syntax tree (AST) form, which means they can be run in a number of contexts. To me this feels like a great advantage, as one could write some data analysis code, perform static analysis over it, and still run it in a very convenient way.
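A dot product makes the deep embedding concrete. In the sketch below, dotp builds an AST in the Acc type and performs no work by itself; run from the reference interpreter then evaluates that AST, and the run exported by accelerate-llvm-native (or the GPU backend) accepts the same program unchanged.

    import Data.Array.Accelerate as A
    import Data.Array.Accelerate.Interpreter (run)

    -- The body builds an abstract syntax tree describing the computation;
    -- nothing is evaluated until a backend runs it.
    dotp :: Acc (Vector Double) -> Acc (Vector Double) -> Acc (Scalar Double)
    dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

    main :: IO ()
    main = do
      let xs = fromList (Z :. 10) [0 .. 9]  :: Vector Double
          ys = fromList (Z :. 10) [1 .. 10] :: Vector Double
      -- 'use' embeds the input arrays into the DSL; 'run' picks the backend
      print (run (dotp (use xs) (use ys)))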