From HaskellWiki

This page is a little out of date, and since it was written:

GHC's IO manager has been rewritten to use epoll/kqueue/poll, which should mean all the forkIO examples will run faster than they did in this benchmark.

network-bytestring has been merged into the network package, so you don't need to get the two libraries separately.

Some examples of simple web server designs in Haskell, using either preemptive concurrency or an event-driven approach. Requirements:

Some more context on the background to this problem is available.

Benchmarks were run with httperf:

$ httperf --server=localhost --port=5002 --uri=/ --num-conns=10000

Author: dons

Results

Basic concurrent server

Concurrent, with String IO. On each accept in the main thread, we create a new Handle and forkIO a lightweight Haskell thread to write a string back to the client. This relies on the runtime scheduler to wake up the main thread in a timely fashion (i.e. via the current 'select' mechanism).

import Network
import Control.Concurrent
import System.IO

main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 5002
    loop sock

loop sock = do
    (h, _, _) <- accept sock
    forkIO $ body h
    loop sock
  where
    body h = do
        hPutStr h msg
        hFlush h
        hClose h

msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

Measurements:

$ ghc -O2 --make A.hs

Request rate: 6569.1 req/s (0.2 ms/req)

Concurrent, with network-bytestring

Now, using bytestring IO via the network-bytestring package, but still using the RTS's select-based preemptive threads. This just means we allocate nothing in the body, and avoid a couple of copies when doing the IO.

{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Char8
import Network hiding (accept)
import Network.Socket
import Network.Socket.ByteString (sendAll)
import Control.Concurrent

main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 5002
    loop sock

loop sock = do
    (conn, _) <- accept sock
    forkIO $ body conn
    loop sock
  where
    body c = do
        sendAll c msg
        sClose c

msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

Measurements:

$ ghc -O2 --make H.hs

Request rate: 9901.7 req/s (0.1 ms/req)

Epoll-based event callbacks

Now, instead of using the RTS's select mechanism to wake up threads, we use a custom epoll handler, combining epoll-based event handling with bytestring IO. The epoll approach will replace GHC's select model soon (design here showing how the concurrent Haskell primitives may be implemented in terms of epoll).

{-# LANGUAGE OverloadedStrings #-}

-- A simple example of an epoll based http server in Haskell.
--
-- Uses two libraries:
--   * network-bytestring, bytestring-based socket IO.
--      - cabal install network-bytestring:
--
--   * haskell-event, epoll-based scalable IO events
--      - git clone git://github.com/tibbe/event.git
--      - autoreconf ; then cabal install

import Network hiding (accept)
import Network.Socket (fdSocket, accept)
import Network.Socket.ByteString
import Data.ByteString.Char8
import System.Event
import System.Posix
import System.Posix.IO

main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 5002
    let fd = fromIntegral (fdSocket sock)
    mgr <- new
    registerFd mgr (client sock) fd evtRead
    loop mgr

client sock _ _ = do
    (c, _) <- accept sock
    sendAll c msg
    sClose c

msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

Measurements:

$ ghc -O2 --make Epoll.hs

Request rate: 15042.6 req/s (0.1 ms/req)

So significantly better. By the way, under the same conditions, this Python epoll version achieves 10k req/sec.
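To put the three measurements side by side, here is a small sketch that derives the per-request latency and the throughput relative to the String IO server from the request rates reported above. The names results, msPerReq and speedup are made up for this illustration; only the three rates come from the benchmark. By this arithmetic the bytestring server is roughly 1.5x the baseline and the epoll server roughly 2.3x.

```haskell
import Text.Printf (printf)

-- Request rates (req/s) as reported by httperf in the measurements above.
results :: [(String, Double)]
results =
    [ ("forkIO + String IO",     6569.1)
    , ("forkIO + ByteString IO", 9901.7)
    , ("epoll callbacks",        15042.6)
    ]

-- Mean time per request in milliseconds for a given request rate.
msPerReq :: Double -> Double
msPerReq rate = 1000 / rate

-- Throughput of a server relative to a baseline rate.
speedup :: Double -> Double -> Double
speedup base rate = rate / base

main :: IO ()
main = mapM_ row results
  where
    base = snd (head results)

    row :: (String, Double) -> IO ()
    row (name, rate) =
        printf "%-24s %8.1f req/s  %.3f ms/req  %.2fx\n"
               name rate (msPerReq rate) (speedup base rate)
```

httperf prints the per-request time to one decimal place, which is why both the 9901.7 and the 15042.6 req/s runs show up as 0.1 ms/req in its output.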

Further work: there are still traditional calls to accept and sendAll, which go via the Haskell concurrent IO layer and make redundant threading calls, so a fair bit of additional performance may be untapped.

Notes

Simon Marlow states: The Haskell program as it stands won’t scale up on a multicore because it only has a single accept loop, and the subtasks are too small. The cost of migrating a thread for load-balancing is too high compared to the cost of completing the request, so it’s impossible to get a speedup this way. If you create one accept loop per CPU then in principle it ought to scale, but in practice it won’t at the moment because there is only one IO manager thread calling select(). Hopefully this will be fixed as part of the ongoing epoll() work that was mentioned earlier.

Regarding the slowdown you see with -threaded, this is most likely because you’re running the accept loop in the main thread. The main thread is special – it is a “bound thread”, which means it is effectively a fully-fledged OS thread rather than a lightweight thread, and hence communication with the main thread is very expensive. Fork a subthread for the accept loop, and you should see a speedup with -threaded.
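The two suggestions above (keep the accept loop off the bound main thread, and run one accept loop per CPU) might be sketched as follows. This is an illustration under assumptions, not the code that was benchmarked: it uses the socket/bind/listen API of the current network package rather than listenOn, the backlog of 1024 is arbitrary, and acceptLoop is a name chosen here. It has not been measured.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (forkIO, forkOn, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Control.Monad (forM_, forever, void)
import qualified Data.ByteString.Char8 as B
import Network.Socket
import Network.Socket.ByteString (sendAll)

msg :: B.ByteString
msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

-- Accept connections forever, handing each one to its own lightweight thread.
acceptLoop :: Socket -> IO ()
acceptLoop sock = forever $ do
    (conn, _) <- accept sock
    void . forkIO $ sendAll conn msg `finally` close conn

main :: IO ()
main = withSocketsDo $ do
    let hints = defaultHints { addrFlags = [AI_PASSIVE], addrSocketType = Stream }
    addr:_ <- getAddrInfo (Just hints) Nothing (Just "5002")
    sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
    setSocketOption sock ReuseAddr 1
    bind sock (addrAddress addr)
    listen sock 1024  -- backlog value is an arbitrary choice for this sketch

    -- One accept loop per capability, each in an unbound lightweight thread,
    -- so that none of them runs in the bound main thread.
    done <- newEmptyMVar
    n <- getNumCapabilities
    forM_ [0 .. n - 1] $ \cap ->
        forkOn cap (acceptLoop sock `finally` putMVar done ())

    -- The main thread only waits; it returns if an accept loop ever exits.
    takeMVar done
```

Build with ghc -O2 -threaded and run with +RTS -N to get one capability per core. Whether this scales in practice depends on the IO manager, as the quote points out.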

More background on a similar benchmark in this ticket: http://hackage.haskell.org/trac/ghc/ticket/3758