Haskell Dice of Doom - Part 3

We continue our port of the Land of Lisp game “Dice of Doom” from Part 1 and Part 2.

Source code can be found on Github.

We can now play 2x2 games with no problem. However, trying to play a 3x3 game poses a significant challenge for us - it can be very slow. In this part, we’ll take a look at improving the performance of the game for boards bigger than 2x2.

Performance Characteristics

Before doing anything we should look at the performance characteristics of the game, in particular the size of the problem. For an N x N board with 2 players and a max of 3 dice per cell, the number of board configurations is (23)^(NN). A 2x2 game has 1,296 possible boards, while a 3x3 one has 10,077,696. A 4x4 game has 2,821,109,907,456 possible configurations - that’s over two trillion!

For the 2x2 boards, we can generate game trees for all scenarios and look at some stats, depending on whether A starts or B starts:

The number of nodes in the tree.

The eventual winners in each tree.

treeSize :: Tree a -> Int treeSize = length . flatten leaves :: Tree a -> [ a ] leaves tree = getLeaves tree [] where getLeaves :: Tree a -> [ a ] -> [ a ] getLeaves ( Node root children ) total = total ++ ( if null children then [ root ] else concat [ getLeaves c total | c <- children ]) allWinners :: Tree GameState -> [([ Player ], Int )] allWinners tree = map ( \ xs @ ( x : _ ) -> ( x , length xs )) . group . sort $ map ( winners . currentBoard ) $ leaves tree

Getting the tree size by flattening the entire tree into a list and then getting the length of the list probably isn’t the most efficient way of doing it, but we’ll get to that in a while.

We can generate all possible 2x2 boards:

all2x2Boards :: [ Board ] all2x2Boards = [ Board { size = 2 , maxDice = 3 , numPlayers = 2 , cells = [ Cell { player = cell0Player , dice = cell0Dice } , Cell { player = cell1Player , dice = cell1Dice } , Cell { player = cell2Player , dice = cell2Dice } , Cell { player = cell3Player , dice = cell3Dice } ], conqueredDice = 0 } | cell0Player <- allPlayers , cell1Player <- allPlayers , cell2Player <- allPlayers , cell3Player <- allPlayers , cell0Dice <- allDice , cell1Dice <- allDice , cell2Dice <- allDice , cell3Dice <- allDice ] where allPlayers = [ Player 0 , Player 1 ] allDice = [ 1 , 2 , 3 ]

The function to print out the board statistics for these boards is:

boardStats :: Board -> IO () boardStats board = do drawBoard board putStrLn $ printf "A: %d \t B: %d \t A Winners: %s \t B Winners: %s" numPlayerANodes numPlayerBNodes ( show winnersA ) ( show winnersB ) putStrLn "" where treeA = gameTree board ( Player 0 ) Pass True treeB = gameTree board ( Player 1 ) Pass True numPlayerANodes = treeSize treeA numPlayerBNodes = treeSize treeB winnersA = allWinners treeA winnersB = allWinners treeB playAll2x2Boards :: IO () playAll2x2Boards = mapM_ boardStats all2x2Boards

The results are given here

Most of these games are short or very one-sided. Interesting games are balanced ones, with a large number of nodes usually leading to a more even possibility of A winning, B winning or a tie.

This doesn’t look too bad. How about a 3x3 game? We won’t generate all possible boards, but take a few random ones and play them.

Note: GHC vs GHCi

Whenever you’re doing performance work I recommend you compile your code with optimisations switched on, rather than running it through GHCi. To do this we’ll need to add a main function and change the module name to “Main”.

import Data.Time ( getCurrentTime ) playRandom3x3Board :: IO () playRandom3x3Board = genBoard 3 2 3 >>= boardStats main :: IO () main = do t1 <- getCurrentTime print t1 playRandom3x3Board t2 <- getCurrentTime print t2

Then compile with optimisations using the -O2 flag:

ghc -O2 DiceOfDoom-h.hs

Playing this a few times:

2014-10-05 08:55:58.526566 UTC A-1 B-2 A-1 B-1 B-2 A-1 A-1 A-1 A-3 A: 75 B: 1307 A Winners: [([A],31)] B Winners: [([A],388)] 2014-10-05 08:55:58.539907 UTC 2014-10-05 08:57:12.153751 UTC A-3 A-1 B-2 B-2 A-3 A-2 B-1 B-2 B-1 A: 845109 B: 137339 A Winners: [([A],340484),([B],10202)] B Winners: [([A],62628),([B],260)] 2014-10-05 08:57:19.853758 UTC 2014-10-05 08:58:18.31154 UTC A-1 A-3 A-1 B-2 B-2 B-3 A-2 B-1 B-2 A: 3890171 B: 5065637 A Winners: [([A],186513),([B],1065740)] B Winners: [([A],440445),([B],1322381)] 2014-10-05 08:59:26.736873 UTC

Wow! There’s a huge variation here. The last game tree, when play is started by B, has over 5 million nodes.

That’s nothing: the following game tree has 1.4 BILLION nodes when play is started by A:

2014-10-05 11:40:35.780121 UTC A-2 A-2 B-2 A-3 B-1 A-3 A-3 A-3 B-3 A: 1468919491 B: 1 2014-10-05 13:24:24.230547 UTC

You might ask yourself how this was calculated, given that my laptop only has 8GB of RAM. Let’s say that each node is 100 bytes. This is assuming 1 Int == 4 bytes and summing up the Ints used in players, cells etc. The actual value is much larger, but let’s be conservative. We can fit 10 boards in 1KB, 10,000 boards in 1MB, 10 million boards in 1GB, which means that the max tree size I should be able to accomodate in memory is one with 80 million nodes. So how could I calculate the size of the above game tree?

The answer is laziness and garbage collection. I initially only wanted to know the tree sizes, so my main function was like this:

main :: IO () main = do t1 <- getCurrentTime print t1 putStrLn $ printf "A: %d" ( treeSize $ gameTree test3x3BoardE ( Player 0 ) Pass True ) putStrLn $ printf "B: %d" ( treeSize $ gameTree test3x3BoardE ( Player 1 ) Pass True ) t2 <- getCurrentTime print t2

It took a couple of hours, but gave the following results:

2014-10-05 11:40:35.780121 UTC A-2 A-2 B-2 A-3 B-1 A-3 A-3 A-3 B-3 A: 1468919491 B: 1 2014-10-05 13:24:24.230547 UTC

The impressive thing was that it ran in constant memory. Looking at the process stats in Mac OS X’s Activity Monitor, the private memory usage never got above 2.5 MB.

Buoyed on by this, I added more stats, including the winners. Re-running, it didn’t take long before I consumed all available memory.

The reason for this is that if I just get the tree size, unused nodes are garbage collected. I don’t need to generate the whole tree up-front to start calculating the tree size - laziness helps here. However, if I need the tree for separate calls to get the size and the winners, the nodes can’t be garbage collected before the second call, and the whole tree resides in memory.

Aside: Calculating the Size of the Tree

The initial version of treeSize flattens the entire tree and then gets its length. I thought this would be horribly ineffieicent, but that it would do me for a while. After getting an initial set of stats, I resolved to ‘fix’ the issue with treeSize :

import qualified Data.Foldable as F treeSize :: Tree a -> Int treeSize = F . foldl ( \ x _ -> x + 1 ) 0

Re-running this on a large tree (several million nodes) resulted in all memory being consumed.

The problem here is with the foldl function. It’s much better to use foldl' .

import qualified Data.Foldable as F treeSize :: Tree a -> Int treeSize = F . foldl' ( \ x _ -> x + 1 ) 0

See the WikiBooks entry for more information on this.

Even with this, the function wasn’t as fast as the flattening one. I finally settled on the following simple function, which turns out to be faster still:

treeSize :: Tree a -> Integer treeSize ( Node _ children ) = 1 + sum ( map treeSize children )

Shared Structure

There’s an interesting thing about the huge game tree above: if there are 1.4 billion nodes in a tree, but only 10 million possible board configurations, then there’s proabably a lot of repetition going on - the same branches may occur multiple times within the tree.

Let’s see an example of this. The game tree used in Part 2 is a good example:

Your browser does not support SVG

The shared nodes are coloured in various shades of yellow.

The Lisp implementation memoises the calls to get-tree by using a hash-based lookup table. By doing this it only calculates a sub-tree once. Let’s see what effect this memoisation has on our test trees.

By modifying the memoised function, we can log the ‘cache-hits’ vs the ‘cache-misses’:

; memoization (let ((old-game-tree (symbol-function 'game-tree)) (previous (make-hash-table :test #'equalp))) (defun game-tree (&rest rest) (or (progn (fresh-line) (princ "HIT") (gethash rest previous)) (progn (fresh-line) (princ "MISS") (setf (gethash rest previous) (apply old-game-tree rest))))))

Run the game-tree function on *test-board-3x3-e* :

sh$ clisp -q << ( load "dice_of_doom_v1.lisp" ) ( tree-size ( game-tree * test -board-3x3-e * 0 0 t )) ( quit ) -EOF | sort | uniq -c

As with the Haskell version, this runs in fairly-constant space, but still takes a couple of hours.

For the board with 1.4 billion nodes, the results are interesting

472109 HIT 201861 MISS

In other words, in the tree of 1.4 billion nodes, with 472,109 sub-trees, only 201,861 of these sub-trees had to be computed. Even if you play this game in the CLisp REPL, it only takes around 30 seconds for it to generate the game tree.

The Lisp memoization solves our two problems above:

The time taken to generate the game tree is reduced. Also, by memoising the other calls, the time taken to calculate moves is also severly reduced.

The space taken up by the tree is also a lot less.

This second point is interesting, and not immediately obvious on a casual glance at the Lisp code. Once a sub-tree has been stored in the hash table, all subsequent ‘cache hits’ returned by the game-tree function are referenced, not copied, in the parent. While logically the structure is a tree, several sub-trees share the same structure. In effect, this turns the tree into a Directed Acyclic Graph. The ‘magic’ behind this is the Lisp Cons Cell. If you want to find more information on this, Peter Siebel’s book Practical Common Lisp, in particular Chapter 12, They Called It Lisp For A Reason is a good read.

Haskell also has cons cells where we can have several items in a list referring to a single shared data structure. Using this, we can memoise the game tree creation in a similar way.

Aside: Profiling an Application

Before we second-guess the performance issues in the game, it’s good to get a real idea of where the bottlenecks lie by profiling it.

GHC allows us to turn on profiling:

ghc -O2 -prof -auto-all DiceOfDoom-h.hs

To run, we need to tell the GHC runtime that we want do do the profiling (-p) and that we want garbage collection stats as well (-s): (This test was done using test3x3BoardB)

./DiceOfDoom-h +RTS -p -s

The profiling output is stored in DiceOfDoom-h.prof.

test3x3BoardB

test3x3BoardC

The GC stats are very telling. For test3x3BoardC we have:

112,814,519,176 bytes allocated in the heap 17,082,086,768 bytes copied during GC 2,771,996,128 bytes maximum residency (17 sample(s)) 50,573,008 bytes maximum slop 7803 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 218297 colls, 0 par 14.70s 20.17s 0.0001s 0.1349s Gen 1 17 colls, 0 par 14.46s 44.15s 2.5968s 21.5761s INIT time 0.00s ( 0.00s elapsed) MUT time 116.08s (135.91s elapsed) GC time 29.16s ( 64.31s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 0.00s ( 0.00s elapsed) EXIT time 0.10s ( 1.50s elapsed) Total time 145.35s (201.72s elapsed) %GC time 20.1% (31.9% elapsed) Alloc rate 971,852,043 bytes per MUT second Productivity 79.9% of total user, 57.6% of total elapsed

There’s a lot of GC going on - which shouldn’t be a surprise as the game trees have millions of nodes.

For info on the interpretation of GC stats, see Running a compiled program in the GHC docs.

Memoising Tree Creation

In languages that allow mutation, memoising is a fairly straightforward process: take a function you want memoised, wrap it in another function that only calls it if it hasn’t already been called with those arguments, and that stores the result for subsequent calls.

Haskell doesn’t allow mutation directly, so we have to use some sort of state mechanism. The approach we’ll use is described in http://www.maztravel.com/haskell/memofib.html and in a thread on the comp.lang.haskell group.

In some ways the approach is similar to the Lisp one above: we have a function that stores calculated values in a map. However, we can’t just wrap the old function in another one - we have to modify the old function to enable us to memoise it.

NOTE It’s important to note that the type of memoisation presented here is slightly different to what you’d expect if you were memoising functions in Python or Ruby. In the Haskell memoisation code above, the memoisation only works within recursive calls to the same function. It doesn’t work across non-recursive calls. So if you’re in GHCI and you do the following:

ghci> treeSize $ gameTree test3x3BoardB (Player 0) True 26359321 ghci> treeSize $ gameTree test3x3BoardB (Player 0) True 26359321

Then the second call to gameTree isn’t “cached” in any way - it goes through exactly the same process of trying to memoise recursive calls to itself starting off with an empty map. We’ll see an example of memoising that can do this when we memoise the calls to get the ratings.

Fixing a Design Flaw

Before we apply memomisation to the game tree creation, we need to fix a flaw in the way the tree was designed. Recall that the game state is defined by:

data GameState = GameState { currentPlayer :: Player , moveMade :: Move , currentBoard :: Board }

We stored the moveMade in the game state to tell us that a particular move was made to get to this state. It made it easy to enumerate the child nodes and see what move would result in that board.

However, several different moves from different boards can result in the same end state. If we want to memoise the game tree from a possible board, we need to remove the moveMade reference and put this into the parent. The game state should only have the current player, the current board, and the list of possible moves from that board.

To do this we’re going to put the moves made from a parent node to a child node into the parent itself, as an array of moves.

data GameState = GameState { currentPlayer :: Player , currentMoves :: [ Move ], currentBoard :: Board }

This changes the gameTree function slightly, in particular we don’t need the fromMove parameter any more. Two other functions that change are allowedMoves that maps the possible moves to numbers, and showGameGraphTree used to generate the Graphviz output:

gameTree :: Board -> Player -> Bool -> Tree GameState gameTree board p isFirstMove -- No further moves possible. Switch players and add the reinforcements | null possibleMoves && not isFirstMove = Node GameState { currentPlayer = p , currentMoves = [ Pass ], currentBoard = board } passingGameTree -- No moves possible - END OF GAME | null possibleMoves && isFirstMove = Node GameState { currentPlayer = p , currentMoves = [ Pass ], currentBoard = board } [] -- Keeping with the same player, recurse through all moves | otherwise = Node GameState { currentPlayer = p , currentMoves = passingMove ++ possibleMoves , currentBoard = board } ( passingGameTree ++ [ gameTree ( makeAMove board p m ) p False | m <- possibleMoves ] ) where possibleMoves = attackMoves board p passingMove = if isFirstMove then [] else [ Pass ] passingGameTree = if isFirstMove then [] else [ gameTree ( reinforce board p ) -- Add reinforcements ( nextPlayer board p ) -- Switch player True -- First move for new player ] allowedMoves :: Tree GameState -> [( Int , Move )] allowedMoves ( Node root _ ) = zip [ 1 .. ] $ currentMoves root showGameGraphTree :: Tree ( GameState , Int ) -> String showGameGraphTree ( Node ( root , number ) children ) = concat $ [ printf " \" %d \" -> \" %d \" [label= \" %s \" ;];

" number child_number ( show moveMade ) :: String | ( moveMade , ( Node ( child , child_number ) _ )) <- zip ( currentMoves root ) children ] ++ [ showGameGraphTree c | c <- children ]

The code is in DiceOfDoom-i.hs.

Setting up the Memoising Map

In order to memoize the calls to gameTree using the method above, we’ll need to be able to map a single something to a game tree. The gametree function takes three parameters: a board, a player and a boolean indicating if it’s the first move or not. We’ll wrap these three items in a single tuple. The structure of the code remains the same as the non-memoizing version:

type GameTree = Tree GameState gameTree :: Board -> Player -> Bool -> GameTree gameTree board p isFirstMove = memoizeM gameTreeM ( board , p , isFirstMove ) gameTreeM :: Monad m => (( Board , Player , Bool ) -> m GameTree ) -> ( Board , Player , Bool ) -> m GameTree gameTreeM f' ( board , p , isFirstMove ) -- No further moves possible. Switch players and add the reinforcements | null possibleMoves && not isFirstMove = do passTree <- f' ( reinforce board p , -- Add reinforcements nextPlayer board p , -- Switch player True ) -- First move for new player return $ Node GameState { currentPlayer = p , currentMoves = [ Pass ], currentBoard = board } [ passTree ] -- No moves possible - END OF GAME | null possibleMoves && isFirstMove = return $ Node GameState { currentPlayer = p , currentMoves = [ Pass ], currentBoard = board } [] -- Keeping with the same player, recurse through all moves | otherwise = do childTrees <- mapM ( \ b -> f' ( b , p , False )) [ makeAMove board p m | m <- possibleMoves ] if isFirstMove then return $ Node GameState { currentPlayer = p , currentMoves = possibleMoves , currentBoard = board } childTrees else do passTree <- f' ( reinforce board p , -- Add reinforcements nextPlayer board p , -- Switch player True ) -- First move for new player return $ Node GameState { currentPlayer = p , currentMoves = Pass : possibleMoves , currentBoard = board } ( passTree : childTrees ) where possibleMoves = attackMoves board p type StateMap a b = State ( Map . Map a b ) b memoizeM :: ( Show a , Show b , Ord a ) => (( a -> StateMap a b ) -> ( a -> StateMap a b )) -> ( a -> b ) memoizeM t x = evalState ( f x ) Map . empty where -- Cache miss g z = do y <- t f z m <- get put $ Map . insert z y m return y -- Cache hit f z = get >>= \ m -> maybe ( g z ) return ( Map . lookup z m )

Code in DiceOfDoom-j.hs

Let’s test this out on the 1.4 billion-node game tree:

main :: IO () main = do t1 <- getCurrentTime print t1 putStrLn $ printf "Size: %d" ( treeSize tree ) putStrLn $ printf "Depth: %d" ( treeDepth tree ) t2 <- getCurrentTime print t2 where tree = gameTree test3x3BoardE ( Player 0 ) True

Running this:

ghc -O2 DiceOfDoom-j.hs ./DiceOfDoom-j 2014-11-01 09:21:49.2206886 UTC Size: 1468919491 Depth: 40 2014-11-01 09:23:28.837608 UTC

One minute and 40 seconds - that’s not bad. Memory usage peaks at 145MB.

If we profile this, and look at the GC stats for test3x3BoardC

ghc -O2 -prof -auto-all DiceOfDoom-j.hs ./DiceOfDoom-j +RTS -p -s 34,575,453,936 bytes allocated in the heap 4,376,663,592 bytes copied during GC 146,595,568 bytes maximum residency (35 sample(s)) 3,292,400 bytes maximum slop 425 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 66920 colls, 0 par 2.86s 3.07s 0.0000s 0.0376s Gen 1 35 colls, 0 par 3.01s 3.75s 0.1073s 0.2740s INIT time 0.00s ( 0.00s elapsed) MUT time 21.64s ( 22.14s elapsed) GC time 5.88s ( 6.82s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 0.00s ( 0.00s elapsed) EXIT time 0.00s ( 0.07s elapsed) Total time 27.52s ( 29.04s elapsed) %GC time 21.4% (23.5% elapsed) Alloc rate 1,597,733,984 bytes per MUT second Productivity 78.6% of total user, 74.5% of total elapsed

There’s a lot less GC going on, and the total memory in use has gone down from 7.8 GB to 425 MB.

Memoising the ratings functions

We can now play the large game above against another human:

main :: IO () main = do playVsHuman tree where tree = gameTree test3x3BoardE ( Player 0 ) True

$ ghc -O2 DiceOfDoom-j.hs Current player: A A-2 A-2 B-2 A-3 B-1 A-3 A-3 A-3 B-3 choose your move: 1: Attack 0 4 2: Attack 1 4 3: Attack 3 4 4: Attack 5 2 5: Attack 5 4 6: Attack 7 4 ...

However, if we try to play against the computer, we’re still in trouble:

main :: IO () main = do playVsComputer ( Player 0 ) tree where tree = gameTree test3x3BoardE ( Player 0 ) True

$ ghc -O2 DiceOfDoom-k.hs Current player: A A-2 A-2 B-2 A-3 B-1 A-3 A-3 A-3 B-3 ...

It hangs for a long time. If we take a look at the profiling info, we can see that the various ratings functions are taking up a lot of time. We need to memoise these as well.

We can’t use the same memoising technique that we used to generate the tree. The childRatings function does make recursive calls to itself indirectly via ratePosition , but it is called several times independently by handleComputer . We’d really like to memoise separate calls.

We’ll take the approach used in Ugly Memoization by Lennart Augustsson. It uses unsafe IO and MVars.

childRatingsM :: Tree GameState -> GameState -> Player -> [ Double ] childRatingsM ( Node _ children ) _ nodePlayer = [ ratePosition c nodePlayer | c <- children ] childRatings :: Tree GameState -> Player -> [ Double ] childRatings tree @ ( Node root _ ) p = memo childRatingsM tree root p memo :: ( Show b , Ord b , Ord c ) => ( a -> b -> c -> d ) -> ( a -> b -> c -> d ) memo f = let f' = unsafePerformIO ( memoIO f ) in \ x y z -> unsafePerformIO ( f' x y z ) memoIO :: ( Ord b , Ord c ) => ( a -> b -> c -> d ) -> IO ( a -> b -> c -> IO d ) memoIO f = do v <- newMVar Map . empty let f' x y z = do m <- readMVar v case Map . lookup ( y , z ) m of Nothing -> do let r = f x y z n <- takeMVar v putMVar v ( Map . insert ( y , z ) r n ) return r Just r -> return r return f'

The childRatings function takes a game tree and a player. However, we only need to memoise the root of the tree and the player, so the childRatingsM function takes the root as a parameter but does nothing with it.

Code in DiceOfDoom-k.hs.

With this in place, we can now play against the computer:

ghc -O2 DiceOfDoom-k.hs ./DiceOfDoom-k Current player: A A-2 A-2 B-2 A-3 B-1 A-3 A-3 A-3 B-3 choose your move: 1: Attack 0 4 2: Attack 1 4 3: Attack 3 4 4: Attack 5 2 5: Attack 5 4 6: Attack 7 4 1 ...

That’s it for part 3. The game is at the same stage as the end of chapter 15 in Land of Lisp. In the next part we’ll get the game to handle even larger boards.