From: "andrew cooke" <andrew@...>

Date: Mon, 4 Jun 2007 06:35:15 -0400 (CLT)

I have implemented various versions of a "Parallel Sudoku Solver" in Erlang. These notes try to capture some of the lessons I have learned. They are written with an eye on Jonathan Sillito's recent work adding similar (message sending) functionality to Ruby - I think there are some interesting issues about "transplanting" ideas from one language to another (one that includes mutable state but excludes efficient tail calls) and, at a more practical level, implementing state machines in an OO language (something I have worried about, intermittently, for years). Implementation Details A quick reading of the latest code at http://www.acooke.org/cute/ReducedRan0.html shows that it is an almost-declarative description of a state machine. Each state (starting, searching, asserting, swapping, sleeping) is embodied as a function. Messages are received in a particular state. The logic in that state (compactly described with pattern matching) selects a transition. Transitions are helper functions that may send a response message before calling a new state. Tail calls are used throughout. Since Erlang "optimises" such calls no stack is consumed in the transition and the implementation does not have to be concerned with "returning" (an alien concept for state machines). Some messages are common to more than one transition. These are handled by a single function which takes the following state as an argument (first class functions simulating continuation passing). Traditional Functional Strengths The code illustrates how Erlang plays to functional programming's traditional strengths: - Pattern matching (extended in Erlang with conditions using a restricted range of predicates) allows program logic to be expressed in a clear, concise manner. - Explicit state and tail call optimisation simplify transitions between states. - First class functions allow easy generalisation of transitions. Together these make it easy to directly embed a state machine in the source code. Erlang also provides, in the OTP platform, "behaviour" support for state machines. I have not yet investigated this. No Magic Bullet Erlang's message passing approach does not "magically" fix the problems inherent in concurrent software. During development I stumbled across deadlocks and inconsistent state more than once. What it does, however, is allow the code to be structured so that the assumptions, state and logic are easy to read. This helps ad-hoc reasoning while debugging and allows issues to be resolved relatively quickly (this is a relatively subjective view and I have no more objective evidence). Lessons Learned The following approaches (patterns?) were helpful: - Match for unexpected messages (and fail with a message when you receive them). This is analogous to the default case in case statements. - Use a fixed format for mutable state (ie the arguments to the functions that represent states in the FSM). This allows general transitions which take the "next" state/function as an argument. - Use tagged tuples for messages (ie first tuple value is an atom that "names" the message). This is a recognised custom in Erlang and gives ad-hoc typing. - Use {name, {payload...}} pairs for messages (ie the payload is a separate tuple). This allows matching on the name (eg {type, _Payload}) without hard-coding the payload structure in each pattern. - Fail deeply. When failure occurs, fall back to as basic a state as possible. Avoid states that loop on failure, immediately repeating the action, since this encourages deadlock (an example directly from the code: in the swapping state, failure to swap was initially handled by a second attempt at swapping. This eventually lead to deadlock with all cells requesting swaps from other cells.) - Avoid invariants. This is a hard one, because invariants are so useful, but maintaining an invariant across several threads requires detailed negotiation that cannot always "fail deeply" (in the "community invariant" described below a cell that attempts to swap its value with another cell cannot process further requests that modify its state - assertions or swaps - until it has a reply from the swap partner). - Use explicit, restricted protocols. "Fail deeply" and "avoid invariants" drive the system towards a "permissive" protocol. This should be resisted as much as possible, since it becomes difficult to detect errors (ie "Match for unexpected messages" becomes less useful). Ruby Library My main concern for the Ruby library is that a lack of efficient tail calls means that complex state machines cannot be easily separated into distinct functions (one per state). There seem to be two solutions: either use trampolining or represent state as a variable rather than a function. It would be interesting to explore these two avenues - it may be that they suggest an alternative syntax better suited to Ruby. States as Objects The natural approach to state machines in an OO language without support for unlimited tail calls seems to be: - Each state is an object - Each state implements a standard interface ("run" or similar) that returns the *next* state - Have a outer object (the state "machine") that manages the process of calling the standard interface on the successive states. - As I noted 8 years ago - http://www.acooke.org/andrew/smachine.html - machines can themselves be states. Reflecting a little on the comments above it becomes clear that this combines both approaches described in the previous section - the state is a variable whose value is the current state and the outer machine object is the "trampoline". So I guess I have answered my own question - it should be possible to adapt the code to use an approach like this. The remaining issues (eg does each state create the "next" state on every call, or is there a pool?) seem to be mainly book-keeping. Translating to Ruby If the above is correct then most comments here also apply to code in Ruby. The main changes are: - Tail calls are replaced by the trampolining approach described above. - "Generic" transitions would be methods in a shared superclass rather than functions that take a "next state" function. Algorithm Discussion The algorithm used is, obviously, a poor choice for solving Sudoku puzzles. Peter Norvig has already given a textbook-perfect solution at http://norvig.com/sudoku.html However, the parallel approach it is still useful as an example for learning Erlang/message passing: - It addresses a fashionable issue, which is motivating (received attention from Reddit, for example). - The initial approach is manageable for a newbie. - Progressively more complex solutions to be introduced (eg maintaining the "community invariant" so that one of each digit is used in each "block of 9"). - Finally, it is also possible that in some situations (although, in my experience, less often than one believes when faced with an apparently difficult problem) "guided random" exploration of the solution space is the only tool left. Motivated by the last point, and because I find the issues interesting, I will use the remaining sections of this note to comment on the various strategies used (the history is somewhat simplified to present the main points - in practice there were many more dead-ends and mistakes). Algorithm: Random Exploration The original implementation modelled each cell as an independent entity aware only of those cells with which it must not (in the final solution) conflict. At random it would send a message to one of these "competitors" asserting a value. If the two cells conflicted, one of the cells had to change to a new (random) value. While this is pretty much brain-dead as a practical optimisation scheme it has one critical property that we rely on and must preserve: its fixed point is the solution required. In other words, if it ever reaches a consistent solution it will not change values further (because there are no conflicting cells). This allows us to detect a successful solution by monitoring the (lack of) activity in the system. Algorithm: Community Invariant The solution space explored initially is huge (9^81 = 2e77 candidate solutions). This can be drastically reduced by guaranteeing that each digit from 1 to 9 is used just once in each "block" on the sudoku grid. This reduces the number of candidate solutions to 9!^9 = 1e50 (an improvement factor of 2e27). Implementing this invariant took some care - see the "Lessons Learned" for more details. Algorithm: Reduced Range Cells maintain a list of "legal values" which is modified when they interact with a "given" value. In this way Cells "learn" to avoid values that are guaranteed to conflict. This is "first order" constraint propagation. I have not measured the effect of this change but we can make a rough estimate: if each digit can conflict with two given values, on average, then any particular arrangement of a block has a probability of (7/9)^9 = 0.1 of being correct (wildly assuming independence). That reduces the number of candidate solutions by a further nine (number of blocks) orders of magnitude. Algorithm: Weighting Each cell maintains a "weight" that reflects the "confidence" in the current value. New values have zero weight, and weight increases when an assertion from a neighbouring does *not* conflict. Weighting can be used to guide behaviour when two cells interact, comparing and exchanging values. When comparing values the cell with lower weight is the one that changes; swapping is only successful with a lower weighted value in the same block. In this way weighting increases the persistence of more likely values and so "guides" the system to a self-consistent solution. Algorithm: Greediness The simple weighting described above is too strong. It encourages the growth of self-consistent "structures" of cells, but does not allow these structures to be discarded when they block the correct solution. The "greediness factor" introduces some noise into the weighting decisions. A greediness of 0.9 means, roughly, that weighting is followed only 90% of the time. In this way even strongly weighted cells can be forced to change value. Watching the program run (if configured to print a snapshot every second or so) it is easy to see how varying the greediness parameter affects the solution search. A value of 0.9 gives a system that oscillates between fairly long periods of detailed refinement, when only a few "uncertain" values are changed, and shorter intermittent sessions of more violent change, when nearly all values value. With a value of 0.5 the detailed searches are shorter; "wide-scale" changes are more common. While useful, an adjustable parameter requires tuning, which is difficult when the test case takes several hours to run. Also, there is no guarantee that a single "good" value of greediness exists for all puzzles. Andrew