

^\(0\|1\(1\(01*00\)*01*010\|\(0\|1\(01*00

\)*01*011\)\(10\(01*00\)*01*011\|\(0\|11\

)1\)*\(10\(01*00\)*01*010\|\(0\|11\)0\)\)

*\(1\(01*00\)*1\|\(0\|1\(01*00\)*01*011\)

\(10\(01*00\)*01*011\|\(0\|11\)1\)*10\(01

*00\)*1\)\)*$





A = A0+B1+1



B = C0+A1



C = B0+C1





x = xM+v





M=

0 1 0

1 0 0

0 0 1





x = [A B C], v = [1 0 0]





C = B0+C1





C(1-1) = B0



C = B0(1-1)-1





C = B01*







> import List

> data RE a = Symbol a | Star (RE a) | Sum [RE a] | Prod [RE a]

> deriving (Eq,Show,Ord)



> a <*> b = Prod [a,b]

> a <+> b = Sum [a,b]



> zero = Sum []

> one = Prod []









> simplify a = let b = simplify' a in if a==b then b else simplify b where

> simplify' (Prod [a]) = simplify a

> simplify' (Prod a) | zero `elem` a = zero

> simplify' (Prod (a:b)) = case simplify a of

> Prod x -> Prod $ x ++ map simplify b

> a -> Prod $ filter (/= one) $ map simplify (a:b)



> simplify' (Sum [a]) = a

> simplify' (Sum (a:b)) = case simplify a of

> Sum x -> Sum $ x ++ map simplify b

> a -> Sum $ nub $ sort $ filter (/= zero) $ map simplify (a:b)



> simplify' (Star (Sum [])) = one

> simplify' (Star (Prod [])) = one

> simplify' (Star a) = Star (simplify a)

> simplify' a = a





lookup

fetch





> type Matrix s a = [((s,s),RE a)]

> type Vector s a = [(s,RE a)]



> fetch i x = maybe zero id $ lookup i x





solve

Star





> solve range i (m,v) = (

> [(j,simplify $ fetch (i,j) m <*> Star (fetch (i,i) m)) |

> j <- range, j/=i],

> simplify $ fetch i v <*> Star (fetch (i,i) m)

> )





substituteFor

solve





> substituteFor range k (v',c') m v = (

> [((i,j),(fetch j v' <*> fetch (i,k) m) <+> fetch (i,j) m) |

> i <- range, i/=k, j <- range, j/=k],

> [(i,(c' <*> fetch (i,k) m) <+> fetch i v) |

> i <- range, i /= k]

> )









> elim range s (m,v) = let (v',c') = solve range s (m,v)

> in substituteFor range s (v',c') m v









> bracket n (s,n') = if n'>=n then s else "\\(" ++ s ++ "\\)"

> pretty (Symbol a) = ([a],9)

> pretty (Star a) = (bracket 8 (pretty a) ++ "*",8)

> pretty (Prod a) = (concat $ map (bracket 7) $ map pretty a,7)

> pretty (Sum a) = (foldr1 (\a b -> a ++ "\\|" ++ b) $

> map (bracket 6) $ map pretty a,6)









> m n = [(((2*i) `mod` n,i),Symbol '0')|i <- [0..n-1]] ++

> [(((2*i+1) `mod` n,i),Symbol '1')|i <- [0..n-1]]

> v n = [(0,one)]









> reduce [a] (m,v) = solve [a] a (m,v)

> reduce (a:as) (m,v) = reduce as $ elim (a:as) a (m,v)

> test n = let (_,r) = reduce [n-1,n-2..0] (m n,v n) in simplify r









> main = putStrLn $ "^" ++ fst (pretty (test 7)) ++ "$"





Problem: Find a regular expression, compatible with GNU grep, that recognises strings of 1's and 0's that form the binary expansion of multiples of 7. (Consider the empty string to be zero.)Solution:Here's one (though not necessarily a good one):The standard approach to solving this problem is probably to construct a finite state automaton to compute the remainder modulo 7 of a binary string and then use the proof of Kleene's theorem to convert the automaton to a regular expression. But as I have some regular expression code left over from a previous post I thought I'd try a purely algebraic approach. Although it's ultimately equivalent to deriving a finite state automaton I thought it might be more fun to use intuitions about linear algebra to get to the solution. And anyway, drawing diagrams with HTML is hard...Consider first recognising multiples of 3. Assume we have three regular expressions, A, B and C. A corresponds to multiples of 3, B corresponds to numbers that are 1 modulo 3 and C matches numbers that are 2 modulo 3. Then we knowFirst a note about notation: I'm using boldandto represent binary digits but using regular 0 to represent the regular expression matching nothing and regular 1 to represent the regular expression matching the empty string. a+b means match a or b. So we can interpret the equation for A, say, as saying that a multiple of 3 is the empty string, or a multiple of 3 followed by a, or aafter one more than a multiple of 3. Similar interpretations apply for the other two equations. We can rewrite this aswhereandWe can now use our ordinary intuitions about solving linear systems. We want to solve for A so we need to eliminate B and C. Start by eliminating C. We haveIf we were doing ordinary linear algebra we could saybut for regular expressions we can neither subtract nor divide. However, arguing informally now, (1-a)=1+a+a²+a³+…=aso now we can writeAnyway, those are all the principles we need. Just eliminate each variable in turn by solving for it in terms of the other remaining variables and substituting the solution back into the remaining equations. You should be able to finish the job and solve for A. But you'll notice the work gets tedious for larger integers. That's where the code comes in.It starts as before but we can delete a whole lot of material:My simplification rules are still really poor. (Buggy, in fact, but not buggy in a way that leads to incorrect results.) That means we don't expect the final regular expressions to be in any way efficient. But I'm not trying to get the best solution, just a solution.I'm representing the vector v and matrix M above as simple associative arrays whose elements are found using. Elements not represented are assumed to be 0.handles the sparse lookup:implements exactly the solution method described above. The use ofcorresponds to my abuse of the binomial series for (1-a)above. However rather than manipulate equations I'm always working in terms of matrices and vectors over the regular expressions:substitutes a solution to an equation given byinto the remainder of the equations.As I describe above, a variable is eliminated by solving for it and then substituting:Now some code to output our regular expressions in a form compatible with grep:Now construct M and v for the equations describing the regular expressions for the classes of binary strings modulo n:To solve a complete set of equations we reduce it one equation at a time by eliminating variables:Print regular expression to detect multiples of 7:Try integers besides 7. But note that the expressions get more and more complex. And of course you can use the equation solving approach for problems besides recognising multiples of integers.PS I just noticed some of the indentation went wrong on translation to HTML. But the code appears to still compile fine with GHC.