Another step in my Automagic Poetry Generation project.

when i think we have tickets we't my abortion i drank it't my best friend melissa mahoney oh we tried to the building ~The Markov Chain, based on Amanda Palmer's Oasis

I'm making an evolutionary algorithm to generate poetry, but it needs a good base population to start from. Random data would be ideal, but starting from complete randomness means you waste the first couple of epochs just getting to something solid to work from - and that's a waste of time.

To make everything a bit smoother, I've written a markov chain generator in Haskell that generates the initial population. The principle is very simple:

1. Read the seed poem from a file
2. Tokenize
3. Train the markov chain
4. A generator spits out an infinitely long list based on the markov chain
5. Take X tokens
6. Detokenize
7. Done

Simple.

A quick brush-up for people who don't breathe "hardcore" Comp Sci stuff: a markov chain is essentially a probabilistic state automaton, usually represented with a probability matrix. You take the current state and move to the next state according to the transition probabilities.
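To make that concrete, here's a minimal self-contained sketch - a toy transition table of my own, not the post's code - with a single `step` function that walks the automaton:

```haskell
import qualified Data.Map as M
import System.Random

-- A toy transition table: each state maps to its observed successors.
-- Listing a successor twice makes that transition twice as likely.
toyChain :: M.Map String [String]
toyChain = M.fromList
  [ ("the", ["cat", "cat", "dog"])  -- 2/3 "cat", 1/3 "dog"
  , ("cat", ["sat"])
  , ("dog", ["sat"])
  , ("sat", ["the"])
  ]

-- One step of the chain: a uniform pick over the successor list
-- realizes the weighted transition probabilities.
step :: RandomGen g => g -> String -> (g, String)
step gen state =
  let succs     = M.findWithDefault [state] state toyChain
      (i, gen') = randomR (0, length succs - 1) gen
  in (gen', succs !! i)
```

Iterating `step` from `"the"` walks the automaton - which is exactly what the `produce` function does later in this post.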

The output is a random string that looks a lot like the drunk tweets you might see on a Friday night.

Our main function is a simple composition of a bunch of things:

```haskell
start_population :: (RandomGen g) => g -> String -> IO String
start_population gen start =
  return . foldr detokenize "" . take Config.seed_length . produce gen start . chain . tokenize
    =<< readFile Config.seed_data
```

Reading it from right to left, you can see that it first reads some data, tokenizes it, builds the chain, produces some output, cuts it to the proper length, and then shoves it back into a normal string.

Simple.

Because I wanted to handle punctuation and newlines, which are important in poems, I had to write my own tokenization and detokenization functions. Otherwise the built-in words function would be sufficient.

```haskell
tokenize :: String -> [String]
tokenize s = Prelude.filter (\x -> x /= " " && x /= "")
           $ Split.split (whenElt (\x -> isSeparator x || isPunctuation x || x == '\n'))
           $ Prelude.map toLower s

detokenize :: String -> String -> String
detokenize a b
  | punctuation a || punctuation b = a ++ b
  | otherwise                      = a ++ " " ++ b
  where punctuation = \x -> length x > 0 && isPunctuation (x !! 0)
```

You can see that tokenize splits on pretty much everything and detokenize takes special care not to put spaces around punctuation.
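Here's what the round trip looks like in practice. The snippet below is a self-contained re-sketch: I hand-roll the delimiter-keeping split that Data.List.Split's whenElt performs above, so it runs on base alone.

```haskell
import Data.Char (isPunctuation, isSeparator, toLower)

-- Split on whitespace, punctuation and newlines, keeping each
-- delimiter as its own token and dropping empty/space chunks.
tokenize :: String -> [String]
tokenize = filter (\x -> x /= " " && x /= "") . splitKeep . map toLower
  where
    isDelim c = isSeparator c || isPunctuation c || c == '\n'
    splitKeep s = case break isDelim s of
      (chunk, [])     -> [chunk]
      (chunk, d:rest) -> chunk : [d] : splitKeep rest

-- Glue tokens back together, omitting the space next to punctuation.
detokenize :: String -> String -> String
detokenize a b
  | punctuation a || punctuation b = a ++ b
  | otherwise                      = a ++ " " ++ b
  where punctuation x = not (null x) && isPunctuation (head x)
```

For example, `tokenize "Hello, world!"` yields `["hello", ",", "world", "!"]` - the comma and bang survive as their own tokens, so a later fold with `detokenize` can glue them back without stray spaces.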

Another important step is building the chain itself.

```haskell
chain :: [String] -> Map String [String]
chain [now, last] = insert now [last] $ singleton last []
chain (token:xs)  = insertWith (\new old -> new ++ old) token [xs !! 0] $ chain xs
```

Simply put - this function builds a HashMap from each token to the tokens observed after it. The idea here is to make a note of every token that comes after some other token. To keep things simple, if a pair of tokens occurs twice, it is recorded twice.

This magically gives us the ability to properly weigh the random function that chooses what to generate next.
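A tiny run of the chain builder shows that weighting in action. This sketch swaps the post's Data.HashMap for Data.Map (from containers) and adds a catch-all case so short inputs can't crash; otherwise the idea is the same.

```haskell
import qualified Data.Map as M

-- Record, for every token, each token observed right after it.
-- A bigram seen twice is recorded twice, so a uniform pick over the
-- successor list is automatically weighted.
chain :: [String] -> M.Map String [String]
chain [now, lst]    = M.insert now [lst] (M.singleton lst [])
chain (t:next:rest) = M.insertWith (++) t [next] (chain (next:rest))
chain _             = M.empty

demo :: M.Map String [String]
demo = chain (words "the cat the dog the cat")
-- "the" maps to ["cat", "dog", "cat"]: picking uniformly from that
-- list chooses "cat" with probability 2/3.
```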

```haskell
next_token :: (RandomGen g) => g -> Map String [String] -> String -> (g, String)
next_token gen map s =
  let choices   = findWithDefault [] s map
      (i, gen') = randomR (0, length choices - 1) gen
  in (gen', choices !! i)

produce :: (RandomGen g) => g -> String -> Map String [String] -> [String]
produce gen s map =
  let (gen', next) = next_token gen map s
  in s : produce gen' next map
```

I have a nasty suspicion the next_token and produce functions could be merged, but I found this easier to reason about.

next_token is the meat of our algorithm - it does nothing but take a token, find the list of its possible successors in the HashMap, and return a random member of that list. To avoid lookup errors, findWithDefault falls back to an empty list when the token is unknown.

The produce function takes care of driving next_token and makes sure it gets a fresh random generator every time.
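Here's the pair in action on a hand-built two-state chain. Same caveats as before: Data.Map stands in for Data.HashMap, and I default the lookup to `[""]` so the `(!!)` can't run off an empty list - the post's version defaults to `[]`.

```haskell
import qualified Data.Map as M
import System.Random

next_token :: RandomGen g => g -> M.Map String [String] -> String -> (g, String)
next_token gen m s =
  let choices   = M.findWithDefault [""] s m  -- fall back to "" on a miss
      (i, gen') = randomR (0, length choices - 1) gen
  in (gen', choices !! i)

-- Laziness makes the output list infinite; callers `take` what they need.
produce :: RandomGen g => g -> String -> M.Map String [String] -> [String]
produce gen s m =
  let (gen', next) = next_token gen m s
  in s : produce gen' next m

twoState :: M.Map String [String]
twoState = M.fromList [("a", ["b"]), ("b", ["a"])]

-- Each state has exactly one successor, so the walk is deterministic.
sample :: [String]
sample = take 4 (produce (mkStdGen 1) "a" twoState)  -- ["a","b","a","b"]
```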

A problem with my technique is that once you give a random generator to the markov chain, you're not getting it back. While it does ensure the result will always be fresh, you might be using a stale generator in other parts of your code if you're not careful.

Maybe I should finally look into that random monad I've been hearing about.
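Until then, System.Random's split is one stopgap I could use - it derives two independent generators, so one can go to the markov chain while the caller keeps the other. A hedged sketch, not part of the post's code:

```haskell
import System.Random

-- Run a generator-consuming computation, but hand back an independent
-- generator for later use instead of losing the original one.
withFreshGen :: RandomGen g => g -> (g -> a) -> (a, g)
withFreshGen gen act =
  let (forAction, forLater) = split gen
  in (act forAction, forLater)

-- e.g. roll a die with one half, keep the other half for later
example :: (Int, StdGen)
example = withFreshGen (mkStdGen 3) (fst . randomR (1, 6))
```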

Either way, here's the full code. It comes to 38 sloc because I like including the function headers - they make the code easier to think about, though I've checked and it works without any type hints. Haskell is smarter than I am. I also added the module header that exposes only start_population to the outside world, which isn't otherwise necessary.

```haskell
module Initiators.MarkovChain
  ( start_population
  ) where

import System.Random
import Data.HashMap
import Data.List.Split as Split
import Data.Char
import Config

-- read corpus data
-- build markov chain
-- spit out data
start_population :: (RandomGen g) => g -> String -> IO String
start_population gen start =
  return . foldr detokenize "" . take Config.seed_length . produce gen start . chain . tokenize
    =<< readFile Config.seed_data

tokenize :: String -> [String]
tokenize s = Prelude.filter (\x -> x /= " " && x /= "")
           $ Split.split (whenElt (\x -> isSeparator x || isPunctuation x || x == '\n'))
           $ Prelude.map toLower s

detokenize :: String -> String -> String
detokenize a b
  | punctuation a || punctuation b = a ++ b
  | otherwise                      = a ++ " " ++ b
  where punctuation = \x -> length x > 0 && isPunctuation (x !! 0)

chain :: [String] -> Map String [String]
chain [now, last] = insert now [last] $ singleton last []
chain (token:xs)  = insertWith (\new old -> new ++ old) token [xs !! 0] $ chain xs

next_token :: (RandomGen g) => g -> Map String [String] -> String -> (g, String)
next_token gen map s =
  let choices   = findWithDefault [] s map
      (i, gen') = randomR (0, length choices - 1) gen
  in (gen', choices !! i)

produce :: (RandomGen g) => g -> String -> Map String [String] -> [String]
produce gen s map =
  let (gen', next) = next_token gen map s
  in s : produce gen' next map
```


Published on September 21st, 2012 in Amanda Palmer, Arts, Haskell, Markov chain, Randomness, Uncategorized
