Markov chains are a simple concept. Start with a state (e.g. a word, a location on a map, a musical note) and move to another one at random. Repeat until you have a chain of states that you can use for something. The key is that you don’t choose uniformly from all possible states: the next state depends only on the current one, and you pick only among the states that have some probability of following it.

For example, suppose you know that a sentence starts with the word “She.” It’s reasonable to bet on the next word being “is” or “was.” It’s possible (but less likely) to see less common words such as “asserted” or “obliterated.” Most likely your text doesn’t contain instances of phrases like “She chemistry” or “She absolutism.” If that’s the case, those words have zero probability of being chosen.

Let’s take the following text as an example:

He is a boy. She is a girl. He is young. She is younger.

Here is a list of all the transitions we see in the sentences from this text:

{"*START*" ("He" "She" "He" "She"),
 "He" ("is" "is"),
 "She" ("is" "is"),
 "is" ("a" "a" "young." "younger."),
 "a" ("boy." "girl.")}

Note that the word “is” has four words it could transition to. Two of them are “a”, so if we start generating random sentences, “is” will be followed by “a” half the time. To build a sentence, let’s choose a word that follows the *START* state (nothing) and keep choosing successors until we reach a word that ends in a period. Here are some sentences we could see:

"He is younger." "She is a girl." "She is a boy." "He is a boy."

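The “half the time” figure comes straight from counting the duplicates in the list for “is”. Here is a small sketch of that arithmetic in Clojure; the names `successors-of-is`, `counts`, and `probabilities` are made up for this illustration.

```clojure
;; Successors of "is", copied from the transition list above.
(def successors-of-is ["a" "a" "young." "younger."])

;; Count how often each successor appears.
(def counts (frequencies successors-of-is))
;; counts is {"a" 2, "young." 1, "younger." 1}

;; Turn counts into probabilities; Clojure keeps them as exact ratios.
(def probabilities
  (into {}
        (map (fn [[word n]] [word (/ n (count successors-of-is))]))
        counts))
;; probabilities is {"a" 1/2, "young." 1/4, "younger." 1/4}

;; rand-nth picks uniformly from the list, so duplicates act as weights.
(rand-nth successors-of-is)
```

Keeping the duplicates in the list means there is no need to store probabilities at all: a uniform pick from the list already favors the more frequent words.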
OK, time to automate this with some Clojure code (or skip to the “generate gibberish” button below if you’re not a programmer):
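Here is a minimal sketch of what such code could look like. The function names `transitions` and `generate-sentence` and the `sample` text binding are placeholders chosen for illustration, not necessarily what the GitHub repository uses. The sketch assumes non-empty input in which words are separated by whitespace and sentences end in a period.

```clojure
(require '[clojure.string :as str])

(defn transitions
  "Builds a map from each word to the vector of words seen right after it.
  The key \"*START*\" collects the first word of every sentence."
  [text]
  (let [words (str/split (str/trim text) #"\s+")
        ;; Pair each word with its predecessor; a word that ends in a
        ;; period is followed by the start of a new sentence.
        prevs (cons "*START*"
                    (map #(if (str/ends-with? % ".") "*START*" %) words))]
    (reduce (fn [m [prev word]]
              (update m prev (fnil conj []) word))
            {}
            (map vector prevs words))))

(defn generate-sentence
  "Walks the map from \"*START*\", picking a random successor at each step,
  until it reaches a word that ends in a period or has no successors."
  [m]
  (loop [word (rand-nth (m "*START*"))
         sentence []]
    (let [sentence (conj sentence word)
          successors (m word)]
      (if (or (str/ends-with? word ".") (empty? successors))
        (str/join " " sentence)
        (recur (rand-nth successors) sentence)))))

;; Usage with the sample text from earlier in the post.
(def sample "He is a boy. She is a girl. He is young. She is younger.")
(generate-sentence (transitions sample))
;; e.g. "She is a boy." (the result varies from run to run)
```
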

The first function generates a map of transitions like the one shown above. The second one chooses words from the map until it finds a word that ends in a period. It’s easy to compile this into JavaScript via ClojureScript. I’ve done that; check out the full code on GitHub. You can test it below: click the “generate gibberish” button and check the output. The input box contains Blowin’ in the Hamlet by Bob Shakespeare. Try replacing that with a few paragraphs of your own.



How many roads most a man walk down. Before you call him a man. How many seas must a white dove sail. Before she sleeps in the sand. Yes, how many times must the cannon balls fly. Before they’re forever banned. The answer my friend is blowin’ in the wind. The answer is blowin’ in the wind. Yes, how many years can a mountain exist. Before it’s washed to the sea. Yes, how many years can some people exist. Before they’re allowed to be free. Yes, how many times can a man turn his head. Pretending he just doesn’t see. The answer my friend is blowin’ in the wind. The answer is blowin’ in the wind. Yes, how many times must a man look up. Before he can see the sky. Yes, how many ears must one man have. Before he can hear people cry. Yes, how many deaths will it take till he knows. That too many people have died. The answer my friend is blowin’ in the wind. The answer is blowin’ in the wind. How many roads most a man walk down. Before you call him a man. How many seas must a white dove sail. Before she sleeps in the sand. Yes, how many times must the cannon balls fly. Before they’re forever banned. The answer my friend is blowin’ in the wind. The answer is blowin’ in the wind. Yes, how many years can a mountain exist. Before it’s washed to the sea. Yes, how many years can some people exist. Before they’re allowed to be free. Yes, how many times can a man turn his head. Pretending he just doesn’t see. The answer my friend is blowin’ in the wind. The answer is blowin’ in the wind. Yes, how many times must a man look up. Before he can see the sky. Yes, how many ears must one man have. Before he can hear people cry. Yes, how many deaths will it take till he knows. That too many people have died. The answer my friend is blowin’ in the wind. The answer is blowin’ in the wind. To be, or not to be: that is the question: Whether ’tis nobler in the mind to suffer. The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles. And by opposing end them? 
To die: to sleep. No more; and by a sleep to say we end. The heart-ache and the thousand natural shocks. That flesh is heir to, ’tis a consummation. Devoutly to be wish’d. To die, to sleep. To sleep: perchance to dream: ay, there’s the rub. For in that sleep of death what dreams may come. When we have shuffled off this mortal coil. Must give us pause: there’s the respect. That makes calamity of so long life. For who would bear the whips and scorns of time. The oppressor’s wrong, the proud man’s contumely. The pangs of despised love, the law’s delay. The insolence of office and the spurns. That patient merit of the unworthy takes. When he himself might his quietus make. With a bare bodkin? who would fardels bear. To grunt and sweat under a weary life. But that the dread of something after death. The undiscover’d country from whose bourn. No traveller returns, puzzles the will. And makes us rather bear those ills we have. Than fly to others that we know not of? Thus conscience does make cowards of us all. And thus the native hue of resolution. Is sicklied o’er with the pale cast of thought. And enterprises of great pitch and moment. With this regard their currents turn awry. And lose the name of action.– Soft you now! The fair Ophelia! Nymph, in thy orisons. Be all my sins remember’d.

Generate gibberish

Output:

——-

Side note: the code above is the simplest algorithm I came up with for generating a Markov chain out of a few paragraphs, but it doesn’t scale to large amounts of text. If you try running it against a book, it will run out of memory. In a follow-up post I’ll show how to put together a solution that scales better. It will be more about Clojure than about Markov chains.