Recently, for no reason other than my own amusement, I created a Twitter bot called horse_bluegrass, which generates random text from a predictive text engine trained solely on the lyrics of 1,796 bluegrass, old-time, and classic country songs. The results are quite amusing: some sound like lines that could plausibly appear in a real song; others are a nonsensical mess. Interesting? Stupid? Nonsense? I’ll let you be the judge, but first I’d like to quickly explain how the text gets generated.

The code (via mispy/twitter_ebooks) takes text and parses it into individual words to build a model of how likely one word is to follow another or to end a phrase. For instance, starting with the word “in”, it knows that a likely next word is “the”, “a”, or any of 43 other words. Say the algorithm goes with “the”: the pick is random, but weighted by statistical likelihood. It then chooses the next word after “the” using the same process… and so on until the algorithm decides the phrase should end. Once it has a complete phrase, it publishes the text to Twitter.
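For the curious, here is a minimal Python sketch of that word-by-word process. It is not the twitter_ebooks code (that library is written in Ruby and I haven't studied its internals); the names `train`, `generate`, and `END` are made up for illustration, and the three sample lines are invented stand-ins rather than anything from the real training file.

```python
import random
from collections import defaultdict

END = None  # hypothetical sentinel meaning "the phrase stops here"


def train(lines):
    """Record, for every word, each word that was seen right after it."""
    followers = defaultdict(list)
    starts = []
    for line in lines:
        words = line.split()
        if not words:
            continue
        starts.append(words[0])
        # Pair each word with its successor; the last word is followed by END.
        for current, nxt in zip(words, words[1:] + [END]):
            followers[current].append(nxt)
    return starts, followers


def generate(starts, followers, rng=random, max_words=30):
    """Walk the table, picking each next word at random from the recorded followers."""
    word = rng.choice(starts)
    phrase = [word]
    while len(phrase) < max_words:
        # Repeated followers appear more than once in the list,
        # so frequent pairs are proportionally more likely to be chosen.
        word = rng.choice(followers[word])
        if word is END:
            break
        phrase.append(word)
    return " ".join(phrase)


if __name__ == "__main__":
    # Invented sample lines, standing in for the real lyrics file.
    sample = ["in the pines", "in a cabin", "the cabin in the hills"]
    starts, followers = train(sample)
    print(generate(starts, followers))
```

Storing every observed follower in a plain list (duplicates included) is the simplest way to get the "likely but still random" behavior: a uniform pick from that list is automatically weighted by how often each pair occurred. The `max_words` cap is only a safety net for this sketch, so that a loop like “the cabin in the cabin in…” cannot run forever.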

Note: I didn’t investigate this too much; however, I believe this is a Markov chain. I also didn’t want to get too technical here, but I did want to give a quick overview of how the text is generated.

To get the training text, I wrote a web scraper that took all the songs from http://www.bluegrasslyrics.com/ and wrote each song’s title and lyrics into a single text file.

Once I had the text file, a whopping 1.3 megabytes and 37,887 lines, I trained the bot, set it to tweet every so often, sent the process into the background on my server, then scurried up to Harrisburg to watch The Travelin’ McCourys play some of that great human-generated bluegrass music.

To my delight, its first generated text was the following introduction, which to me sounds like something you’d hear on an old live Bill Monroe recording:

*ahem*.. mic drop..

So far, the bot has produced phrases that touch quite well on the subject matter of the lyrics it was trained on: love, loss, death, heartache, joy, religion, suffering, etc. It’s my hope that something from this will spark a song from a songwriter, or otherwise just give anyone insight into how random computer-generated content can still end up being profound.

Here are some of my favorites:

that sounds fun!

classic subject of old timey love

this could actually belong in a gospel song

hell yeah horse_bluegrass, you are the man!

not sure what this means, but damn it sounds cool

yeah… that always seems to happen in old songs

of course we can’t

so sad

more sadness

found this one really funny due to the change in frequency

more crying

quite hilarious mashup between two songs!

just let it be known..

more sadness

sadness with a weird twist ending

combine this with top one and it ends up kinda happy

I feel this phrase happens all the time in old songs

a nice twist on Long Black Veil

this is just a lovely phrase

um

uhhh.. no comment

some happiness!

loud music!

more loving gospelgrass

Tillie has to want to be happy

all over

has the makings of a good song

To continue the saga of horse_bluegrass lyrics, feel free to check out https://twitter.com/horse_bluegrass

I’m going to leave it running and generating phrases (once an hour for now, though I’ll make it post less often in a few days).

If you have any questions or amusing ideas about this, feel free to respond here or hit me up on Twitter at @jwenerd.