The Bot of Mormon: I don't usually do in-depth analyses of my bots, especially one that's probably not gonna break ten followers, but my most recent bot is very personal to me, and the making of it turned out to be much stranger than I expected. It's The Bot of Mormon, "the most correct bot", a text-generating process with a very niche audience, but that audience includes me, so I'm happy. A few of my recent favorites:

And again I say unto you, and more especially the elephants and cureloms and cumoms.
(The Bot Of Mormon, @TheBotOfMormon, October 16, 2014)

A large and tough businessman, I pray only that I might always be found as Abraham Lincoln said: "Die when I may, by a wild olive tree."
(The Bot Of Mormon, @TheBotOfMormon, October 16, 2014)

"As we read in the Book of Mormon, but I will have him come to the phone."
(The Bot Of Mormon, @TheBotOfMormon, October 14, 2014)

A note: In a bid for more followers, and to avoid alienating all my relatives, I designed the Bot of Mormon to be a bit of harmless humor for believing LDS folk (early versions could be pretty offensive, and I chose not to go that route). However, Saints might take offense at this blog post about how and why I made the bot. So, fair warning. Here we go.

It's not much of an exaggeration to trace my interest in generative text back to my experience growing up in Mormonism. Mark Twain famously called the Book of Mormon "chloroform in print", and I believe the reason it's so boring is that it was produced by a process similar to automatic writing. It's full of stalling and retreats to stock phrases.

But what starts with the Book of Mormon sure doesn't end there. When I was a kid, church every week was a three-hour festival of stock phrases and repetition. See, in the LDS church the task of coming up with things to say every week rotates around the general membership. Topics are assigned, and there are only about fifty topics total.
Since every acceptable topic has been covered a million times before, the simplest way to make a new talk is to remember bits of old talks and mash them together. When I was a kid I experienced this from both ends, and writing the talks was especially intense for me because, despite my best efforts, I didn't actually believe. My talks were literally constructed by assembling meaningless symbols into patterns that matched what I saw other people doing.

Naturally, ever since I caught the botmaking bug I've wanted to recreate this experience with a bot. I registered @TheBotOfMormon quite a while ago. But I couldn't figure out what to do until recently, when I hit upon the idea of taking as my corpus not the Book of Mormon itself, but the General Conference talks.

General Conference is a big twice-yearly event in Salt Lake where the top brass show y'all how it's done. These guys used to be lawyers and corporate executives, and their talks are all vetted by committee, so the result is... well, sometimes someone will say something offensive, but even that I wouldn't call "interesting".

What is interesting is that Conference is where Mormonism meets the twenty-first century. By which I mean that's where you can see the pros use nineteenth-century language and rhetoric to talk about same-sex marriage (undesirable!) and the Internet (a mixed bag!). That's the kind of juxtaposition I thought would make a good bot. As it turns out, I was right... sort of. Eventually.

To give you a picture of what goes on in General Conference, here's a table I made of the top ten topics by decade, according to the keywords in the <meta> tags for each talk.
1970s: obedience, missionary work, spirituality, testimony, Jesus Christ, welfare, priesthood, family, plan of salvation, youth
1980s: Jesus Christ, missionary work, service, obedience, priesthood, faith, love, family, spirituality, adversity
1990s: Jesus Christ, faith, family, priesthood, love, service, Holy Ghost, obedience, prayer, Atonement
2000s: faith, Jesus Christ, service, testimony, obedience, family, Holy Ghost, prayer, love, priesthood
2010s: Jesus Christ, service, faith, priesthood, obedience, adversity, family, love, Holy Ghost, Atonement

You can see the shape of the fifty acceptable topics there.

Anyway, I downloaded the Conference talks and set about applying my usual bag of tricks to the corpus to come up with an interesting transformation. Imagine my surprise when none of my techniques worked! The _ebooks algorithm, up to this point an unending generator of hilarity from any corpus, failed miserably. The word-frequency filter I used to find the interesting signs for Minecraft Signs also failed. Markov chains were useless, big surprise.

I had a dim idea that the key to bot gold here was the subordinate clauses: the sentences that run on and on in a lawyerly way, embroidering themselves with their own Talmudic interpretations. I tried Queneau assembly of sentences at the clause level. This was good enough to get the bot launched, but it wasn't great. Each individual clause is very likely to be boring; its boringness has no relationship to word frequency, and combining clauses doesn't help. The corpus is fractally boring.

"Here you will find happiness, we know that the rejoicing, or anything else, they are in a state contrary to the nature of happiness."
(The Bot Of Mormon, @TheBotOfMormon, October 2, 2014)

Okay, I thought, time to break out the big guns. I incorporated the Book of Mormon into my corpus, then the Doctrine & Covenants, and even the Pearl of Great Price, the bizarro crown jewel of the LDS canon. None of it helped.
(The Pearl of Great Price helped a little, since it's really weird, but it's also very short.)

Behold, and began to put heavy burdens upon their backs, and prayers of faith.
(The Bot Of Mormon, @TheBotOfMormon, October 6, 2014)

But legend told of a secret weapon: the Journal of Discourses. It's basically a large collection of General Conference talks from the late 19th century, during the polygamy era, containing a ton of fiery rhetoric and juicy doctrines downplayed or outright disowned by the modern church. Some might consider it dirty pool, but I was desperate to get some interesting content out of my bot.

I Queneau-ified every Discourse in the Journal and added it to the corpus... to no avail. It was still dull! On the sentence-fragment level, it's tough to even distinguish between the 'scandalous' stuff in the Journal and the dishwater they serve up at Conference nowadays.

And now behold, as it were, most of them in environments very different from their own.
(The Bot Of Mormon, @TheBotOfMormon, October 9, 2014)

At this point I was so frustrated that I honestly started to question my unbelief. What are the odds that a corpus of text spanning hundreds of authors over nearly 200 years could be so uniformly dull? Was some divine hand at work, keeping things from getting too interesting?

With shaking hands I ran my tests against a control sample: the Gutenberg text of a non-Mormon book of sermons. And it turns out nineteenth-century religious language is what's fractally boring. It's nothing to do with Mormonism in particular. The modern stuff is dull because it copies and recombines the nineteenth-century stuff.

And that, finally, was the key to what little success I've achieved with @TheBotOfMormon. When the bot is funny, the funny thing is not the rambling juxtaposition of sentence fragments per se. It's the juxtaposition of modern concepts with nineteenth-century language. To get the bot to work I would have to actually recreate that juxtaposition, not just hope for it.
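
For readers who haven't met the technique, here is a minimal Python sketch of clause-level Queneau assembly as used above. It is not the bot's actual code. It assumes the usual reading of the technique: every slot in the output is filled by a fragment that held the same slot (opening, middle, or closing) in some source sentence. The function names, the punctuation-based clause splitter, and the sample sentences are all invented for illustration.

```python
import random
import re

# Invented stand-in sentences; a real corpus would be thousands of talks.
SAMPLE_SENTENCES = [
    "Brothers and sisters, we gather today with gratitude, and we remember those who came before us.",
    "In every season of life, there are burdens to carry; yet there is also reason to hope.",
    "When the storm arrives, as it surely will, hold fast to what you know.",
]


def split_clauses(sentence):
    """Split a sentence into clauses on commas, semicolons, and colons."""
    parts = re.split(r"[,;:]\s+", sentence.strip())
    return [p.strip() for p in parts if p.strip()]


def build_pools(sentences):
    """Group clauses by the position they held: first, middle, or last."""
    pools = {"first": [], "middle": [], "last": []}
    for sentence in sentences:
        clauses = split_clauses(sentence)
        if len(clauses) < 2:
            continue  # a one-clause sentence has nothing to recombine
        pools["first"].append(clauses[0])
        pools["middle"].extend(clauses[1:-1])
        pools["last"].append(clauses[-1])
    return pools


def queneau_sentence(pools, rng, max_middle=2):
    """Assemble one sentence: an opener, up to max_middle middles, a closer.

    Assumes the "first" and "last" pools are non-empty.
    """
    n_middle = rng.randint(0, min(max_middle, len(pools["middle"])))
    middle = rng.sample(pools["middle"], n_middle)
    clauses = [rng.choice(pools["first"])] + middle + [rng.choice(pools["last"])]
    return ", ".join(clauses)


if __name__ == "__main__":
    pools = build_pools(SAMPLE_SENTENCES)
    print(queneau_sentence(pools, random.Random()))
```

Because openers only ever open and closers only ever close, the output keeps the shape of a sentence even when the content wanders. That is also why the trick can't rescue a corpus whose clauses are all individually dull.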
Enter the Corpus of Historical American English. (Thanks, BYU! Seriously, what a great project.) This has word frequencies for every decade from the 1810s up to 2009. I picked out all the words that were 10x more common between 1930 and 1980 than they were between 1830 and 1880. Then I tagged all the sentence fragments containing those words as distinctly twentieth-century. Now I can guarantee that every assemblage has an old-timey component and a more modern component, and the chances of humor go way up.

The lesson I want to take from this is that every corpus is different. I thought I could handle the LDS corpus with the same tools I use on Gutenberg, because they're both full of archaic language, but I was totally wrong. Once I engaged with the text this became obvious, but I came into this holding the text at arm's length because it held a lot of bad childhood memories.

There's no generic bot kit that will work on anything. (Well, there is, but it uses Markov chains and I don't like it.) Even my really simple bots like I Like Big Bot and Boat Names required a lot of custom behind-the-scenes work to find the most interesting subset of the data.

Perhaps this can serve as my new rule: a new bot needs to present a different way of being a bot, not just a different corpus. And adding more text to a corpus I don't know how to handle just makes the problem worse.
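
The frequency-ratio filter and the "one old, one modern" guarantee described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the bot's real code. The input shape (word mapped to per-decade frequencies), the toy numbers, the `floor` smoothing value, and the reading of "between 1930 and 1980" as the five decades from the 1930s through the 1970s are all assumptions, and nothing here reflects the actual COHA file format.

```python
import random
import re

# Hypothetical shape: word -> {decade start year: occurrences per million words}.
# These numbers are invented for illustration; they are not real COHA data.
# Decades missing from a word's entry count as zero in this toy.
TOY_FREQS = {
    "telephone": {1870: 0.5, 1930: 60.0, 1950: 70.0, 1970: 65.0},
    "businessman": {1850: 0.2, 1930: 12.0, 1950: 15.0, 1970: 14.0},
    "behold": {1830: 90.0, 1850: 80.0, 1870: 70.0, 1930: 20.0, 1950: 12.0, 1970: 8.0},
    "faith": {1830: 150.0, 1850: 140.0, 1870: 130.0, 1930: 90.0, 1950: 85.0, 1970: 80.0},
}

# Invented fragments standing in for clauses from the corpus.
FRAGMENTS = [
    "and again I say unto you",
    "behold the faith of the people",
    "the businessman answered the telephone",
]


def window_mean(by_decade, start, end):
    """Mean frequency over the decades with start <= decade < end."""
    decades = range(start, end, 10)
    return sum(by_decade.get(d, 0.0) for d in decades) / len(decades)


def modern_words(freqs, ratio=10.0, floor=0.1):
    """Words at least `ratio` times more common in 1930-1980 than in 1830-1880.

    `floor` is an assumed smoothing value so that words absent from the
    early window do not cause a division-by-zero style blowup.
    """
    found = set()
    for word, by_decade in freqs.items():
        early = window_mean(by_decade, 1830, 1880)
        late = window_mean(by_decade, 1930, 1980)
        if late >= ratio * max(early, floor):
            found.add(word)
    return found


def is_modern(fragment, modern):
    """A fragment counts as modern if it contains any modern word."""
    tokens = re.findall(r"[a-z']+", fragment.lower())
    return any(token in modern for token in tokens)


def mixed_assemblage(fragments, modern, rng):
    """Join exactly one old-timey fragment and one modern one, in random order."""
    new = [f for f in fragments if is_modern(f, modern)]
    old = [f for f in fragments if not is_modern(f, modern)]
    pair = [rng.choice(old), rng.choice(new)]
    rng.shuffle(pair)
    return ", ".join(pair)


if __name__ == "__main__":
    modern = modern_words(TOY_FREQS)
    print(sorted(modern))
    print(mixed_assemblage(FRAGMENTS, modern, random.Random()))
```

The point of tagging first and sampling second is that the juxtaposition is forced rather than hoped for: every output draws from both pools, no matter how rare the modern fragments are in the corpus.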