The Internet’s been abuzz with hype this week about a new computer study by Duane and Chris Johnson which purports to identify four literary influences on the Book of Mormon (besides the King James Bible):

What’s really impressive about the Johnson and Johnson study is its scope and methodological sophistication. The study compares the Book of Mormon with a sample of more than 130,000 scanned texts published between 1500 and 1830. This sample is decidedly imperfect, containing as it does a large number of OCR errors and instances of the long s (which the algorithm can’t recognize), but its size is remarkable, and the authors seem determined to continue working on their algorithm to compensate for its shortcomings. The methodology the authors have designed to analyze this sample isn’t perfect either, but it’s more exacting than some online detractors have given it credit for. According to the authors, the method has been refined over the course of thirty-six experiments and is still being improved.

Basically, the methodology works like this. First, the algorithm measures the number of four-word phrases that each text shares with the Book of Mormon. (The authors call these four-word phrases n-grams.) A high number of shared n-grams (relative to total word-count) indicates that a text may have some kind of significant literary relationship to the Book of Mormon. (N-grams found in the King James Bible are excluded from the analysis to prevent mutual Bible-borrowing from influencing the results.) An additional analysis, called “Iterative Source Separation” (ISS), then measures the distinctiveness of the shared n-grams. The more distinctive n-grams a text shares with the Book of Mormon, the more literary dependence is indicated. The four texts listed above are the ones that seem to share a statistically significant number of distinctive phrases with the Book of Mormon.
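To make the first step concrete, here is a minimal sketch of counting shared four-word phrases relative to word count, with an exclusion set standing in for the King James Bible filter. This is my own illustration, not the Johnsons’ code: the function names, the crude tokenizer, and the exact normalization are all assumptions, and it makes no attempt to reproduce ISS.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (a crude tokenizer, assumed here)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(text, n=4):
    """Return the set of n-word phrases that occur in the text."""
    words = tokenize(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngram_score(target, candidate, excluded=frozenset(), n=4):
    """Shared n-grams between target and candidate, per word of the candidate.

    `excluded` holds n-grams to ignore, e.g. those also found in the
    King James Bible (hypothetical stand-in for the study's filter).
    """
    shared = (ngrams(target, n) & ngrams(candidate, n)) - excluded
    total_words = len(tokenize(candidate))
    return len(shared) / total_words if total_words else 0.0
```

Passing the n-grams of a Bible text as `excluded` drops any phrase the two works could both have borrowed from it, which is the point of the KJV filter described above.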

I do see a couple of problems with this methodology. First, it seems to me that if ISS really works as advertised, then it would not be necessary to exclude King James Bible phrases from the analysis. In fact, putting the KJV back into the analysis could be a useful way to check the effectiveness of ISS. (Update 10/29: A note on the Johnsons’ website indicates that the KJV has been put back into the analysis, so this issue has been resolved.)

Second and more importantly, I’m bothered by the fact that both The Late War and The First Book of Napoleon show up as statistically significant influencers of the Book of Mormon. Johnson and Johnson speculate that The First Book of Napoleon influenced The Late War, and that the former’s influence may have been mediated to the Book of Mormon via the latter. One would think, however, that ISS would prevent such mediated influence from showing up as statistically significant in the results. If the ISS analysis works as advertised, then it appears that Late War and Napoleon each share distinctive content with the Book of Mormon that they do not share with each other. I think it unlikely that Joseph Smith had read and been influenced by both of these highly unusual texts (which are histories of modern events written in pseudo-biblical prose). It seems more likely that the distinctive nature of what these authors were doing—writing pseudo-biblical narratives—led them to independently invent some of the same distinctive phrases and/or to independently mutate biblical phrases in some of the same distinctive ways. Since the Koran translation and David Willson’s The Rights of Christ also simulate King James prose, these texts may share distinctive material with the Book of Mormon for the same reason.

One finding of the study I do think is very interesting is that Solomon Spalding’s Manuscript Found and Mercy Otis Warren’s History of the American Revolution (two books often touted as sources for the Book of Mormon) are really no more similar to the Book of Mormon than the average book. Ethan Smith’s View of the Hebrews is a better match, but still within the normal statistical distribution for books of the period. Of course, that doesn’t necessarily mean Joseph Smith wasn’t influenced by these books. It may simply mean that shared four-word phrases aren’t good indicators of influence. It seems noteworthy in at least the case of Mercy Otis Warren’s book, though, because the case for that book having influenced the Book of Mormon has usually rested almost entirely upon lists of shared phrases collated by zealous researchers. Johnson and Johnson’s study reveals that Warren’s History doesn’t actually share more phrases with the Book of Mormon than other books of the period. Researchers, then, must have simply noticed more shared phrases in Warren’s book because it just happens to be one to which they’ve devoted a lot of energy.

[Edit to add: Actually, it occurs to me that the Johnson and Johnson method may be systematically biased toward obscure writings, whereas “mainstream” writings like Warren’s or Ethan Smith’s may be penalized because they were popular. Mainstream writings are often quoted and imitated, making less of their text appear distinctive to the algorithm. There are therefore relatively fewer opportunities for a distinctive phrase match with a mainstream work than with an obscure one. To correct for this, you’d need to count shared n-grams relative to the total number of distinctive n-grams in a work rather than relative to its total word-count.]
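Here is a rough sketch of the correction I have in mind, comparing the two ways of normalizing. It is only an illustration under my own assumptions: I treat an n-gram as “distinctive” when it appears in no other text in the corpus, which is a simplification and not how the Johnsons define distinctiveness, and all the names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(text, n=4):
    """Return the set of n-word phrases that occur in the text."""
    words = tokenize(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def distinctive_ngrams(name, corpus, n=4):
    """N-grams of corpus[name] that appear in no other text of the corpus."""
    doc_freq = Counter()
    for text in corpus.values():
        doc_freq.update(ngrams(text, n))  # each text counts an n-gram once
    return {g for g in ngrams(corpus[name], n) if doc_freq[g] == 1}

def match_rates(target, name, corpus, n=4):
    """Return (matches per word, matches per distinctive n-gram) for one work."""
    distinct = distinctive_ngrams(name, corpus, n)
    shared = distinct & ngrams(target, n)
    words = len(tokenize(corpus[name]))
    per_word = len(shared) / words if words else 0.0
    per_distinct = len(shared) / len(distinct) if distinct else 0.0
    return per_word, per_distinct
```

A much-imitated work has few distinctive n-grams left, so a single match weighs more heavily under the second rate than under the first. That is the adjustment suggested above.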

Overall, my assessment of the Johnson and Johnson study is similar to my assessments of most other computer studies of Book of Mormon authorship: it’s intriguing and important, but also somewhat limited and not entirely persuasive. As I’ll suggest in a follow-up post, the study’s findings may be more useful for purposes that were not in its authors’ purview than for answering the questions they were asking.