At this year’s Canadian Congress of Humanities and Social Sciences I had the pleasure of attending a talk by David McClure in the digital humanities strand on his visualization tool, TextPlot.

I won’t go into the technical details of what TextPlot does, because David has done so adeptly in a series of posts on his blog. Essentially, though, the tool builds a force-directed graph of the top n terms in a text. It first estimates a probability density function for each term’s occurrences across the text (using kernel density estimation) and then compares those densities using Bray-Curtis dissimilarity. The result is a map of clustered and connected terms: terms that occur “together”, within a specified distance of one another, are treated as most closely connected.
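To make the pipeline summarized above more concrete, here is a minimal sketch in plain Python. It is not TextPlot's actual code: the function names (term_offsets, density, bray_curtis, nearest_terms) and the default parameters are my own assumptions for illustration, and it uses a simple Gaussian kernel sampled on an even grid. Each term is linked to the handful of other terms whose density profiles are least dissimilar to its own; those links are the edges one would then lay out in a tool such as Gephi.

```python
import math
from collections import Counter, defaultdict


def term_offsets(tokens):
    """Map each term to the list of token positions at which it occurs."""
    offsets = defaultdict(list)
    for i, tok in enumerate(tokens):
        offsets[tok].append(i)
    return offsets


def density(positions, length, bandwidth, samples=200):
    """Gaussian kernel density estimate of a term across the text.

    The estimate is sampled on an even grid and normalized to sum to 1.
    Assumes the bandwidth is not tiny relative to the grid spacing.
    """
    grid = [length * (k + 0.5) / samples for k in range(samples)]
    values = [
        sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        for x in grid
    ]
    total = sum(values)
    return [v / total for v in values]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no overlap."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def nearest_terms(tokens, top_n=10, bandwidth=50.0, neighbours=3):
    """For each of the top_n most frequent terms, list its closest terms.

    Returns {term: [(other_term, dissimilarity), ...]}, i.e. the edges
    that a graph layout tool would then position.
    """
    offsets = term_offsets(tokens)
    top = [term for term, _ in Counter(tokens).most_common(top_n)]
    profiles = {t: density(offsets[t], len(tokens), bandwidth) for t in top}
    edges = {}
    for t in top:
        ranked = sorted(
            (bray_curtis(profiles[t], profiles[o]), o) for o in top if o != t
        )
        edges[t] = [(o, round(d, 3)) for d, o in ranked[:neighbours]]
    return edges
```

Calling nearest_terms on a tokenized text (for example, the lowercased words of a novel with stopwords removed) returns a dictionary of edges. The bandwidth argument plays the role of the “specified distance”: a larger value smooths each term's profile over more of the text, so more terms will look alike.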

Immediately after the session I decided to try the tool out on Thomas Pynchon’s epic 1973 novel, Gravity’s Rainbow. The resultant Gephi visualization of the top 1,000 terms (there are some problems with accents cutting some words short, like “Peenemünde”) looks like this:

You can click the image to explore the network using the zoomable image viewer that David also built.

Critical observations

There are several features of this network map that are worthy of comment:

Perhaps most interestingly, it strikes me that, on a first pass, the clustering in this network focuses on areas of the text that have received substantial critical attention: Roger and Jessica; the Anubis; Rocketman; Weissmann, Pökler and Ilse; Katje and the octopus, Grigori; Enzian and Tchitcherine; Byron the Bulb. In this way, the algorithm correctly identifies many of the scenes, amid this convoluted novel, that critics have deemed important. There are, however, areas that have had critical attention but are not well modelled here: most notably, perhaps, the launch of the 00000 and Slothrop’s disintegration.

Semantic fields within the novel

I want to do much more playing with this to ascertain whether different parameters (and even different algorithmic approaches to plotting density/clustering) yield wildly different results.

However, it’s also worth closing with a few thoughts about why some episodes in this novel appear clearly distinct in this visualization while others are integrated and dispersed. The method that TextPlot uses to generate its data is based on linguistic linkage. As David puts it:

I was thinking about the way that words distribute inside of long texts – the way they slosh around, ebb and flow, clump together in some parts but not others. Some words don’t really do this at all – they’re spaced evenly throughout the document, and their distribution doesn’t say much about the overall structure of the text. This is certainly true for stopwords like “the” or “an,” but it’s also true for words that carry more semantic information but aren’t really associated with any particular content matter. […] Other words, though, have a really strong semantic focus – they occur unevenly, and they tend to hang together with other words that orbit around a shared topic.

Gravity’s Rainbow has been assessed by many critics as a text that works to generate a feeling, among its readers, that everything may be connected (as a form of conspiratorial plot) and that, therefore, it might equally be the case that nothing is connected. Pynchon terms these paranoia (total connectedness) and anti-paranoia (utter disconnect).

Plotting the novel in this way allows us to start to consider whether it constructs particular linguistic and semantic fields around particular episodes. Specific terms clearly occur in isolated contexts. The octopus rarely returns; most of Roger’s narrative is centred around his pairing and unpairing with Jessica; Byron the Bulb is far out in his own diegetic layer with distinct terms that rarely recur.

Other terms, though, seem scattered across the Zone of the text. An initial hypothesis, which I need to explore much more, is that many of the isolated action segments of the text, to which critics have turned their thematic and historical attentions, share common linguistic cores (semantic fields) with many other parts of the text. This might begin to contribute to their ultimate connectedness within the novel.