To many people, “geek” and “nerd” are synonyms, but in fact they are a little different. Consider the phrase “sports geek” — an occasional substitute for “jock” and perhaps the arch-rival of a “nerd” in high-school folklore. If “geek” and “nerd” are synonyms, then “sports geek” might be an oxymoron. (Furthermore, “sports nerd” either doesn’t compute or means something else.)

In my mind, “geek” and “nerd” are related, but capture different dimensions of an intense dedication to a subject:

geek – An enthusiast of a particular topic or field. Geeks are “collection” oriented, gathering facts and mementos related to their subject of interest. They are obsessed with the newest, coolest, trendiest things that their subject has to offer.

nerd – A studious intellectual, although again of a particular topic or field. Nerds are “achievement” oriented, and focus their efforts on acquiring knowledge and skill over trivia and memorabilia.

Or, to put it pictorially à la The Simpsons:



Both are dedicated to their subjects, and sometimes socially awkward. The distinction is that geeks are fans of their subjects, and nerds are practitioners of them. A computer geek might read Wired and tap the Silicon Valley rumor-mill for leads on the next hot-new-thing, while a computer nerd might read CLRS and keep an eye out for clever new ways of applying Dijkstra’s algorithm. Note that, while not synonyms, they are not necessarily distinct either: many geeks are also nerds (and vice versa).

An Experiment

Do I have any evidence for this contrast? (By the way, this viewpoint dates back to a grad-school conversation with fellow geek/nerd Bryan Barnes, now a physicist at NIST.) The Wiktionary entries for “geek” and “nerd” lend some credence to my position, but I’d like something a bit more empirical…

“You shall know a word by the company it keeps” ~ J.R. Firth (1957)

To characterize the similarities and differences between “geek” and “nerd,” maybe we can find the other words that tend to keep them company, and see if these linguistic companions support my point of view?

Data and Method

(Note: If you’re neither a geek nor a nerd, don’t be scared by the math. It’s not too bad… or you can probably just skip to the “Results” subsection below…)

I analyzed two sources of Twitter data, since it’s readily available and pretty geeky/nerdy to boot. The first is a background corpus of 2.6 million tweets collected via the streaming API between December 6, 2012, and January 3, 2013. The second is a sample of tweets from the search API matching the query terms “geek” and “nerd” during the same time period (38.8k and 30.6k total, respectively). Yes, yes, yes… I collected all the data six months ago but just now got around to crunching the numbers. It’s been a busy year!

A great little statistic for measuring how much company two words tend to keep is pointwise mutual information (PMI). It’s commonly used in the information retrieval literature to measure the co-occurrence of words and phrases in text, and it also turns out to be a good predictor of how humans evaluate semantic word similarity (Recchia & Jones, 2009) and topic model quality (Newman et al., 2010).

For two words w and v, the PMI is given by:

$$\mathrm{PMI}(w; v) = \log \frac{p(w, v)}{p(w)\,p(v)} = \log p(w \mid v) - \log p(w),$$

where in this case p(·) is the probability of the word(s) in question appearing in a random tweet, as estimated from the data. For instance, if we let v = “geek,” we compute the log-probability of a word w in the “geek” search corpus, and subtract the log-probability of w in the background corpus.
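For the curious, here is a minimal Python sketch of the computation described above: the log-probability of a word in the search corpus minus its log-probability in the background corpus. It is an illustration under assumptions, not the code used for this post. The function names, the lowercase whitespace tokenization, the natural logarithm, and the toy tweets are all hypothetical choices.

```python
import math
from collections import Counter


def tweet_probabilities(tweets):
    """Estimate p(w): the fraction of tweets that contain word w at least once."""
    counts = Counter()
    for tweet in tweets:
        # Use a set so a word repeated within one tweet is counted only once.
        # Assumption: lowercase + whitespace splitting stands in for real tokenization.
        counts.update(set(tweet.lower().split()))
    total = len(tweets)
    return {word: n / total for word, n in counts.items()}


def pmi_scores(search_tweets, background_tweets):
    """PMI(w; v) = log p(w | v) - log p(w), for every word seen in both corpora."""
    p_given_v = tweet_probabilities(search_tweets)          # estimates p(w | v)
    p_background = tweet_probabilities(background_tweets)   # estimates p(w)
    return {
        word: math.log(p_given_v[word]) - math.log(p_background[word])
        for word in p_given_v
        if word in p_background  # skip words with no background estimate
    }


# Toy, made-up tweets purely for illustration (not the data from this post).
geek_tweets = ["geek collection day", "new gadget geek", "geek gadget haul"]
background = ["new day", "gadget news", "coffee day", "rainy day today"]

scores = pmi_scores(geek_tweets, background)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word:10s} {score:+.3f}")
```

In this toy example “gadget” shows up in a larger share of the “geek” tweets than of the background tweets, so its score is positive, while “day” is more common in the background and scores negative. Words that never appear in the background corpus are simply dropped here; a real analysis would need to decide how to handle them (smoothing, minimum counts, and so on).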

Results

The PMI statistic measures a kind of correlation: a positive PMI score for two words means they “keep great company,” a negative score means they tend to keep their distance, and a score close to zero means they bump into each other more or less at random.

With that in mind, here is a scatterplot of various words according to their PMI scores for both “geek” and “nerd” on different axes (ignoring words with negative PMI, and treating #hashtags as distinct):
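As a rough sketch of how such a plot could be assembled, the snippet below takes two dictionaries of PMI scores, drops any word with a negative score, and labels each remaining word by which score is larger. The function name, the tie label, and the scores themselves are made up for illustration; the axis assignment (geeky on the vertical axis, nerdy on the horizontal) is an assumption about the layout.

```python
def plot_points(geek_pmi, nerd_pmi):
    """Map each shared word to (nerd score, geek score, color) for plotting."""
    points = {}
    # Only words scored against both "geek" and "nerd" can be placed on two axes.
    for word in sorted(geek_pmi.keys() & nerd_pmi.keys()):
        geek, nerd = geek_pmi[word], nerd_pmi[word]
        if geek < 0 or nerd < 0:
            continue  # the plot ignores words with negative PMI
        if geek > nerd:
            color = "orange"  # more geeky than nerdy
        elif nerd > geek:
            color = "blue"    # more nerdy than geeky
        else:
            color = "tie"     # hypothetical label for exact ties
        points[word] = (nerd, geek, color)  # x = nerdy axis, y = geeky axis
    return points


# Made-up scores for illustration only (not values from the real data).
# "#books" and "books" are separate keys, so hashtags stay distinct.
geek_pmi = {"#cosplay": 2.1, "neuroscience": 0.4, "chores": -0.3,
            "#books": 0.5, "books": 0.2}
nerd_pmi = {"#cosplay": 0.6, "neuroscience": 2.3, "chores": 0.1,
            "#books": 0.9}

points = plot_points(geek_pmi, nerd_pmi)
for word, (x, y, color) in points.items():
    print(word, x, y, color)
```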



Many people have asked for a high-res PDF of this plot, so here you go.

Moving up the vertical axis, words become more geeky (“#music” → “#gadget” → “#cosplay”), and moving left to right they become more nerdy (“education” → “grammar” → “neuroscience”). Words along the diagonal are similarly geeky and nerdy, including social (“#awkward”, “weirdo”), mainstream tech (“#computers”, “#microsoft”), and sci-fi/fantasy terms (“doctorwho,” “#thehobbit”). Words in the lower-left (“chores,” “vegetables,” “boobies”) aren’t really associated with either, while those in the upper-right (“#avengers”, “#gamer”, “#glasses”) are strongly tied to both. Orange words are more geeky than nerdy, and blue words are the opposite. Some observations:

Collections are geeky. All derivatives of the word “collect” (“collection,” “collectables”, etc.) are orange. As are “boxset” and “#original,” which imply a taste for completeness and authenticity.

Academic fields are nerdy: “math”, “#history,” “physics,” “biology,” “neuroscience,” “biochemistry,” etc. Other academic words (“thesis”, “#studymode”) and institutions (“harvard”, “oxford”) are also blue.

The science & technology words differ. General terms (“#computers,” “#bigdata”) are on the diagonal: similarly geeky and nerdy. As you splay up toward more geeky, though, you see products, startups, brands, and more cultish technologies (“#apple”, “#linux”). As you splay down toward more nerdy you see more methodologies (“calculus”).

#Hashtags are geeky. OK, sure, hashtags are all over the place. But they do tend toward the upper-left. And since hashtags are “#trendy,” I take it to mean that geeks are into trends. (I take this one back. The average PMI score for all hashtags is 0.74 with “geek” but 0.73 with “nerd.” The difference isn’t statistically significant using a paired t-test or Wilcoxon test, or practically significant using a common-sense test.)

Hobbies: compare the more geeky pastimes (“#toys,” “#manga”) with the more nerdy ones (“chess,” “sudoku”).

Brains: the word “intelligence” may be geeky, but “education,” “intellectual,” and “#smartypants” are nerdy.

Reading: “#books” are nerdy, but “ebooks” and “ibooks” are geeky.

Pop culture vs. high culture: “#shiny” and “#trendy” are super-geeky, but (curiously) “cellist” is the nerdiest…
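The hashtag comparison above relies on a paired test: each hashtag has one PMI score with “geek” and one with “nerd,” and the question is whether the per-hashtag differences average out to zero. Here is a minimal sketch of the paired t statistic using only the Python standard library. The function name and the five pairs of scores are invented for illustration, and the sketch stops at the t statistic rather than computing a p-value.

```python
import math
import statistics


def paired_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for a paired t-test on matched scores."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    # Note: this divides by zero if every difference is identical.
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Made-up PMI scores for five hypothetical hashtags (not the real data).
geek_scores = [0.9, 0.5, 1.2, 0.3, 0.8]
nerd_scores = [0.8, 0.6, 1.1, 0.4, 0.8]

t, df = paired_t_statistic(geek_scores, nerd_scores)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
CRITICAL_T_DF4 = 2.776
print(f"t = {t:.3f}, df = {df}, significant: {abs(t) > CRITICAL_T_DF4}")
```

With these toy numbers the differences cancel out almost exactly, so the statistic sits far below the critical value, which is the same flavor of result as the 0.74 versus 0.73 averages reported above.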

The list goes on. If you want to poke around yourself, download the raw PMI scores (4.2 MB) and let me know in the comments what you find. Since many people have asked: I computed PMI for all words appearing in the search tweets with “geek” and “nerd” (millions) and then manually scanned roughly 7,500 words with positive PMI scores for both. The scatterplot contains about 300 words that I hand-picked because they made sense.

(Update: I learned that Olivia Culpo — a self-described “cellist nerd” — was crowned Miss Universe on December 20, 2012. The event was heavily tweeted smack in the middle of my data collection, so that probably explains the correlation between “cellist” and “nerd” here. It also underscores the limitations of time-sensitive data.)

Conclusion

In broad strokes, it seems to me that geeky words are more about stuff (e.g., “#stuff”), while nerdy words are more about ideas (e.g., “hypothesis”). Geeks are fans, and fans collect stuff; nerds are practitioners, and practitioners play with ideas. Of course, geeks can collect ideas and nerds play with stuff, too. Plus, they aren’t two distinct personalities as much as different aspects of personality. Generally, the data seem to affirm my thinking.

I wonder how similar the results would be if you applied this method to the Google Books Ngrams corpus, or something more general instead of a niche medium like Twitter. I also wonder what other questions might be answered with this kind of analysis (for example, my wife and I have a perennial disagreement over which word is wetter: “moist” or “damp”).

Finally, when I mentioned to a friend that I was going to write up this post, she said “Well, I guess we know which one you are.” But do we really? I may be a science nerd, but I’m probably a music geek…

Update (June 25, 2013): Woah. This has gotten more attention than I ever anticipated. A few impressions. (1) Prior to writing this, I had no idea there was a “geek vs. nerd” holy war in certain corners of the Internet; fueling these flamewars was certainly not my intent. Lighten up! (2) I fear I’ll be better known for this diversion than for any of my “real” research. To be clear: this was a fun way to kill a few hours on a Saturday afternoon, not necessarily my best science. I think the writeup here is sound and self-evident, but I’m the first to acknowledge that there are better corpora, methods, and analysis techniques — which could use a grant, grad student, and/or more than an afternoon — for uncovering this all-important “Truth.” (3) For those interested in the etymologies of “geek” and “nerd,” I found this cool writeup.