Suppose we’re interested in looking at past-tense verbs. The most common examples in COCA are nondescript: “said,” “came,” “got,” “went,” “made,” “took” and so on. On the surface, the fiction offerings aren’t that different: “said” is still the big winner, while some others move up the list a few spots, like “looked,” “knew” and “thought.” But ask COCA which past-tense verbs show up more frequently in fiction compared with, say, academic prose, and things start to get interesting: the top five are “grimaced,” “scowled,” “grunted,” “wiggled” and “gritted.” Sour facial expressions, gruff noises and emphatic bodily movements (wiggling fingers and gritting teeth) would seem to rule the verbs peculiar to today’s published fiction.
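For readers curious how a "more frequent in fiction than in academic prose" ranking might be computed, here is a minimal sketch. It is not COCA's own method or interface: the function name `genre_ratio`, the smoothing constant, the hand-picked verb list and the two tiny samples are all invented for illustration. The idea is simply to normalize each word's count by the size of its sample and rank words by the ratio of the two rates.

```python
from collections import Counter

def genre_ratio(target_tokens, reference_tokens, words, smoothing=0.5):
    """Rank `words` by how much more often they occur (per token) in the
    target sample than in the reference sample.

    `smoothing` is an assumed constant added to every count so that a word
    missing from the reference sample does not cause division by zero.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)

    ratios = {}
    for word in words:
        if target[word] == 0:
            continue  # only rank words that actually occur in the target
        rate_target = (target[word] + smoothing) / n_target
        rate_reference = (reference[word] + smoothing) / n_reference
        ratios[word] = rate_target / rate_reference
    # Highest ratio first: the words most peculiar to the target genre.
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Toy samples, invented for illustration (not real corpus data).
fiction = "she said he scowled she said he grimaced he said she scowled".split()
academic = "the author said the results showed the author said the data showed".split()

# Hypothetical hand-picked list; a real corpus would use part-of-speech tags.
past_tense_verbs = ["said", "scowled", "grimaced", "showed"]

ranked = genre_ratio(fiction, academic, past_tense_verbs)
for word, ratio in ranked:
    print(f"{word}: {ratio:.1f}x")
```

In this toy case "said" is the most common verb in the fiction sample, yet it ranks last, because it is nearly as common in the academic sample. That is the same effect that pushes "grimaced" and "scowled" past "said" in the comparison described above.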

Beyond the use of individual words, researchers can uncover even more striking patterns by looking at how words combine with their neighbors, forming “collocations.” Dictionary makers take a special interest in high-frequency collocations, since they can be the key to understanding how words work in the world. Collocation data is a particular boon for making dictionaries that appeal to learners of English as a second language. When the lexicographer Orin Hargraves was studying collocations for a project at Oxford University Press (where I previously worked as editor for American dictionaries), he struck upon a trove of collocations that “would not be statistically significant were it not for their appearance in fiction.” And these weren’t just artifacts of genre fiction, like “warp speed” in sci-fi or “fiery passion” in bodice-ripping romance novels.

Using the Oxford English Corpus, encompassing about two billion words of 21st-century English, Hargraves found peculiar patterns in simple words like the verb “brush.” Everybody talks about brushing their teeth, but other possible companions, like “hair,” “strand,” “lock” and “lip,” appear up to 150 times more frequently in fiction than in any other genre. “Brush” appears near “lips” when two characters’ lips brush against each other or one’s lips brush against another’s cheek — as happens so often in novels. For the hair-related collocations, Hargraves concludes that “fictional characters cannot stop playing with their hair.”
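A rough sketch of how the neighbors of a word like "brush" could be tallied follows. It is only an illustration under stated assumptions, not the Oxford English Corpus tooling or Hargraves's procedure: the function name `collocates`, the four-word window and the two sample sentences are invented here, and real collocation studies typically go on to apply a statistical association measure rather than relying on raw counts.

```python
from collections import Counter

def collocates(tokens, node_forms, window=4):
    """Count the words that appear within `window` tokens on either side of
    any form of the node word. The window size is an assumed parameter."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token not in node_forms:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Forms of the node word to match (a hand-written list, for illustration).
brush_forms = {"brush", "brushes", "brushed", "brushing"}

# Toy samples, invented for illustration (not real corpus data).
fiction = ("she brushed a strand of hair from her face "
           "and his lips brushed her cheek").split()
other = ("he brushed his teeth before bed "
         "and she brushed her teeth too").split()

fiction_counts = collocates(fiction, brush_forms)
other_counts = collocates(other, brush_forms)

for word in ["teeth", "hair", "strand", "lips"]:
    print(word, fiction_counts[word], other_counts[word])
```

Dividing such counts by the size of each genre's sample, and then comparing the resulting rates, is one way to arrive at a statement of the form "appears so many times more frequently in fiction."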

“Bolting upright” and “drawing one’s breath” are two more fiction-specific turns of phrase revealed by the corpus. Creative writers are clearly drawn to descriptive idioms that allow their characters to register emotional responses through telling bits of physical action — “business,” as they say in theater. The conventions of modern storytelling dictate that fictional characters react to their worlds in certain stock ways and that the storytellers use stock expressions to describe those reactions. Readers might not think of such idioms as literary clichés, unless they are particularly egregious. Individual authors will of course have their own idiosyncratic linguistic tics. Dan Brown, of “Da Vinci Code” fame, is partial to eyebrows. In his techno-thriller “Digital Fortress,” characters arch or raise their eyebrows no fewer than 14 times.

Brown’s eyebrow obsession may simply signal a lack of imagination, but corpus research can also illuminate a writer’s stylistic creativity. Masahiro Hori, a professor of English linguistics at Kumamoto Gakuen University in Japan, has studied how Charles Dickens breathed new life into literary collocations. In “The Pickwick Papers,” for instance, Dickens played off the idiom “to look daggers at someone” (meaning to shoot a wrathful glare, itself descended from Shakespeare’s “to speak daggers”) by innovatively replacing “daggers” with “carving-knives”: an old lady “looked carving-knives at the hardheaded delinquent.” To be sure, a careful reader might have discerned the originality of the phrase on his own, but corpus analysis allowed Hori to confirm and extend his insights into Dickens’s originality.