Late one evening I was lazily flipping through articles when a headline caught my eye: “Why There’s No Conservative Jon Stewart.” The article is principally concerned with why things are funny, and to whom. Within the story was a theory I had never heard of: the Benign Violation Theory of Humor, proposed by Dr. Peter McGraw of the Leeds School of Business at the University of Colorado Boulder. The theory is given as:

…humor results from violating social norms or by violating a particular person or group. But it only becomes funny when it’s placed in a second context that clearly signals the violation is harmless or benign. In other words, if someone falls down the stairs, it will only be really funny if that person doesn’t get hurt.

Upon reading that explanation, something in my head changed. I could feel it. It was instant and permanent. I saw humor in a new way. It felt intuitive. It was powerful. I could reason why everything was funny — from slapstick, to Sarah Silverman, to Ann Coulter (with a laugh track she would be hysterical, and I felt I could explain why!) Even the humor in puns fit: the meaning of the word is disturbed (violation), but the phrase still makes sense (ultimately benign).

From: University of Colorado Leeds School of Business, Benign Violation Theory

It was all I could talk about for days. Imagine, I thought, if humor could be mechanized; if machines could be trained to classify content into “violating” and “benign” and generate novel humor: what if computers could be funny?

I had to test it for myself.

Drawing on the statistical expertise of theoretical ecologist Dr. Sasha J. Wright, I devised a small set of experiments to test the Benign Violation Theory. With input from my friend and colleague at Undercurrent, sociologist Dr. Dara Blumenthal, we went to work.

Using Amazon Mechanical Turk, we administered a survey to native English speakers that showed a series of randomly generated names set above an image of a basset hound. We asked participants a series of questions to determine whether each proposed name was (1) violating, (2) ultimately benign, and (3) funny.

Our initial results are encouraging. In the experimental data we analyzed, we found a strong correlation between a dog/name combination being both violating and benign and how humorous it was judged to be. This supports the Benign Violation Theory of Humor. We hope our work will inspire a new set of experiments that probe the nature of what we find funny.

Experimental Design

We measured three things for each dog/name combination: (1) violation (how upsetting it was), (2) benignity (how harmless it was), and (3) humor (how funny people found it). For the theory to be supported, the degree of violation should correlate positively with the humor score, but only when the combination is also benign.

All that was needed was a large set of funny or unfunny items. A simple experimental mechanism was devised: a photo of a dog, and a series of computer-generated ‘names’. Some of the names would be funny when juxtaposed with the dog image, others would not be.

A Dog Named ‘Chloe’, ‘Titties’, and others. The same image of a basset hound was always used.

Dog names were generated with a custom application. Two word lists were used: a list of more than 1,300 English obscenities, and a list of more than 1,700 popular names for pets. From these word lists, alliterative combinations of obscenities and popular pet names were created (e.g. ‘Footfucker Falsetto’). A list of 1,000 names was output: 250 alliterative combinations, 250 obscenities, and 500 popular pet names.
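For readers who want to tinker, here is a minimal Python sketch of how such a generator might work. It is not the application's actual source: the function names, the tiny stand-in word lists, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def alliterative_pairs(rude_words, pet_names):
    """Return every 'Rude Pet' pair whose two words share a first letter."""
    by_initial = defaultdict(list)
    for name in pet_names:
        by_initial[name[0].lower()].append(name)
    return [
        f"{rude.capitalize()} {pet}"
        for rude in rude_words
        for pet in by_initial.get(rude[0].lower(), [])
    ]


def build_name_list(rude_words, pet_names, n_pairs, n_rude, n_pets, seed=0):
    """Sample a shuffled mix of alliterative pairs, rude words, and pet names.

    random.sample raises ValueError if a list is smaller than the
    number of items requested from it.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    pairs = alliterative_pairs(rude_words, pet_names)
    names = (
        rng.sample(pairs, n_pairs)
        + [word.capitalize() for word in rng.sample(rude_words, n_rude)]
        + rng.sample(pet_names, n_pets)
    )
    rng.shuffle(names)
    return names


# Hypothetical stand-ins for the two real word lists.
rude = ["stinker", "booger", "fartface"]
pets = ["Sadie", "Bella", "Falsetto", "Sam", "Chloe"]
print(build_name_list(rude, pets, n_pairs=2, n_rude=1, n_pets=3))
```

With the full word lists, the mix described above would correspond to calling `build_name_list(rude, pets, 250, 250, 500)`.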

Custom ‘Dog Names Generator’ application source and example output

Two experiments utilizing the dog photo/name mechanism were created: one to find evidence of objective humor (i.e. what makes something funny in general), and one to find evidence of subjective humor (i.e. what makes something funny to an individual) using the Benign Violation Theory.

Each experiment was administered using Amazon Mechanical Turk to native English speakers.

Experiment #1 — Objective Humor

Three survey instruments were each completed by separate respondents. The experiment sought to determine if the Benign Violation Theory would hold objectively, on the level of a single dog/name combination. Each survey tested one name from the list of 1,000 at a time, for a total of 3,000 independent survey responses.

The first survey asked respondents to take the viewpoint of the dog and give their opinion about whether the dog/name combination was violating. For the following survey examples, we’ll use the name ‘Tinkle’. The first survey asked:

Imagine you are this dog. If your name were ‘Tinkle’, would you be upset or pleased?

Respondents could select ‘Very upset’, ‘Somewhat upset’, ‘Indifferent’, ‘Somewhat pleased’, or ‘Very pleased’. These responses were assigned a value, hidden from the respondent, ranging from 2 (‘Very upset’) to -2 (‘Very pleased’).

A second individual was given a survey to measure how ultimately benign the name was. It asked:

Imagine you are this dog’s owner. Would you feel comfortable calling this dog by its name ‘Tinkle’ in public?

Respondents could select ‘Very uncomfortable’, ‘Somewhat uncomfortable’, ‘Indifferent’, ‘Somewhat comfortable’, or ‘Very comfortable’. These responses were assigned a value from -2 (‘Very uncomfortable’) to 2 (‘Very comfortable’).

Lastly, a third individual was given a survey to measure how funny the dog/name combination was:

How humorous do you find the name ‘Tinkle’ is for this dog?

Respondents could select one of five answers ranging from ‘Very unfunny’ through ‘Very funny’. These were also assigned a value from -2 to 2.

The data were downloaded from Amazon Mechanical Turk as a CSV file.

Experiment #2 — Subjective Humor

Our first experiment sought to verify the Benign Violation Theory across subjects. We were also curious to know if the theory would demonstrate itself on the level of an individual. We created a new survey instrument that used the same dog/name combinations and questions as before but asked all three questions in a single survey.

The survey was administered to 1,000 distinct native English speakers. An example of the actual survey used appears below.

Subjective Humor Survey

As before, these data were downloaded from Amazon Mechanical Turk as a CSV file.

Analysis

Screenshot of initial experimental analysis spreadsheet.

Before the data could be analyzed, they had to be cleaned and merged into a single data set. The data were first imported into Google Sheets for cursory examination and manipulation. Text responses in the surveys were mapped to numeric values. The data were then exported to SAS JMP and an IPython Notebook for detailed statistical analysis and visualization.
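The mapping and merging step can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet work we actually did: the column names `name` and `answer` are hypothetical (the real Mechanical Turk export is laid out differently), and the middle labels of the humor scale are assumed, since only its endpoints are quoted above.

```python
import csv
import io

# Answer text -> numeric score, following the scales described in the surveys.
VIOLATION = {"Very upset": 2, "Somewhat upset": 1, "Indifferent": 0,
             "Somewhat pleased": -1, "Very pleased": -2}
BENIGN = {"Very uncomfortable": -2, "Somewhat uncomfortable": -1, "Indifferent": 0,
          "Somewhat comfortable": 1, "Very comfortable": 2}
# Assumption: the three middle labels below are guesses at the survey wording.
FUNNY = {"Very unfunny": -2, "Somewhat unfunny": -1, "Indifferent": 0,
         "Somewhat funny": 1, "Very funny": 2}


def score_csv(csv_text, scale):
    """Read 'name,answer' rows and return {dog name: numeric score}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: scale[row["answer"].strip()] for row in reader}


def merge_scores(violation, benign, funny):
    """Join the three surveys on dog name, keeping names present in all three."""
    shared = violation.keys() & benign.keys() & funny.keys()
    return [
        {"name": n, "violation": violation[n], "benign": benign[n], "funny": funny[n]}
        for n in sorted(shared)
    ]


v = score_csv("name,answer\nTinkle,Very upset\n", VIOLATION)
b = score_csv("name,answer\nTinkle,Somewhat uncomfortable\n", BENIGN)
f = score_csv("name,answer\nTinkle,Very funny\n", FUNNY)
print(merge_scores(v, b, f))
```

An unexpected answer string raises a `KeyError`, which is a cheap way to catch typos or stray whitespace during cleaning.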

All the experimental data were visualized using a scatter plot. One point was drawn per dog/name combination, positioned by its violation and benign scores and colored by whether or not it was found funny. Because the violation and benign scores were all whole numbers, the points were “jittered” so that they no longer sat on top of one another and visual patterns could be seen more easily.
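Jittering is simple enough to show in code. The sketch below is our own illustration in plain Python, not the notebook code behind the plot; the function name, the default width of 0.3, and the seeds are assumptions.

```python
import random


def jitter(values, width=0.3, seed=0):
    """Nudge each value by a random offset in [-width, width].

    Keeping width below 0.5 means a jittered point still sits
    closest to the whole-number score it came from.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


# Three identical (violation, benign) points that would otherwise overlap.
violation = [2, 2, 2]
benign = [-1, -1, -1]

# Use different seeds per axis; the same seed would shift x and y by
# identical amounts and smear the points along a diagonal.
xs = jitter(violation, seed=1)
ys = jitter(benign, seed=2)
print(list(zip(xs, ys)))
```

The jittered coordinates can then be handed to any plotting library. The noise is only for display, so any statistics should still be computed from the original whole-number scores.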

In Experiment 1, it was apparent that strongly violating, non-benign dog/name combinations were distinctly unfunny (observe the cluster of blue points in the lower right-hand corner). However, the distribution of funny dog/name combinations seemed less clear.

Results

To dig into these data, we performed a statistical analysis (a two-factor linear regression model, for those who are numerically inclined). It showed a much more interesting pattern (for you nerds, the interaction between violation score and benign score was strongly significant: DF=1, 996; F=15.5; P<0.0001).
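For the curious, a model of this kind can be written as funny = b0 + b1·violation + b2·benign + b3·(violation × benign), where b3 is the interaction. The sketch below fits it by ordinary least squares in plain Python and computes the F statistic for the interaction by comparing the fit with and without that term. It is an illustration under our own assumptions (the function names and the made-up data are hypothetical), not the SAS JMP analysis itself, and it stops short of a p-value, which needs an F distribution from a statistics library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def rss(rows, y, coef):
    """Residual sum of squares for a fitted model."""
    return sum((yi - sum(c * x for c, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


def interaction_f(violation, benign, funny):
    """Return (coefficients, F statistic for the interaction, residual DF)."""
    full = [[1.0, v, b, v * b] for v, b in zip(violation, benign)]
    reduced = [r[:3] for r in full]  # same model without the interaction
    coef = ols(full, funny)
    rss_full = rss(full, funny, coef)
    rss_reduced = rss(reduced, funny, ols(reduced, funny))
    df_resid = len(funny) - 4  # observations minus fitted coefficients
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return coef, f_stat, df_resid


# Made-up scores for illustration only; these are not the survey data.
violation = [2, 2, 2, -2, -2, -2, 0, 1, -1, 0]
benign = [-2, 0, 2, -2, 0, 2, 1, -1, 1, -2]
funny = [-2, -1, 1, 0, 0, 1, 0, -1, 1, -1]
print(interaction_f(violation, benign, funny))
```

With 1,000 observations and four coefficients, the residual degrees of freedom come to 996, which is where the second number in the DF reported for Experiment 1 comes from.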

When we plotted these data as graphs comparing the benign score to funniness at different levels of violation, we found some interesting patterns:

At the highest level of violation (2, “I’d be very displeased to have this name”), a name becomes funnier as it becomes more benign. This is what the Benign Violation Theory hypothesizes. However, we should note that very few of the names in this study were objectively funny: most of the dog/name combinations lie well below the zero line for funniness. An exception also occurs at the lowest level of violation (-2, “I’d be very pleased to have this name”). Here, we may be capturing a phenomenon we have termed “delight,” as opposed to humor in the originally intended sense.

The same analysis for Experiment 2, our subjective assessment, shows a similar relationship (again for the nerds, also a strongly significant interaction in our analysis, DF=1, 993, F=24.4, P<0.0001):

Conclusions

Our data suggest we may not be far off from a world where we could enjoy Old Machines Texting Jokes.

Both Experiment 1 and Experiment 2 demonstrated the same pattern: as people become more displeased at the thought of a particular dog/name combination (i.e. it becomes more violating), they also find the combination less funny. However, as these dog/name combinations become more comfortable to say in public (i.e. more benign), the pattern flips and the more violating combinations are the ones found funnier. This is just as the Benign Violation Theory of Humor hypothesizes.

We’ve also observed a weaker correlation: as dog/name combinations became less violating and more benign, they also tended to be judged as increasingly funny. We hypothesize that this might reflect an individual’s delight at a particular name. Could a dog named ‘Cuddles’ trigger a sigh that is close to a laugh? Teasing delight apart from the biting wit of humor may prove difficult in future inquiry.

If we separate ‘delight’ from ‘humor’ what might the other regions be called? Could we finally explain why we feel so ooky when somebody makes a comment that seems nice on the surface but does us injury? Such as when we are told, “you look so much nicer than usual today!”

Clearly, this single study does not allow us to conclude that the Benign Violation Theory of Humor holds for all cases and all types of humor, but it does open the door for further work and other, more mischievous experiments.

Star Trek: The Next Generation’s Lt. Commander Data, an android who struggled with humor.

As computational power increases by orders of magnitude, we wonder if this framework could be used as the basis for a humorous artificial intelligence. Could Netflix or YouTube more accurately predict what media we’ll find funny? Could Lt. Commander Data finally land a role on the 400th cast of SNL? As we watch the rise of such phenomena as BuzzFeed’s predictive models for virality and the AP’s use of robotic journalists, a digital comedian doesn’t seem as far away as we might otherwise think.
