“Why did the chicken cross the road?” Whether they’re childish, dark or bawdy, jokes have been an integral part of human communication throughout history and across class and cultural boundaries.

If we ask one of today’s AI-powered voice assistants, such as Alexa or Siri, to tell a joke, it might very well come up with something that puts a smile on our face. If, however, we then ask “Why do you think that joke is funny?” the bot will be stuck for a response. AI researchers want to change that.

Although AI technology has enabled machines to hold interactive conversations, for example in humanlike customer service, not all the subtleties of speech-based communication have been replicated. One of the last frontiers is humour, a complex and creative social communicative behavior conveyed through a combination of word choice, body language, and sound cues.

Understanding humour seems natural for humans, but for artificial intelligence it remains a challenging task. To land a joke a person must appropriately align modalities such as words and gestures. Humour can be delivered in styles ranging from sarcastic to slapstick, reflecting both individual choices and underlying cultures. Context, sounds and vision or any combination of these can be key in building to a punch line, or in cleverly steering the joke to a less-expected punch line. Moreover, some humans won’t “get” a joke that others do. For these and other reasons, humour has remained relatively underexplored in the field of AI.

When AI researchers have attempted to decode humour and teach machines to understand it, their work has sometimes produced unexpected benefits. Because joke comprehension requires an AI to accurately interpret context, delivery, tone, emotion, etc., research on humour has offered insights for enriching NLP (Natural Language Processing) research, enabling human-machine conversations to be more natural.

UR-FUNNY: The trio of text, gestures and acoustics in one dataset

A team of University of Rochester and CMU researchers recently introduced the multimodal dataset UR-FUNNY, with data from 1,866 videos featuring 1,741 TED Talks speakers covering 417 diverse topics.

UR-FUNNY is designed to help interpret how factors such as words (text), gestures (vision) and prosodic cues (acoustics) are used in presenting humour. Researchers’ essential question to the machine was whether sequences of sentences extracted from the videos were funny. The machine learning models made their judgments by “detecting whether or not the last sentence constitutes a punchline.”

UR-FUNNY is the first multimodal language dataset for humour detection in the NLP community. The researchers behind UR-FUNNY also designed C-MFN (Contextual Memory Fusion Network), an extension of the state-of-the-art MFN (Memory Fusion Network) model, to detect humour: “This is done by introducing two components to allow the involvement of context in the MFN model: 1) Unimodal Context Network, where information from each modality is encoded using M Long-short Term Memories (LSTM), 2) Multimodal Context Network, where unimodal context information are fused (using self-attention) to extract the multi-modal context information.”
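The two-step idea in that quote can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions, not the C-MFN architecture itself: mean pooling stands in for the per-modality LSTM encoders, the self-attention has no learned weights, and the function names, the modality keys and the toy feature vectors are all hypothetical. It also assumes every modality uses the same feature dimension.

```python
import math
from typing import Dict, List

Vector = List[float]

def mean_pool(sequence: List[Vector]) -> Vector:
    """Stand-in for an LSTM encoder: average a context sequence into one vector."""
    steps = len(sequence)
    return [sum(step[i] for step in sequence) / steps for i in range(len(sequence[0]))]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (no learned weights)."""
    dim = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)])
    return attended

def fuse_context(context: Dict[str, List[Vector]]) -> Vector:
    """Step 1: summarise each modality on its own. Step 2: let the summaries attend to each other."""
    unimodal = [mean_pool(context[name]) for name in ("text", "vision", "acoustic")]
    multimodal = self_attention(unimodal)
    # Concatenate the attended summaries into a single context vector.
    return [value for vector in multimodal for value in vector]

# Hypothetical 2-dimensional features for the sentences preceding a punchline.
toy_context = {
    "text": [[1.0, 0.0], [0.0, 1.0]],
    "vision": [[0.5, 0.5]],
    "acoustic": [[0.0, 2.0], [0.0, 0.0]],
}
fused = fuse_context(toy_context)
```

In a real system the fused context vector would be passed on to the model that judges the punchline; here it simply shows how separate text, vision and acoustic summaries can be mixed by attention.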

Human performance on the UR-FUNNY dataset was 82.5 percent, representing the average performance of two annotators on a shuffled set of 100 humorous and 100 non-humorous prompts. Machine learning models were given the same input as the humans (similar context and punchline), and performance from the designed model was 65.23 percent.
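For readers who want a reference point for these percentages, the short Python sketch below computes accuracy on a balanced set shaped like the one used in the human study. The labels and the always-funny predictor are toy constructions for illustration, not data or models from the paper.

```python
from typing import List

def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of prompts where the predicted label matches the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Toy balanced set: 100 humorous prompts (1) and 100 non-humorous prompts (0).
labels = [1] * 100 + [0] * 100

# A predictor that calls every prompt funny is right exactly half the time.
always_funny = [1] * len(labels)
baseline = accuracy(always_funny, labels)
```

On a balanced set like this, a constant guess scores 50 percent, which is the floor against which both the human and the model figures can be read.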

The paper UR-FUNNY: A Multimodal Language Dataset for Understanding Humor is on arXiv.

Surprising Pun Generation

“Yesterday I accidentally swallowed some food coloring. The doctor says I’m OK, but I feel like I’ve dyed (died) a little bit.” Applied scientist He He (yes that’s her real name) and other researchers from Stanford University and the University of Southern California referenced the above homophonic pun in their recent paper Pun Generation with Surprise.

Researchers used the example to show their novel unsupervised approach to pun generation: “Using a corpus of unhumorous text and what we call the local-global surprisal principle: we posit that in a pun sentence, there is a strong association between the pun word (e.g., ‘dyed’) and the distant context, as well as a strong association between the alternative word (e.g., ‘died’) and the immediate context. This contrast creates surprise and thus humor.” In other words, the most unexpected words can create the funniest twist.

The approach built on this principle works in two stages. Researchers first developed a quantitative metric for surprise under a neural language model, based on the conditional probabilities of the pun word and the alternative word given the local and global contexts. They then used an unsupervised retrieve-and-edit framework to retrieve candidate sentences from an unhumorous corpus and edit them into puns.
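To make the surprise metric more concrete, here is a minimal Python sketch of a local-global score. It is an illustration under stated assumptions rather than the authors' implementation: a hand-built scoring table (`PRIOR`, `RELATED`, `toy_cond_prob`) stands in for the neural language model, and the window size, the choice to combine the two values as a ratio, and the use of -1 for non-puns are hypothetical choices made for this example.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy "language model": a word prior boosted by related context words.
PRIOR = {"dyed": 0.01, "died": 0.04}
RELATED = {"dyed": {"food", "coloring"}, "died": {"have", "little"}}

def toy_cond_prob(word: str, context: Sequence[str]) -> float:
    """Unnormalised stand-in for P(word | context); only ratios are used below."""
    hits = sum(1 for token in context if token in RELATED[word])
    return PRIOR[word] * (1 + hits)

def surprisal(pun: str, alt: str, context: Sequence[str],
              cond_prob: Callable[[str, Sequence[str]], float] = toy_cond_prob) -> float:
    """How much less expected the pun word is than the alternative word in this context."""
    return -math.log(cond_prob(pun, context) / cond_prob(alt, context))

def local_global_ratio(tokens: List[str], position: int, alt: str, window: int = 2) -> float:
    """Local surprise divided by global surprise; -1.0 marks sentences that do not qualify."""
    pun = tokens[position]
    before, after = tokens[:position], tokens[position + 1:]
    local_context = before[max(0, len(before) - window):] + after[:window]
    global_context = before + after
    s_local = surprisal(pun, alt, local_context)
    s_global = surprisal(pun, alt, global_context)
    if s_local < 0 or s_global <= 0:
        return -1.0
    return s_local / s_global

tokens = "i swallowed food coloring and feel i have dyed a little".split()
score = local_global_ratio(tokens, tokens.index("dyed"), alt="died")  # about 1.79
```

In this toy setup the nearby words favour "died" while the distant words "food" and "coloring" pull toward "dyed", so the pun word is more surprising locally than globally and the score comes out above 1. Swapping the roles of the two words yields -1.0, since "died" is the expected word in its immediate context.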

As puns play a huge part in the paper, the system is appropriately called SURGEN (SURprisal-based pun GENeration).

Results from human evaluation indicated SURGEN’s retrieve-and-edit approach generates puns successfully 31 percent of the time. Researchers concluded the conceptual tool was not sufficiently robust, and believe the challenge lies in finding a balance between creativity and nonsense: “Language models conflate the two, so developing methods that are nuanced enough to recognize this difference is key to future progress.”

The paper Pun Generation with Surprise is on arXiv.