humorousbert.io

w210 graduate project focusing on understanding the mechanisms of BERT encoders when learning humor


MIDS UC Berkeley w210: Humorous Bot Study

Our code is available on GitHub, along with the Colab notebooks we used to train our BERT models.

Introduction

Given the exponential growth of user-created data, it is important to consider ways to automatically detect specific kinds of content. Our team is interested in exploring how humorous content in particular can be detected and generated using deep learning. Over the course of the project, our focus shifted toward understanding the mechanisms of training encoders on humorous text. The purpose of this project is to build a model architecture for detecting humor and to examine how encoders such as ALBERT learn humor.

Objective

Determine how well Natural Language Processing (NLP) can understand humor by:

  1. generating jokes,
  2. classifying jokes, and
  3. analyzing how the models understand humor

Why Humor?

Humor is highly contextual and often ambiguous in construction, making it harder for a model to understand and predict. This makes it an interesting domain for exploring the boundaries of NLP.

What’s new?

Other papers have attempted to measure how well NLP can detect humor by predicting whether a joke is funny; we attempt to take it a step further by also generating jokes and analyzing how humor is detected.

Pipeline Overview

BERT and GPT-2 Recap

In our project, we focus on two Transformer-based language models: BERT, a bidirectional encoder, and GPT-2, an autoregressive decoder.
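
As a point of reference, the sketch below shows how both pretrained models can be loaded with the Hugging Face transformers library. The checkpoint names and the two-label classification head are illustrative assumptions, not our exact training configuration.

```python
# Minimal sketch (not our exact training code): load the two pretrained models
# via the Hugging Face transformers library.
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# BERT: bidirectional Transformer encoder, used as the backbone of the joke classifier.
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # humorous vs. not humorous
)

# GPT-2: autoregressive Transformer decoder, used for joke generation.
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2")
```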

What Can BERT Tell Us About Humor?

Overall, there is no universally agreed upon theory of humor.

Two leading theories:

Benign Violation Theory: a situation threatens the way that you believe the world “ought” to be, but is benign.

Incongruity Theory: humor arises when things that do not normally go together replace logic and familiarity.

These theories seem to match how jokes are told (especially puns), but are likely too vague to build a quantitative description of humor.

Other Findings

GPT-2 Joke Generation Results

Example of puns

Example of jokes
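
For readers who want to see how samples like the above can be produced, here is a minimal, hedged sketch of sampling from a GPT-2 model with transformers. The base "gpt2" checkpoint and the prompt are placeholders; in practice a checkpoint fine-tuned on the joke corpus would be loaded instead.

```python
# Sketch: sampling a joke continuation from a GPT-2 checkpoint.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # swap in a joke-fine-tuned checkpoint

prompt = "Why did the programmer quit his job?"  # illustrative prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Top-p / top-k sampling keeps generations varied instead of always
# picking the single most likely next token.
output = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_p=0.92,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```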

Joke Classification Model Results

Our main metric of choice is the F1 score, in an effort to balance Type I and Type II errors. The results reveal that the model is able to distinguish the text patterns of jokes and puns considered humorous from those of jokes and puns not considered funny.
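
As a quick illustration of why F1 balances the two error types: it is the harmonic mean of precision (which penalizes false positives, i.e. Type I errors) and recall (which penalizes false negatives, i.e. Type II errors). The toy labels below are made up purely for demonstration.

```python
# Toy example: F1 as the harmonic mean of precision and recall.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = humorous, 0 = not humorous (made-up labels)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)  # sensitive to Type I errors (false positives)
r = recall_score(y_true, y_pred)     # sensitive to Type II errors (false negatives)
print(p, r, f1_score(y_true, y_pred))  # f1 == 2 * p * r / (p + r)
```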

The impressive F1 score gives us confidence that BERT is able to deconstruct humor and classify it well according to some social threshold (in our case, we considered jokes with over 200 upvotes to be “humorous”). Where BERT appears to struggle is in generating novel humorous text.
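
A minimal sketch of that labeling rule is shown below; the file path and column names ("body", "ups") are assumptions about the joke dataset schema rather than the exact fields we used.

```python
# Hypothetical sketch: converting raw joke posts into binary labels using
# the 200-upvote threshold described above. Path and column names are assumed.
import pandas as pd

jokes = pd.read_csv("jokes.csv")                   # placeholder path
jokes["label"] = (jokes["ups"] > 200).astype(int)  # 1 = "humorous", 0 = not

texts = jokes["body"].tolist()
labels = jokes["label"].tolist()
```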

Closer inspection of results using BertViz
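
For anyone who wants to reproduce this kind of inspection, a minimal BertViz sketch is shown below (it renders an interactive attention view inside a notebook such as Colab; the pun used as input is purely illustrative).

```python
# Sketch: visualizing BERT attention heads over a pun with BertViz (run in a notebook).
from transformers import BertTokenizer, BertModel
from bertviz import head_view

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "I used to be a banker, but I lost interest."  # illustrative pun
inputs = tokenizer.encode(sentence, return_tensors="pt")
attention = model(inputs)[-1]  # one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

head_view(attention, tokens)  # interactive view of which tokens each head attends to
```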

Conclusion: NLP can detect the patterns of jokes and puns, but doesn’t actually “understand” humor. We achieved much better than random humor classification results using BERT.

In short, our team was able to create a highly performant joke classifier with BERT and to identify some of the structural patterns of specialized jokes and puns through the lens of BERT encodings. This experiment validated the difficulty of quantifying humor, but we also celebrate our success in building upon previous work and identifying patterns and trends for certain types of jokes.

Presentation Slides

The capstone slides are available at the following link.

Code and Datasets

All code, models, and datasets are available on the GitHub page.