W210 graduate project focused on understanding the mechanisms of BERT encoders when learning humor
Our code is available on GitHub, along with the Colab notebooks we used to train our BERT models.
Given the exponential growth of user-created data, it is important to consider ways to automatically detect specific kinds of information. Our team is interested in exploring and understanding how humorous content, specifically, can be detected and created using deep learning methods. Over the course of the project, our focus shifted toward understanding the mechanisms of training encoders on humorous text. The purpose of this project is to build a model architecture for detecting humor and to examine how encoders such as ALBERT learn humor.
Determine how well Natural Language Processing (NLP) can understand humor by both classifying and generating jokes.
Humor is highly contextual and often ambiguous in construction, making it harder for a model to understand and predict. This makes it an interesting domain for exploring the boundaries of NLP.
Other papers have attempted to understand how well NLP can detect humor by predicting whether a joke is funny; we attempt to take it a step further by generating jokes and analyzing how humor is detected.
In our project, we focus on two Transformer-based language models: BERT and GPT-2.
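To make the detection side concrete, here is a minimal sketch of fine-tuning a pretrained encoder as a binary humor classifier with the Hugging Face transformers library. The checkpoint name, example texts, and hyperparameters are illustrative assumptions, not the exact configuration from our Colab notebooks.

```python
# Minimal sketch: fine-tuning ALBERT as a binary humor classifier.
# Checkpoint, example data, and hyperparameters are assumptions,
# not the exact settings from our notebooks.
import torch
from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

texts = [
    "Why don't scientists trust atoms? Because they make up everything.",
    "The meeting has been rescheduled to Thursday.",
]
labels = torch.tensor([1, 0])  # 1 = humorous, 0 = not humorous

batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative training step; in practice this runs over many batches.
model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()
```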
Overall, there is no universally agreed upon theory of humor.
Two leading theories: Benign Violation Theory holds that humor arises when a situation threatens the way you believe the world “ought” to be, yet remains benign. Incongruity Theory holds that humor arises when things that do not normally go together replace logic and familiarity. These theories seem to match how jokes are told (especially puns), but are likely too vague to build a quantitative description of humor.
Highest-rated Reddit jokes follow a similar overall pattern but are still very different. For example: “Breaking news: Bill Gates has agreed to pay for Trump’s wall … on the condition he gets to install Windows.”
Our main metric of choice is the F1 score, which balances Type I and Type II errors (false positives and false negatives). The results reveal that the model is able to classify the text patterns that distinguish jokes and puns considered humorous from those that are not.
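As a reference for the metric itself, F1 is the harmonic mean of precision and recall; the toy labels below are illustrative, not our actual predictions:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels: 1 = humorous, 0 = not humorous (illustrative, not our results).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

# Precision limits Type I errors (false positives);
# recall limits Type II errors (false negatives);
# F1 is their harmonic mean.
print(precision_score(y_true, y_pred))  # 1.0
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # ~0.857
```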
The impressive F1 score gives us confidence that BERT is able to deconstruct humor and classify it well according to a social threshold (in our case, we considered jokes with over 200 upvotes to be “humorous”). Where BERT struggles, it appears, is in generating unique humorous text.
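For comparison with the classification side, below is a minimal sketch of sampling joke text from GPT-2, the generative model named above; the prompt and decoding parameters are assumptions, not our tuned setup:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Illustrative prompt and sampling settings.
inputs = tokenizer("Why did the programmer", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=40,
    do_sample=True,   # sampling gives more varied (if riskier) continuations
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```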
In short, our team was able to create a highly performant joke classifier with BERT and to identify some of the structural patterns of specialized jokes and puns through the lens of BERT encodings. This experiment validated the difficulty of quantifying humor, but we also celebrate our success in building upon previous work and identifying patterns and trends for certain types of jokes.
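For readers who want to inspect encodings themselves, here is a minimal sketch of extracting sentence-level BERT representations for pattern analysis. The checkpoint and the use of the [CLS] vector are assumptions reflecting one common approach, not necessarily our exact analysis pipeline:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

jokes = [
    "I used to be a banker, but I lost interest.",
    "Time flies like an arrow; fruit flies like a banana.",
]

with torch.no_grad():
    batch = tokenizer(jokes, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state  # shape: (batch, seq_len, 768)

# One common sentence representation: the [CLS] token's final hidden state,
# which can then be clustered or visualized to look for structural patterns.
cls_embeddings = hidden[:, 0, :]               # shape: (batch, 768)
print(cls_embeddings.shape)
```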
The capstone slides are available at the following link.
All code, models, and datasets are available on the GitHub page.