April 13, 2022
Zero-Shot Learning for Emotion Detection
Emotion detection is an active research area in artificial intelligence. AI researchers use facial expressions, speech signals, and textual data as inputs to detect user emotions. Among these emotion detection (ED) sources, textual data carries the least information, since it reflects neither facial expressions nor audio streams, which are generally more direct representations of emotion. That being said, textual data can still be rich in emotional content; therefore, many approaches have been proposed to detect and identify emotions in text. In this post, we will review how NLP methods have evolved for this task. We will then introduce our novel approach to ED, which relies on Natural Language Inference (NLI), an NLP task.
Earlier approaches to ED from text mainly focused on lexicon-based methods (for a detailed review of earlier work, see Canales et al., 2014 and Seyeditebari et al., 2018). Keyword-based approaches, which match text against a set of pre-determined keywords associated with each emotion, are one example of these lexical methods. Another well-known lexical method is an ontology-based approach, called EmotiNET, which first builds a knowledge base and then identifies emotions in a given text by comparing its language with that of the knowledge base. Because these lexical methods rely heavily on pre-determined rules and keywords, they are brittle and do not generalize well across different settings.
Later, supervised machine learning approaches were introduced, and these became very popular due to (1) the growing number of emotion datasets and (2) significant performance improvements over the lexical approaches. Earlier ED datasets were mainly gathered from Twitter, where hashtags and emoticons served as labels for supervised learning. Additionally, throughout the 2010s there was a significant effort to build reliable and consistent datasets from product and movie reviews, as well as curated tweets.
Emotion detection is oftentimes used interchangeably with sentiment analysis. Although the two terms share some similarities, they are fundamentally different. Sentiment analysis only identifies whether a given text has a positive or negative sentiment (some sentiment models include a neutral category as well). Emotions, however, are far more diverse than just positive or negative feelings. For instance, suppose we have two customer reviews: “I don’t like this product” and “I hate this product”. To a sentiment model, these two reviews look the same, i.e., negative. However, the underlying emotions differ: the second review expresses anger, whereas the first only reflects dislike.
Sentiment models not only miss the underlying emotion, they may also miss the “magnitude” of an emotion. This is also true for current ED approaches: when two reviews express the same emotion at different intensities, it is difficult to distinguish their magnitudes. Again, consider the following two reviews:
Review 1: “I am quite annoyed that this product doesn’t work”
Review 2: “I am quite annoyed that this product doesn’t work. This company is total scam and they should go to hell!!!!!”
These two reviews convey the same emotion, anger; however, the anger levels in the two reviews are quite different, as the second review reflects a much higher degree.
Challenges with Customer Service Messages
At Y Meadows, we process tens of thousands of messages and emails to identify customer intents and named entities within them. We are further interested in understanding the emotional state of a customer so that customer support teams can prioritize their actions.
To identify customer emotions, we first started with different sentiment models trained with BERT, RoBERTa, and ALBERT. The training data for these sentiment models consists of product reviews and IMDB movie reviews. Although these are state-of-the-art sentiment models, they did not help with our use case. As one can imagine, understanding the emotion of a customer does not fit well into the sentiment analysis framework: people who reach out to customer service are almost always reporting a problem, so their messages end up classified as negative. In our domain, we are often interested in identifying the anger level of a customer. Sometimes a customer reports only a minor, non-urgent issue; other times customers report their frustrations in an angrier tone, which may need to be resolved more urgently. An angrier tone may also reflect the fact that the customer has been waiting for a long-standing issue to be resolved. Our goal in developing a form of emotion detection was to identify customer anger from their messages, and even to attempt to detect the anger level of the customer.
There are several transformer-based emotion detection models built on emotion datasets. One such example is listed in Hugging Face’s repository: Emotion Model. This model fine-tunes Google’s T5 transformer model on an emotion recognition dataset. However, since the data these models were trained on differs from customer service messages, they did not produce the desired results.
Zero-Shot Learning: Natural Language Inference
Our solution was inspired by a recent paper by Schick et al., 2021. The paper introduces the Pattern-Exploiting Training (PET) strategy, which forms the foundation of our approach. We combine the PET strategy with the idea of Natural Language Inference for emotion detection. Natural Language Inference is an NLP task that takes a pair of sentences, a “premise” and a “hypothesis”, and determines whether, given the premise, the hypothesis is true (entailment), false (contradiction), or undetermined (neutral).
As seen in the table above (taken from NLP Progress), the language of the hypothesis is compared with that of the premise, and a label is assigned based on how likely the hypothesis is to follow from the premise. Several transformer models have been fine-tuned on NLI datasets, which contain the premise-hypothesis pairs used to train them. For a detailed view of models and datasets, see NLI Models/Datasets.
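Concretely, an NLI model scores each premise-hypothesis pair over the three labels, and the label with the highest probability wins. The decision step can be sketched in a few lines; the probability values below are illustrative, not the output of a real model:

```python
def nli_label(entailment: float, neutral: float, contradiction: float) -> str:
    """Return the NLI label with the highest probability."""
    probs = {
        "entailment": entailment,
        "neutral": neutral,
        "contradiction": contradiction,
    }
    return max(probs, key=probs.get)

# Illustrative scores for a pair like:
#   premise:    "I am quite annoyed that this product doesn't work"
#   hypothesis: "The customer is angry"
label = nli_label(entailment=0.91, neutral=0.07, contradiction=0.02)
print(label)  # -> entailment
```

In practice the three probabilities come from a model fine-tuned on an NLI dataset; the helper above only shows how the winning label is selected.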
Our approach combines this NLI task with the PET strategy by treating the customer message as the premise and creating a set of hypothesis sentences that correspond to the emotion of interest. In our use case we focused only on detecting customer anger; however, this approach can be generalized to other emotions quite easily.
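A minimal sketch of this framing follows. The hypothesis phrasings are our own illustrative choices, and `toy_entailment` is a hypothetical stand-in for a real NLI model (which would return an entailment probability for each premise-hypothesis pair):

```python
# Illustrative hypothesis set for anger; the exact phrasings are an assumption.
ANGER_HYPOTHESES = [
    "The customer is angry.",
    "The customer is frustrated.",
    "The customer is furious.",
]

def anger_score(premise: str, nli_entailment) -> float:
    """Average the entailment probability over all anger hypotheses."""
    scores = [nli_entailment(premise, h) for h in ANGER_HYPOTHESES]
    return sum(scores) / len(scores)

# Toy stand-in for an NLI model: exclamation marks nudge "entailment" up.
def toy_entailment(premise: str, hypothesis: str) -> float:
    return min(1.0, 0.3 + 0.1 * premise.count("!"))

mild = anger_score("I am quite annoyed that this product doesn't work", toy_entailment)
harsh = anger_score("This company is total scam and they should go to hell!!!!!", toy_entailment)
assert harsh > mild  # the angrier message scores higher under the toy model
```

Averaging over several hypotheses rather than relying on one makes the score less sensitive to any single phrasing.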
Our anger detection model uses an ensemble of NLI models, including Facebook’s BART and Microsoft’s DeBERTa. In addition, it uses multiple hypotheses to increase accuracy, as opposed to the single hypothesis/label typically used in zero-shot classification. Beyond the models we use, the approach we propose differs from classical emotion detection models for the following reasons:
- Our emotion detection approach does not require any training procedure or training data. Therefore, no labeling is needed. This is the most significant advantage of our approach.
- Although our primary interest is in detecting anger from messages, the same approach can be applied to detect other emotions should the need arise.
- Some emotions, such as anger, are oftentimes expressed with non-contextual features, such as all upper-case letters or excessive punctuation like exclamation points. The NLI models used in this approach automatically take these types of features into account and adjust the anger level accordingly.
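The ensemble itself can be sketched as a mean over every (model, hypothesis) pair. The two model callables below are hypothetical stand-ins, not the real BART or DeBERTa APIs; each keys on one of the non-contextual anger cues mentioned above (upper-case letters, exclamation marks):

```python
def ensemble_anger_score(premise, hypotheses, models):
    """Mean entailment probability across all (model, hypothesis) pairs."""
    scores = [model(premise, h) for model in models for h in hypotheses]
    return sum(scores) / len(scores)

# Toy stand-ins for NLI models, keyed on non-contextual anger cues.
def caps_model(premise, hypothesis):
    letters = [c for c in premise if c.isalpha()]
    return sum(c.isupper() for c in letters) / max(len(letters), 1)

def punct_model(premise, hypothesis):
    return min(1.0, 0.2 + 0.15 * premise.count("!"))

hypotheses = ["The customer is angry."]
angry = ensemble_anger_score("THIS IS UNACCEPTABLE!!!", hypotheses, [caps_model, punct_model])
calm = ensemble_anger_score("Could you update my address?", hypotheses, [caps_model, punct_model])
assert angry > calm
```

In the real system each stand-in would be replaced by a fine-tuned NLI model scoring entailment between the message and each hypothesis; averaging across models smooths out any one model’s idiosyncrasies.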