
OweysMomenzada/Improving-Emotion-Detection-with-context-sensitive-classification-for-German-text-corpus


Author: Oweys Momenzada

Improving Emotion Detection with context-sensitive multi-classification for a German text corpus based on XLM-RoBERTa.

This work is ongoing as part of a master's thesis and a research paper aimed at improving the system.

FOR DEEPER INSIGHT INTO THE WORK AND APPROACH, ALL NOTEBOOKS ARE WELL DOCUMENTED AND PROVIDED ON THIS GITHUB REPOSITORY.

What is this repository about?

During my time at SCHICKLER, I worked on the award-winning DRIVE project. The DRIVE project has data from various regional publishers throughout Germany. My task was to improve an existing Emotion Classifier used for SCHICKLER's dataset on Google BigQuery. In addition, I implemented Sentiment Analysis for their dataset, which had not been done before.

What is Emotion Detection/Sentiment Analysis and why is a solution relevant?

Emotion Detection identifies emotions in a text. For instance, we can detect joy, fear, anger and sadness. With Sentiment Analysis, a text is labelled (most commonly) as negative, positive or neutral. What sounds easy is actually quite challenging: there is no existing data for German Emotion Detection, and German Sentiment Analysis is poorly documented. With my approach we can build a decent Emotion Classifier for German text and, in some cases, outperform existing Sentiment Analysis approaches (such as the Oliverguhr Sentiment Analysis model, which is based on BERT).

What is my solution?

Previously, SCHICKLER used triggerwords to detect emotions in a text corpus. The problem is that context cannot be taken into account. For instance, "das freut mich gar nicht" (translates to: "I am not pleased at all") would be classified as positive, since it contains the triggerword "freuen" (translates to: "to be pleased"). My solution is to train a model on labelled dialogues and example sentences based on the triggerwords (see section Data). A Sentiment is then derived from the negative emotions (fear, anger, sadness) and the positive emotion (joy). We also use a "neutral" emotion to neutralize overloaded emotions. This is explained in detail in the following sections.
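To make the weakness of the triggerword approach concrete, here is a minimal sketch of such a lookup. The small lexicon and the function name `detect_by_triggerwords` are assumptions made for illustration only; the actual list is in "./Data collecting/triggerwords.xlsx" and SCHICKLER's original implementation may have worked differently.

```python
# Hypothetical mini-lexicon for illustration; not the repository's real triggerword list.
TRIGGERWORDS = {
    "freut": "joy",
    "freuen": "joy",
    "angst": "fear",
    "wütend": "anger",
    "traurig": "sadness",
}


def detect_by_triggerwords(text):
    """Return the emotions whose triggerwords occur in the text, ignoring context."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return sorted({TRIGGERWORDS[token] for token in tokens if token in TRIGGERWORDS})


# The negation "gar nicht" is invisible to a pure lookup,
# so both sentences receive the same label.
print(detect_by_triggerwords("Das freut mich sehr"))
print(detect_by_triggerwords("Das freut mich gar nicht"))
```

Both calls return `['joy']`, although the second sentence expresses the opposite. This is the kind of error a context-sensitive model is meant to avoid.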

Data

ALL DATA AND FUNCTIONS RELATED TO DATA COLLECTION CAN BE SEEN IN "./Data collecting"

As mentioned, no data exists for implementing an Emotion Detection model, so we need to build a dataset on our own. In the first step, we build our dataset from the triggerwords (for triggerwords, see citation or "./Data collecting/triggerwords.xlsx"). We send the triggerwords to an API (DWDS-API) and generate sentences based on these words. This could look as follows:

image1

We also filter out negations to avoid false labelling. In this way we generated more than 6,000 sentences based on 680 triggerwords. However, since this dataset contains no negations, we also use English dialogues from various datasets (see citation) to solve this issue. We use the Google NLP API (see "./Data collecting/Emotiondataset_builder.py") to translate the English dataset into German sentences. The final dataset has over 11,000 sentences for five emotions: anger, sadness, joy, fear and neutral (see "./Data collecting/fullset.csv").
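The negation filter for the triggerword-based sentences could be sketched roughly as follows. The cue list `NEGATION_CUES` and the helper names are hypothetical; the filter actually used in "./Data collecting" may rely on different rules.

```python
import re

# Hypothetical list of German negation cues; the repository's filter may differ.
NEGATION_CUES = {
    "nicht", "kein", "keine", "keinen", "keinem", "keiner",
    "nie", "niemals", "nichts",
}


def contains_negation(sentence):
    """Return True if any token of the sentence is a known negation cue."""
    tokens = re.findall(r"\w+", sentence.lower())
    return any(token in NEGATION_CUES for token in tokens)


def label_sentences(sentences, emotion):
    """Attach the triggerword's emotion to every sentence without a negation."""
    return [(s, emotion) for s in sentences if not contains_negation(s)]


candidates = [
    "Ich freue mich auf das Wochenende.",
    "Das freut mich gar nicht.",
]
print(label_sentences(candidates, "joy"))
```

Only the first sentence is kept and labelled "joy". The negated sentence is dropped instead of being labelled with the wrong emotion, which is why negated examples then have to come from another source, such as the translated dialogues.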

Model

MODEL AND TRAINING CAN BE SEEN IN "./Model training/Model training.ipynb"

Since SCHICKLER used triggerwords for detection, the running time was comparatively fast. This is important, since SCHICKLER feeds a lot of data into their pipelines, so a short running time is more cost-efficient. Therefore, we need a model that offers both a short running time and good accuracy. In the experiments, simple LSTMs were considerably faster than BERT, BiLSTM and CNN+LSTM while still reaching decent accuracy. Because of that, we use the LSTM model.

A prediction of our model could look as follows (for more examples and results see "./Results and Examples.ipynb"):

# translated to: Today's weather forecast: there will be a tornado today
predict('Wetterbericht von heute: heute wird es einen Tornado geben')

>>>{'anger': 0.013,
>>>  'fear': 0.9147,
>>>  'joy': 0.0105,
>>>  'neutral': 0.0039,
>>>  'sadness': 0.058}

Sentiment Analysis

The Sentiments are defined as [negative, likely negative, neutral, likely positive, positive] based on the emotions. Negative emotions produce a negative sentiment score and positive emotions produce a positive sentiment score. For the threshold of each Sentiment see "./Application - API/main.py".
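As a rough illustration of this mapping, the sketch below derives a valence from the emotion scores and assigns one of the five labels. The formula, the thresholds (0.2 and 0.6) and the function name `emotions_to_sentiment` are assumptions chosen for this example. They are not the values from "./Application - API/main.py" and will not reproduce its exact scores.

```python
NEGATIVE_EMOTIONS = ("anger", "fear", "sadness")
POSITIVE_EMOTIONS = ("joy",)


def emotions_to_sentiment(emotions):
    """Map emotion scores to a valence in [-1, 1] and one of five labels.

    Assumed formula: positive minus negative mass, damped by the neutral score.
    Assumed thresholds: 0.2 and 0.6 (hypothetical, not taken from main.py).
    """
    positive = sum(emotions[name] for name in POSITIVE_EMOTIONS)
    negative = sum(emotions[name] for name in NEGATIVE_EMOTIONS)
    valence = (positive - negative) * (1.0 - emotions["neutral"])

    if valence <= -0.6:
        label = "negative"
    elif valence <= -0.2:
        label = "likely negative"
    elif valence < 0.2:
        label = "neutral"
    elif valence < 0.6:
        label = "likely positive"
    else:
        label = "positive"
    return {"sentiment_label": label, "sentiment_valence": valence}


# Emotion scores taken from the tornado prediction in the Model section.
tornado = {"anger": 0.013, "fear": 0.9147, "joy": 0.0105,
           "neutral": 0.0039, "sadness": 0.058}
print(emotions_to_sentiment(tornado)["sentiment_label"])
```

With these assumed thresholds, the tornado sentence is labelled "negative" because fear dominates. A text whose positive and negative scores roughly cancel out ends up as "neutral".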

Finally, our results look like this:

# translated to: Today FC Bayern plays against FC Barcelona.
example = create_emotions_sentiment("Heute spielen FC Bayern gegen den FC Barcelona.")

print(example)

>>>{'emotions': 
>>>  {'anger': 0.0665030256, 
>>>  'fear': 0.1034225, 
>>>  'joy': 0.545249, 
>>>  'neutral': 0.0235871468, 
>>>  'sadness': 0.261238247},
>>> 'sentiments': 
>>>  {'sentiment_label': 'neutral', 
>>>  'sentiment_valence': 0.0905028532}}

In experiments, the Sentiment Analysis approach of this work could outperform existing state-of-the-art open-source projects for German Sentiment Analysis, such as the Oliverguhr project.

Real world Application, API & Deployment

A real-world application on some article headlines can be seen here: "Results and Examples.ipynb"

We provide this for SCHICKLER's database through an API. We first store the trained model in a bucket in Google Cloud Storage and then load it into GCP AI Platform. Text cleaning, other feature engineering steps and the communication with the trained model on AI Platform are implemented in a separate .py file (see "./Application - API/main.py"). We use Flask for our RESTful API and implement a POST request to send requests to it. Finally, we deploy the API on App Engine so it can be used for EDA purposes and for our dataset.
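A minimal sketch of such a Flask endpoint is shown below. The route name `/predict`, the JSON field `text` and the helpers `clean_text` and `predict_emotions` are hypothetical. In particular, `predict_emotions` is only a stub standing in for the call to the model hosted on AI Platform, and the real cleaning steps in "./Application - API/main.py" are more involved.

```python
import re

from flask import Flask, jsonify, request

app = Flask(__name__)


def clean_text(text):
    """Collapse runs of whitespace; a placeholder for the real cleaning steps."""
    return re.sub(r"\s+", " ", text).strip()


def predict_emotions(text):
    """Stub for the model call; returns fixed scores instead of a real prediction."""
    return {"anger": 0.0, "fear": 0.0, "joy": 0.0, "neutral": 1.0, "sadness": 0.0}


@app.route("/predict", methods=["POST"])
def predict_route():
    """Accept JSON like {"text": "..."} and return the emotion scores."""
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "field 'text' is required"}), 400
    return jsonify({"emotions": predict_emotions(clean_text(text))})
```

The endpoint can be tried locally with Flask's built-in test client, for example `app.test_client().post("/predict", json={"text": "Heute scheint die Sonne."})`, before deploying it to App Engine.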

 

Workflow

Citation

Used Datasets

  • dailydialog: 2017, 102k
  • emotion-stimulus: 2015
  • isear: 1990
Used triggerwords
@book{aschenbrenner2019emotionserkennung,
  title={Emotionserkennung bei Nachrichtenkommentaren mittels Convolutional Neural Networks und Label Propagationsverfahren},
  author={Aschenbrenner, A. and Spies, M.},
  url={https://core.ac.uk/download/pdf/275811762.pdf},
  year={2019},
  publisher={Universit{\"a}tsbibliothek der Ludwig-Maximilians-Universit{\"a}t},
  pages={339-352}
}

Please cite this GitHub repository if you use this work.

@misc{momenzada_schickler_2021_emotion, 
      title={Improving Emotion Detection with context sensitive classification for German text corpus}, 
      author={Momenzada, Oweys and SCHICKLER}, 
      url={https://github.com/OweysMomenzada/Improving-Emotion-Detection-with-context-sensitive-classification-for-German-text-corpus}, 
      journal={Github}, 
      year={2021}, 
      month={Sep}
      } 
