
License: GPL v3

Multi-Modal Emotion Classification

Index

  1. Preface
  2. Project Overview
  3. Project Structure
  4. Datasets
  5. Dependencies
  6. Development environment
  7. Setting up the Environment
  8. How to Run the Project
  9. Test by Yourself
  10. Results
  11. Contributors
  12. References

Preface

This is an experimental project developed for the Cognitive Robotics exam at the University of Naples Parthenope. Its goal is to create a system in which a humanoid robot reacts to the emotions it perceives in a human. Emotions are perceived through an EEG headset and the audio and video sensors installed on the Pepper robot. This repository covers the first part of the project: building a model for emotion recognition. The second part, which is not on GitHub, defines behaviors on the Pepper robot based on the emotions detected by the model. The project is also described in a paper available at this link.

Project Overview

This project focuses on developing a multi-modal emotion classification system that combines audio, video, and EEG inputs. Two deep learning models and a meta-model are integrated to achieve this:

(Figure: overview of the system architecture)

  1. Audio-Video Emotion Classification Model: based on the paper "Learning Audio-Visual Emotional Representations with Hybrid Fusion Strategies", this model classifies four emotions using audio and video inputs.

  2. FBCCNN (Feature-Based Convolutional Neural Network): based on the paper "Emotion Recognition Based on EEG Using Generative Adversarial Nets and Convolutional Neural Network", this model uses EEG data to enhance emotion classification.

  3. Meta-model: this model receives as input the predictions of the two deep learning models and, through logistic regression, produces the final prediction among four classes: neutral, happy, angry, and sad (a minimal sketch of this stacking step is shown below).
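
The following is a minimal sketch of such a stacking step, assuming each base model outputs a per-class probability vector. The variable names, file names, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the repository's actual implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-class probabilities produced by the two base models
# (shape [n_samples, 4] each, for neutral/happy/angry/sad) and the true labels.
audio_video_probs = np.load("audio_video_probs.npy")  # assumed file name
eeg_probs = np.load("eeg_probs.npy")                  # assumed file name
labels = np.load("labels.npy")                        # assumed file name

# Stack the two prediction vectors into one feature vector per sample.
features = np.concatenate([audio_video_probs, eeg_probs], axis=1)

# Logistic-regression meta-model over the concatenated predictions.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(features, labels)

final_predictions = meta_model.predict(features)  # assumed mapping: 0=neutral, 1=happy, 2=angry, 3=sad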

Project Structure

project-root
│
├───audio_video_emotion_recognition_model
│   │   
│   ├───datasets       
│   ├───Data_preprocessing        
│   ├───Image      
│   ├───Multimodal_transformer 
│   │   ├───Preprocessing_CNN  
│   │   │   ├───Preprocessing_utils
│   │   │   
│   │   ├───Transformers
│   │           
│   ├───results      
│   ├───utils
│                 
├───EEG_model  
│   ├───datasets   
│   ├───Images     
│   ├───results       
│   ├───utils 
│     
├───envs    
├───Meta_model        
│   ├───results      
└───Shared

Datasets

The dataset used for training the audio-video emotion recognition model is RAVDESS, which can be downloaded here.

The dataset used for training the EEG model is SEED-IV, which can be requested here.

Dependencies

The main dependencies are:

  • Python 3.9
  • PyTorch 2.6
  • Torcheeg 1.1.3

All dependencies are specified in the .yml files located in the envs directory.

Development environment

The development was performed in a Linux CentOS environment on a machine made available by the University of Naples Parthenope. The machine has 8 computational nodes, each equipped with 32 cores and 192 GB of RAM, for a total of 256 CPU cores. 4 of these 8 nodes are each equipped with 4 GPUs, for a total of 16 NVIDIA V100 NVLINK devices. Each GPU has 5120 CUDA cores and 32 GB of RAM, for a total of 81920 GPU cores. The computational nodes are connected to each other through a high-performance network.

Setting up the Environment

At the moment, the .yml file for the Windows environment is not complete: some libraries required by the audio-video and EEG models are missing, but they can easily be installed through conda. The Linux environment file is complete.

To replicate the development environment, you can use Conda. The .yml files required for creating the environment are located in the envs directory.

To create the environment on a Windows system, run:

conda env create -f envs/environment_windows.yml
conda activate cognitive_robotics_env

To create the environment on a Linux system, run:

conda env create -f envs/environment_linux.yml
conda activate cognitive_robotics_env
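
After activating the environment, you can quickly check from Python that the main dependencies resolved to the expected versions listed above:

from importlib.metadata import version

print("torch:", version("torch"))         # expected 2.6.x
print("torcheeg:", version("torcheeg"))   # expected 1.1.3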

How to Run the Project

The two base models must be trained individually before the meta-model can be trained.

  1. Audio-video emotion recognition model:

    cd audio_video_emotion_recognition_model

    Before using the model, you must perform the preprocessing steps:

    Inside each of the three scripts, specify the (full!) path where you downloaded the data. Then run:

    cd ravdess_preprocessing
    python extract_faces.py
    python extract_audios.py
    python create_annotations.py

    As a result, you will have an annotations.txt file that you can then use for training.

    • Training - Validation - Testing:
    python main.py

    If you want to perform only some of these steps, add the arguments --no-train, --no-val, or --test. For more details, see the opts file.

    • Prediction (for those who want to try this single model):
    python main.py --no-train --no-val --test --predict
  2. EEG-model:

    • Training - Validation - Testing:
    python main.py --path_eeg [Path of dataset SEED IV]

    If you have a folder with the cached, preprocessed SEED-IV dataset, you can specify it with the --path_cached argument.

    If you want to perform only some of these steps, add the arguments --no-train, --no-val, or --test. For more details, see the opts file.

  3. Meta model:

    • Training - Testing:
    python main.py --path_eeg [Path of dataset SEED IV]

    If you have a folder with the cached, preprocessed SEED-IV dataset, you can specify it with the --path_cached argument.

    If you want to run only the prediction, add the --predict argument.

    If you want to perform only some of these steps, add the arguments --no-train or --test. For more details, see the opts file.

Test by Yourself

If you want to test the models yourself, you can find the pretrained weights in the results directories of the respective models.
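
As an illustration, loading a pretrained checkpoint directly in PyTorch typically looks like the sketch below. The checkpoint file name, its internal keys, and the placeholder model class are assumptions and must be adapted to the actual files and model classes in this repository.

import torch
import torch.nn as nn

# Placeholder network: replace with the actual model class defined in this
# repository (e.g. the audio-video transformer or the FBCCNN).
class PlaceholderModel(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(x)

model = PlaceholderModel()

# Hypothetical checkpoint path: adapt to the actual file in the results directory.
checkpoint = torch.load("results/best_model.pth", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # handle both common saving conventions
model.load_state_dict(state_dict)
model.eval()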

Results

The following metrics are plotted:

  • For training: accuracy and loss.
  • For validation: accuracy and loss.
  • For testing: accuracy, loss, and confusion matrix.

Detailed plots for the audio-video model can be found in the audio_video_emotion_recognition_model/Image directory, while plots for the EEG model are available in the EEG_model/Images directory.

For the meta-model, the confusion matrix computed on the audio-video and EEG test sets is available in the Meta_model/Images directory.
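
As a generic illustration of how such a confusion matrix can be produced (this is not the repository's plotting code), assuming integer arrays of true and predicted labels for the four classes:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

class_names = ["neutral", "happy", "angry", "sad"]

# y_true and y_pred stand in for the real test-set labels and predictions.
y_true = [0, 1, 2, 3, 1, 0]
y_pred = [0, 1, 2, 2, 1, 0]

ConfusionMatrixDisplay.from_predictions(y_true, y_pred, display_labels=class_names)
plt.savefig("confusion_matrix.png")  # assumed output file name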

Contributors

References

  1. Esposito, R., Mele, V., Verrilli, S., Minopoli, S., D'Errico, L., De Santis, L., & Staffa, M. (2025). Cascade Multi-Modal Emotion Recognition Leveraging Audio-Video and EEG Signals. In: Proceedings of EMPATH-IA 2025: EMpowering PAtients THrough AI, Multimedia, and Explainable HCI 2025. CEUR Workshop Proceedings, vol. 4040, CEUR-WS.org. ISSN 1613-0073.
  2. Learning Audio-Visual Emotional Representations with Hybrid Fusion Strategies
  3. Emotion Recognition Based on EEG Using Generative Adversarial Nets and Convolutional Neural Network
  4. Github of Learning Audio-Visual Emotional Representations with Hybrid Fusion Strategies