markitantov/EmoSen


Multi-Lingual Approach for Multimodal Emotion and Sentiment Recognition Based on Triple Fusion

Abstract

Affective state recognition is a challenging task that requires a large amount of input data, such as audio, video, and text. This paper addresses affective state recognition as a multi-task problem involving both emotion and sentiment recognition. We consider several unimodal models based on temporal encoders: Transformer-based, Mamba, and xLSTM. We propose several multimodal fusion strategies, including double and triple fusion, with and without a label encoder. Double fusion strategies model the interaction between two main modalities, while triple fusion strategies handle the audio, video, and text modalities equally. Strategies with the label encoder combine emotion and sentiment predictions with deep features. Using three publicly available corpora, RAMAS, MELD, and CMU-MOSEI, we conduct an extensive experimental study with unimodal (audio, video, or text) and multimodal models to comprehensively understand their capabilities and limitations. On the Test subset of the CMU-MOSEI corpus, the proposed approach achieves a mean weighted F1-score (mWF) of 88.6% for emotion recognition and a weighted F1-score (WF) of 84.8% for sentiment recognition. On the Test subset of the MELD corpus, it achieves WF of 49.6% and 60.0%, respectively; on the Test subset of the RAMAS corpus, WF of 71.8% and 90.0%, respectively. We compare the performance of the proposed approach with that of state-of-the-art (SOTA) approaches.
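
To illustrate the triple fusion idea described above, here is a minimal PyTorch sketch in which the three modalities are projected into a shared space and fused symmetrically, with separate heads for the two tasks. All names (TripleFusion, d_model, the head sizes) are hypothetical assumptions for illustration, not this repository's actual API.

```python
# Minimal sketch of triple fusion for multi-task emotion/sentiment
# recognition, assuming per-modality feature vectors have already been
# produced by temporal encoders (e.g. Transformer, Mamba, or xLSTM).
# All names here are hypothetical, not the repository's actual code.
import torch
import torch.nn as nn


class TripleFusion(nn.Module):
    def __init__(self, d_audio, d_video, d_text,
                 d_model=256, n_emotions=7, n_sentiments=3):
        super().__init__()
        # Project each modality into a shared space so that audio,
        # video, and text are handled equally.
        self.proj_a = nn.Linear(d_audio, d_model)
        self.proj_v = nn.Linear(d_video, d_model)
        self.proj_t = nn.Linear(d_text, d_model)
        # Self-attention over the three modality tokens lets every
        # modality attend to the other two symmetrically.
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        # Separate heads for the two tasks (multi-task setup).
        self.emotion_head = nn.Linear(d_model, n_emotions)
        self.sentiment_head = nn.Linear(d_model, n_sentiments)

    def forward(self, a, v, t):
        # a, v, t: (batch, d_*) pooled feature vectors, one per clip.
        tokens = torch.stack(
            [self.proj_a(a), self.proj_v(v), self.proj_t(t)], dim=1
        )  # (batch, 3, d_model)
        fused, _ = self.attn(tokens, tokens, tokens)
        pooled = fused.mean(dim=1)  # average the three modality tokens
        return self.emotion_head(pooled), self.sentiment_head(pooled)


# Usage with dummy features:
model = TripleFusion(d_audio=128, d_video=512, d_text=768)
a, v, t = torch.randn(4, 128), torch.randn(4, 512), torch.randn(4, 768)
emo_logits, sent_logits = model(a, v, t)
```

A double fusion strategy would differ by attending between only two main modalities, and a label-encoder variant would additionally feed the emotion and sentiment predictions back in alongside the deep features.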

