# Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning (DSAML)
[Project Website] | [Paper]
This repository contains the core implementation of the DSAML model from the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning", accepted at AAAI 2025.
## Requirements

- Python >= 3.8.5, < 3.9
- PyTorch >= 2.2.1
Create the conda environment and install the Python dependencies:

```bash
conda env create -f environment.yml
pip install -r requirements.txt
```
## Dataset Preparation

Download the DEAM dataset and unzip both the audio and annotation archives. Specifically, create `DEAM_Annotations` and `DEAM_audio` folders in the dataset root folder, and put the annotation and audio files in the corresponding folders. The final file structure should look like this:
```
DEAM
├── DEAM_Annotations
│   ├── annotations
├── DEAM_audio
│   ├── MEMD_audio
└── features            (optional)
    └── features
```
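Optionally, you can sanity-check the layout before preprocessing. The snippet below is a minimal sketch (not part of the repository); the dataset root path is a placeholder you should replace with your own:

```python
# Minimal sketch: verify the expected DEAM folder layout before preprocessing.
# "/your/path/to/DEAM" is a placeholder; use your own dataset root.
from pathlib import Path

dataset_root = Path("/your/path/to/DEAM")
for sub in ("DEAM_Annotations/annotations", "DEAM_audio/MEMD_audio"):
    path = dataset_root / sub
    print(f"{path} -> {'found' if path.is_dir() else 'MISSING'}")
```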
Before preprocessing the dataset, create a `.env` file in the root directory of the project. The `.env` file should contain the following environment variables:
```bash
# The directory to save the logs
LOG_DIR="./logs"
# The directory to save the audio embeddings for the DEAM dataset
AUDIO_EMBEDDING_DIR_NAME="feature_embedding"
# The path to the DEAM dataset
DATASET_PATH="/your/path/to/DEAM"
# The key to the audio input in the dataset, please keep this
AUDIO_INPUT_KEY="log_mel_spectrogram"
```
Modify `DATASET_PATH` (and `PMEMO_DATASET_PATH`, if you use the PMEmo dataset) to the paths where you store the DEAM and PMEmo datasets.
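For reference, here is a minimal sketch of how these variables can be read in Python with `python-dotenv`; this assumes that is how the configuration is loaded, so adjust it to the project's actual loading code:

```python
# Minimal sketch: reading the .env configuration with python-dotenv.
# This assumes python-dotenv is installed; the project may load the
# configuration differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

dataset_path = os.getenv("DATASET_PATH")
audio_input_key = os.getenv("AUDIO_INPUT_KEY", "log_mel_spectrogram")
print(dataset_path, audio_input_key)
```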
To speed up training, the dataset needs to be preprocessed. Run the following command:
```bash
./scripts/dataset.sh
# If you want to use a specific GPU, you can run it like this instead:
# CUDA_VISIBLE_DEVICES=1 ./scripts/dataset.sh
```
This process will take about one hour, depending on your machine.
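The `AUDIO_INPUT_KEY` above refers to log-mel spectrograms. For intuition only, a log-mel spectrogram for a single clip can be computed with `librosa` roughly as follows; the parameters actually used by `scripts/dataset.sh` may differ:

```python
# Illustrative sketch only: computing a log-mel spectrogram with librosa.
# The preprocessing script may use different parameters and additional features.
import librosa
import numpy as np

y, sr = librosa.load("/path/to/audio1.wav", sr=None)  # placeholder path
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)  # (n_mels, n_frames)
```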
## Training

After the dataset is preprocessed, you can train the model with the following commands:
```bash
# For the DMER task
python train.py --device "cuda:0" --not_using_maml
# For the PDMER task
python train.py --device "cuda:0" --using_personalized_data_train --using_personalized_data_validate
```
## Inference

After training, you can run inference with the following code snippet:
```python
import torch

from utils.inference import build_batch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model and checkpoint before inference, e.g.:
# model = PDMERModel(device=device).to(device)
# model.load_state_dict(torch.load("path/to/checkpoint.pth"))

audio_file_path_list = [
    "/path/to/audio1.wav",
    "/path/to/audio2.wav",
]

# Build the input batch.
embedding, _ = build_batch(
    audio_file_path_list,
    imagebind_model=None,  # if no ImageBind instance is passed, the model is loaded automatically
    device=device,
)

print("\nBuild batch embedding:")
for key, value in embedding.items():
    print("\t", key, value.shape)

print("Result:")
output = model(embedding)
print("Arousal: ", output["model_output"][0].shape)  # The first element is the arousal prediction, [batch_size, 2 * second]
print("Valence: ", output["model_output"][1].shape)  # The second element is the valence prediction, [batch_size, 2 * second]
```
## Citation

If you find this code useful in your research, please consider citing:
```bibtex
@misc{zhang2024personalizeddynamicmusicemotion,
  title={Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning},
  author={Dengming Zhang and Weitao You and Ziheng Liu and Lingyun Sun and Pei Chen},
  year={2024},
  eprint={2412.19200},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2412.19200},
}
```