
Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning (DSAML)


[Project Website] | [Paper]

Model Architecture

Here is the core implementation of the DSAML model from the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning", which has been accepted at AAAI 2025.

Getting Started

Prerequisites

  • Python >= 3.8.5, < 3.9
  • PyTorch >= 2.2.1

Installation

conda env create -f environment.yml
pip install -r requirements.txt

Dataset Download

Download the DEAM dataset and unzip both the audio and annotation archives. Specifically, create DEAM_Annotations and DEAM_audio folders in the dataset root folder, and put the annotation and audio files into the corresponding folders. The final file structure should look like this:

DEAM
├── DEAM_Annotations
│   ├── annotations
├── DEAM_audio
│   ├── MEMD_audio
└── features (optional)
    └── features
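If you want to double-check the layout before moving on, the following is a minimal sketch (not part of the repository); DATASET_PATH here is only a placeholder for the same path you will later put in .env.

import os

# Hypothetical sanity check of the DEAM layout (illustrative only).
DATASET_PATH = "/your/path/to/DEAM"

expected_dirs = [
    os.path.join(DATASET_PATH, "DEAM_Annotations", "annotations"),
    os.path.join(DATASET_PATH, "DEAM_audio", "MEMD_audio"),
]

for path in expected_dirs:
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{status}: {path}")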

Then we need to preprocess the dataset, but before we do that, we need to create the .env file.

Environment Variables

After downloading the dataset, you need to create a .env file in the root directory of the project. The .env file should contain the following environment variables:

# The directory to save the logs
LOG_DIR="./logs"    

# The directory to save the audio embedding for DEAM dataset
AUDIO_EMBEDDING_DIR_NAME="feature_embedding"    
# The path to the DEAM dataset
DATASET_PATH="/your/path/to/DEAM"    

# The key for the audio input in the dataset; keep this value unchanged
AUDIO_INPUT_KEY="log_mel_spectrogram"

You should modify DATASET_PATH to the path where you store the DEAM dataset (and, if you also use the PMEmo dataset, set PMEMO_DATASET_PATH accordingly).
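For reference, variables in a .env file are commonly read with python-dotenv. The sketch below only illustrates how these values become visible to Python code; it is an assumption about the loading mechanism, not the project's own logic.

import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # reads the .env file in the current working directory

print("Logs will be written to:", os.getenv("LOG_DIR", "./logs"))
print("DEAM dataset path:", os.getenv("DATASET_PATH"))
print("Audio input key:", os.getenv("AUDIO_INPUT_KEY", "log_mel_spectrogram"))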

Dataset Preprocessing

In order to speed up the training process, we need to preprocess the dataset. You can run the following command to preprocess the dataset:

./scripts/dataset.sh
# To run on a specific GPU, prefix the command with CUDA_VISIBLE_DEVICES, e.g.:
# CUDA_VISIBLE_DEVICES=1 ./scripts/dataset.sh

This process will take about one hour, depending on your machine.

Train

After the dataset is preprocessed, you can train the model by running the following command:

# For DMER Task
python train.py --device "cuda:0" --not_using_maml
# For PDMER Task
python train.py --device "cuda:0" --using_personalized_data_train --using_personalized_data_validate
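The --not_using_maml flag switches off the meta-learning used for the PDMER task, while the personalized-data flags enable per-annotator adaptation. As a rough conceptual sketch only, and not the repository's actual training loop, MAML-style personalization adapts a copy of the model on a few annotator-specific samples in an inner loop and updates the shared initialization in an outer loop; all names and shapes below are illustrative.

import torch

# Conceptual MAML-style inner/outer loop (illustrative only; not this repo's trainer).
model = torch.nn.Linear(16, 2)          # stand-in for a predictor of (arousal, valence)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
inner_lr = 1e-2

def adapt(params, x, y):
    """One inner-loop gradient step on a single annotator's support set."""
    pred = torch.nn.functional.linear(x, params["weight"], params["bias"])
    grads = torch.autograd.grad(loss_fn(pred, y), params.values(), create_graph=True)
    return {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

for _ in range(10):                      # outer (meta) iterations over sampled tasks
    x_s, y_s = torch.randn(8, 16), torch.randn(8, 2)   # support: annotator-specific samples
    x_q, y_q = torch.randn(8, 16), torch.randn(8, 2)   # query: held-out samples of the same annotator
    fast = adapt(dict(model.named_parameters()), x_s, y_s)
    query_pred = torch.nn.functional.linear(x_q, fast["weight"], fast["bias"])
    meta_opt.zero_grad()
    loss_fn(query_pred, y_q).backward()  # meta-gradient flows back through the inner step
    meta_opt.step()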

Inference

For inference, you can use the following code snippet after training the model:

import torch

from utils.inference import build_batch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model and checkpoint before inference, e.g.:
# model = PDMERModel(device=device).to(device)
# model.load_state_dict(torch.load("path/to/checkpoint.pth", map_location=device))
# model.eval()

audio_file_path_list = [
    "/path/to/audio1.wav",
    "/path/to/audio2.wav",
]

# Build the input batch from the audio files.
embedding, _ = build_batch(
    audio_file_path_list,
    imagebind_model=None,   # If no ImageBind instance is passed, the model is loaded automatically
    device=device,
)

print("\nBuilt batch embedding:")
for key, value in embedding.items():
    print("\t", key, value.shape)

print("Result:")
output = model(embedding)
print("Arousal: ", output["model_output"][0].shape)  # Arousal prediction, shape [batch_size, 2 * seconds]
print("Valence: ", output["model_output"][1].shape)  # Valence prediction, shape [batch_size, 2 * seconds]

Citation

If you find this code useful in your research, please consider citing:

@misc{zhang2024personalizeddynamicmusicemotion,
      title={Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning}, 
      author={Dengming Zhang and Weitao You and Ziheng Liu and Lingyun Sun and Pei Chen},
      year={2024},
      eprint={2412.19200},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2412.19200}, 
}
