# Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

This is not an official implementation, but a modified version of Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training, adapted to the ECG domain instead of chest X-rays.

Before training the model, please follow these instructions to install fairseq-signals and prepare the required datasets.
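For reference, a minimal installation sketch is shown below. The repository URL and steps are assumptions based on the usual fairseq-style setup; defer to the official fairseq-signals instructions if they differ.

```shell
# Assumed location of the fairseq-signals repository
$ git clone https://github.com/Jwoo5/fairseq-signals.git
$ cd fairseq-signals

# Editable install, as is conventional for fairseq-based projects
$ pip install --editable ./
```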

## Pre-training a new model

```shell
$ fairseq-hydra-train \
    task.data=/path/to/manifest \
    model.pretrained_model_path=/path/to/checkpoint.pt \
    --config-dir examples/m3ae/config/pretraining \
    --config-name w2v-cmsc_bert
```

Note that this model requires a pre-trained ECG encoder, supplied via `model.pretrained_model_path`. To pre-train the ECG encoder, follow the instructions for Wav2Vec 2.0, W2V+CMSC+RLM, or any other SSL implementation, as sketched below.
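As an illustration only, pre-training the encoder with a W2V+CMSC-style recipe might look like the following; the config directory and config name here are assumptions, so substitute the paths shipped with your fairseq-signals checkout.

```shell
# Hypothetical invocation; the --config-dir and --config-name values
# are assumed, not taken from the official documentation
$ fairseq-hydra-train \
    task.data=/path/to/manifest \
    --config-dir examples/w2v_cmsc/config/pretraining \
    --config-name w2v_cmsc
```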

## Fine-tuning a pre-trained model

### Fine-tune on the ECG Question Answering task

We assume the task is formulated as multi-label classification, with `model.num_labels` set according to the number of answer classes in the ECG-QA dataset.

```shell
$ fairseq-hydra-train \
    task.data=/path/to/manifest \
    model.model_path=/path/to/checkpoint.pt \
    model.num_labels=103 \
    --config-dir examples/m3ae/config/finetuning/ecg_question_answering \
    --config-name base
```