
MMAD: Multi-modal Movie Audio Description

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

[📜Paper] [🗂️Project Page]

📰 News

[2024.2.20] Our MMAD has been accepted at COLING 2024! Feel free to watch 👀 this repository for the latest updates.

😮 Highlights

MMAD exhibits remarkable audio description (AD) generation capabilities for movies by utilizing multi-modal inputs.

🎥 Demo

The talented pianist, 1900, mesmerized the audience with his virtuosic performance of "Christmas Eve" while wearing a pristine white tuxedo and bow tie.

Chris Gardner, a man with a box in his hand, runs frantically through the city, dodging people and cars while being chased by a taxi driver who is honking.

Dancing in the rain, Don Lockwood twirls with joy, umbrella in hand, amidst city streets.

Alice fled through the mushroom forest, her heart racing as the Bandersnatch's ominous hisses and growls echoed behind her.

🛠️ Installation

  1. Install the dependencies. If you have conda installed, you can run the following:

git clone https://github.com/Daria8976/MMAD.git
cd MMAD
bash environment.sh

  2. Download the pretrained model weights:

  • checkpoint_step_50000.pth under the checkpoint folder
  • base.pth under the AudioEnhancing/configs folder
  • LanguageBind/Video-LLaVA-7B under the VideoCaption folder

  3. Prepare your REPLICATE_API_TOKEN in llama.py.

  4. Prepare the demo data (we provide four demo videos here):

  • put demo.mp4 under Example/Video
  • put character photos (each photo should be named with the corresponding character's name) under Example/ActorCandidate
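The setup above asks you to prepare a REPLICATE_API_TOKEN in llama.py. As an illustrative alternative (a hypothetical helper, not part of this repository), the token could instead be read from an environment variable so it stays out of the source tree:

```python
import os

def get_replicate_token():
    """Fetch the Replicate API token from the environment.

    Hypothetical sketch: the README asks you to put the token in
    llama.py; reading it from the REPLICATE_API_TOKEN environment
    variable is one way to avoid hard-coding a secret.
    """
    token = os.environ.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    return token
```

You would then export the token in your shell before running inference.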

💡 Inference

python infer.py
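Before running infer.py, it can help to confirm that the files from the setup steps are in place. A minimal pre-flight check (the paths below are assumptions taken from this README; adjust them to your checkout):

```python
import os

# Expected layout from the installation steps (assumed from this README).
REQUIRED_PATHS = [
    "checkpoint/checkpoint_step_50000.pth",
    "AudioEnhancing/configs/base.pth",
    "VideoCaption/Video-LLaVA-7B",
    "Example/Video/demo.mp4",
    "Example/ActorCandidate",
]

def missing_files(root="."):
    """Return the required paths that are not present under root."""
    return [p for p in REQUIRED_PATHS
            if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing before running infer.py:")
        for p in missing:
            print("  -", p)
    else:
        print("All required files found.")
```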

🚀 Main Results

For human evaluation, we recruited 10 sighted volunteers and 10 blind and visually impaired (BVI) participants (3 totally blind, 7 partially sighted) to rate outputs on a Likert scale, and we merged the statistical results into the results table of the paper.

📜 Cite

@inproceedings{ye2024mmad,
  title={MMAD: Multi-modal Movie Audio Description},
  author={Ye, Xiaojun and Chen, Junhao and Li, Xiang and Xin, Haidong and Li, Chao and Zhou, Sheng and Bu, Jiajun},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  pages={11415--11428},
  year={2024}
}

Acknowledgements

Here are some great resources we benefit from or utilize:

⭐️ Star History

Star History Chart