DS598 DL4DS Midterm Project

For my project I used the Microsoft Git Large model trained on the coco image dataset [4][5]. I found that this one was relatively simple to implement and work with. Fine tuning the model took the most time, I had to experiment with the attention mask, learning rate, and batch sizes to finally get a model that performs well. I ended up finding a nice parameter set that got me a CIDEr score of ~75 after only 1 epoch. I had fun learning about hugging face and implementation of deep learning models!

The model is completely contained within demo/train.py and demo/test.py but most of my experiments and work were done within experiments.ipynb

References

CIDEr: Consensus-based image description evaluation
BLEU: A Misunderstood Metric from Another Age, Medium Post
BLEU Metric, HuggingFace space
Microsoft Git Large
GIT: A Generative Image-to-text Transformer for Vision and Language, Jianfeng Wang and Zhengyuan Yang and Xiaowei Hu and Linjie Li and Kevin Lin and Zhe Gan and Zicheng Liu and Ce Liu and Lijuan Wang (2022)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!