Pinned
vision_transformers_from_scratch (Public, forked from sneha31415/vision_transformers_from_scratch)
This project aims to develop an image captioning model by leveraging Vision Transformers (ViTs), as described in the 2020 paper "An Image is Worth 16x16 Words".
Jupyter Notebook
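Since the repository's language is Jupyter Notebook, Python is a natural fit for illustration. Below is a minimal sketch of the patch-embedding step the paper's title refers to, assuming PyTorch; the class and parameter names are illustrative and not taken from this repository.

```python
# Illustrative sketch (not from the repository): the core ViT idea of
# splitting an image into 16 x 16 patches and linearly embedding them.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to an embedding."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution performs the patchify + linear projection in one step.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, 224, 224)
        x = self.proj(x)                       # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)    # (B, 196, 768): one token per patch

patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768])
```

The resulting patch tokens would then be fed to a Transformer encoder, whose output can condition a captioning decoder.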