ECE-285-ViT

Final project for ECE 285

TODO List

Build modules:
- Patch embedding (needs to be debugged and tested): divide the image into patches and apply the patch and position embeddings, including the linear projection and the flattening step.
- Transformer encoder (needs to be built from scratch): build one block and reuse it 5 to 6 times
  - Norm
  - Multi-head attention
  - Norm
  - MLP
- Head
Pretrain: we can use the ImageNet dataset (as in the ViT paper)
- Dataset website: https://www.image-net.org/download.php
Train and test on our dataset
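
The module list above (patch embedding, a Norm / multi-head attention / Norm / MLP encoder block stacked several times, and a head) could be sketched roughly as follows. This is only an illustrative sketch and not the code in model_src: it assumes PyTorch, and the class names (`PatchEmbedding`, `EncoderBlock`, `TinyViT`) and all sizes (32x32 input, 4x4 patches, 64-dimensional tokens, 6 blocks, 10 classes) are hypothetical. It also reads "reuse the block" as instantiating the same block class several times with separate weights, and it uses a class token for classification, which the list does not specify.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into patches, project them, add a class token and position embeddings."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, dim=64):
        super().__init__()
        assert img_size % patch_size == 0, "image size must be divisible by patch size"
        num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch
        # and applying one shared linear projection.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

    def forward(self, x):
        x = self.proj(x)                      # (B, dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)      # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)        # (B, N + 1, dim)
        return x + self.pos_embed


class EncoderBlock(nn.Module):
    """Pre-norm encoder block: Norm -> attention -> Norm -> MLP, with residual connections."""

    def __init__(self, dim=64, heads=4, mlp_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


class TinyViT(nn.Module):
    """Patch embedding, a stack of encoder blocks, and a linear classification head."""

    def __init__(self, num_classes=10, depth=6, dim=64):
        super().__init__()
        self.embed = PatchEmbedding(dim=dim)
        # Same block definition instantiated `depth` times (weights are not shared).
        self.blocks = nn.Sequential(*[EncoderBlock(dim) for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.blocks(self.embed(x))
        return self.head(self.norm(x[:, 0]))  # classify from the class token
```

With these assumed sizes, a batch of shape (B, 3, 32, 32) becomes 65 tokens of width 64 after the embedding (64 patches plus the class token), and the model returns logits of shape (B, 10).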

All the source code is in the model_src folder.
All the notebooks we used are in the root folder.
Here is the link to our presentation video (you need a UCSD email account to access it): https://drive.google.com/file/d/1dHjMl5QP530RDQmXeosVvtFcKUR_e9LG/view?usp=drive_link
I also uploaded our code to my Google Drive in case the link to this repo does not work.
