Final project of ECE 285
- Build modules:
  - Patch embedding (still needs debugging and testing): divide the image into patches, then apply the patch and position embeddings, including the linear projection and flattening.
  - Transformer encoder (to be built from scratch): build one block and reuse it 5 to 6 times.
    - Norm
    - Multi-head attention
    - Norm
    - MLP
  - Head
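
The patch-splitting and flattening step of the patch embedding could look roughly like the sketch below. It is plain Python on nested lists, purely for illustration: the function name `patchify` and the toy image are hypothetical and not taken from model_src, and the linear projection and position embedding are left out.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Returns a list of N = (H // P) * (W // P) vectors, each of length
    P * P * C, in row-major patch order.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    flat.extend(pixel)  # append all C channel values of this pixel
            patches.append(flat)
    return patches


# Toy 4 x 4 image with 3 channels; the pixel at (r, c) is [r, c, 0].
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)
print(len(patches), len(patches[0]))  # 4 patches, each of length 2 * 2 * 3 = 12
```

Each flattened patch would then go through the linear projection and receive its position embedding before entering the encoder. For a 224 x 224 image with 16 x 16 patches, the same arithmetic gives (224 / 16)^2 = 196 patches.
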
- Pretrain: we can use the ImageNet dataset (the same one used in the ViT paper).
  - Dataset website: https://www.image-net.org/download.php
- Train and test on our dataset
- All the source code is in the model_src folder
- All the notebooks we used are in the root folder
- Here is our presentation video link (you need a UCSD email account to access it): https://drive.google.com/file/d/1dHjMl5QP530RDQmXeosVvtFcKUR_e9LG/view?usp=drive_link
- I also uploaded our code to my Google Drive in case the repo link does not work.