Our project uses a total of 7 models. The download links/methods for each model are as follows:
- anygrasp: once you obtain an AnyGrasp license from here, the checkpoint will be provided to you.
- bert-base-uncased: https://huggingface.co/google-bert/bert-base-uncased
- CLIP-ViT-H-14-laion2B-s32B-b79K: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
- droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
- recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
- segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints
You should organize the checkpoints as follows:
```
DovSG/
├── checkpoints
│   ├── anygrasp
│   ├── bert-base-uncased
│   ├── CLIP-ViT-H-14-laion2B-s32B-b79K
│   ├── droid-slam
│   ├── GroundingDINO
│   ├── recognize_anything
│   └── segment-anything-2
├── license    # anygrasp license
...
```
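The layout above can be prepared with a short shell sketch. This is only an illustrative helper, not part of the project: it assumes `wget`, `git`, and `git-lfs` are available, and the `FETCH` environment variable is a hypothetical guard so the script can be run without triggering multi-gigabyte downloads. The AnyGrasp checkpoint (license-gated) and the droid-slam weights (Google Drive) still need to be fetched manually.

```shell
#!/usr/bin/env sh
# Sketch: create the expected DovSG checkpoint layout and optionally
# fetch the publicly downloadable weights listed above.
set -e

ROOT=DovSG/checkpoints
mkdir -p "$ROOT/anygrasp" \
         "$ROOT/bert-base-uncased" \
         "$ROOT/CLIP-ViT-H-14-laion2B-s32B-b79K" \
         "$ROOT/droid-slam" \
         "$ROOT/GroundingDINO" \
         "$ROOT/recognize_anything" \
         "$ROOT/segment-anything-2" \
         DovSG/license

# Set FETCH=1 to actually download the weights (several GB in total).
if [ "${FETCH:-0}" = "1" ]; then
    # GroundingDINO checkpoint and its SwinT config
    wget -P "$ROOT/GroundingDINO" \
        https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
    wget -P "$ROOT/GroundingDINO" \
        https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
    # RAM (recognize_anything) checkpoint
    wget -P "$ROOT/recognize_anything" \
        https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/ram_swin_large_14m.pth
    # Hugging Face model repos (requires git-lfs for the weight files)
    git clone https://huggingface.co/google-bert/bert-base-uncased \
        "$ROOT/bert-base-uncased"
    git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K \
        "$ROOT/CLIP-ViT-H-14-laion2B-s32B-b79K"
    # anygrasp: place the vendor-provided checkpoint in $ROOT/anygrasp
    # droid-slam: download from the Google Drive link above into $ROOT/droid-slam
fi

echo "Checkpoint directories ready under $ROOT"
```

Running the script without `FETCH=1` only creates the empty directory skeleton, which is useful for verifying paths before starting the large downloads.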