
Delphboy/superpixel-features


Superpixel Features

Generate features for superpixels and patches using pretrained models. Features are saved as .npz files with keys for the features ('feats'), the superpixel bounding boxes ('bbox'), and the region adjacency graph (RAG) edges ('rag').
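A minimal sketch of reading one of these files with NumPy follows. It assumes one .npz file per image, an (N, D) 'feats' array, an (N, 4) 'bbox' array and a (2, E) 'rag' edge list. These shapes, the dummy data and the helper name load_superpixel_features are illustrative assumptions, not confirmed from the code. 'rag' is treated as optional on the assumption that it is only written when --rag is passed.

```python
import os
import tempfile

import numpy as np


def load_superpixel_features(path):
    """Read one per-image .npz file into a dict of arrays (hypothetical helper).

    'rag' is treated as optional, assuming it is only saved when --rag is used.
    """
    with np.load(path) as data:
        out = {"feats": data["feats"], "bbox": data["bbox"]}
        if "rag" in data.files:
            out["rag"] = data["rag"]
    return out


# Write a dummy file with hypothetical shapes to demonstrate the round trip:
# 25 superpixels, 512-d (CLIP-sized) features, 40 RAG edges.
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(
    path,
    feats=rng.standard_normal((25, 512)).astype(np.float32),
    bbox=rng.integers(0, 224, size=(25, 4)),  # assumed one box per superpixel
    rag=rng.integers(0, 25, size=(2, 40)),    # assumed (2, E) edge list
)

loaded = load_superpixel_features(path)
for key, arr in loaded.items():
    print(key, arr.shape)
```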

In terms of feature space, the code supports ResNet, CLIP, BLIPv2, SigLIP, and ViT features. Currently, only the SLIC and Watershed superpixel segmentation algorithms are implemented (via scikit-image). Patching is also supported, but not with the --rag flag.
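Both segmenters produce an integer label map with one label per superpixel. In scikit-image, for example, skimage.segmentation.slic(image, n_segments=...) returns such a map. The sketch below shows one plausible way to turn a label map into per-superpixel bounding boxes and masked crops ready for a feature extractor. The helper name superpixel_crops, the (x1, y1, x2, y2) box convention and the zero-fill masking are assumptions and not necessarily what the repository does.

```python
import numpy as np


def superpixel_crops(image, labels, fill=0):
    """Yield (label, bbox, crop) for each superpixel in a label map.

    image:  (H, W, C) array
    labels: (H, W) integer array, e.g. the output of a superpixel segmenter
    bbox is (x1, y1, x2, y2) with exclusive x2/y2 (an assumed convention).
    Pixels inside the box that belong to other superpixels are set to `fill`.
    """
    for lab in np.unique(labels):
        mask = labels == lab
        ys, xs = np.nonzero(mask)
        y1, y2 = ys.min(), ys.max() + 1
        x1, x2 = xs.min(), xs.max() + 1
        crop = image[y1:y2, x1:x2].copy()
        crop[~mask[y1:y2, x1:x2]] = fill  # blank out neighbouring superpixels
        yield int(lab), (int(x1), int(y1), int(x2), int(y2)), crop


# Toy 4x4 RGB image with two irregular superpixels (labels 0 and 1).
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 0, 0]])
results = list(superpixel_crops(image, labels))
for lab, box, crop in results:
    print(lab, box, crop.shape)
```

Cropping to the bounding box keeps the input small, while masking stops pixels from neighbouring superpixels leaking into each feature.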

To generate a collection of superpixels for the COCO dataset, see runner.sh for an example of how this can be achieved.

For compatibility with the Karpathy split of the COCO dataset, merge_and_clean.py is provided. This script moves and renames the superpixel feature files so that they can be used in place of the BUTD feature files with the original Karpathy split JSON files.
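The exact renaming scheme lives in merge_and_clean.py. As a hypothetical sketch, assuming the target is one file per COCO image id (e.g. 391895.npz), the id can be recovered from the standard COCO 2014 file name, where the trailing 12 digits are the zero-padded image id. The helpers coco_image_id and target_name are illustrative and are not part of the repository.

```python
import re
from pathlib import Path

# COCO 2014 images are named like COCO_val2014_000000391895.jpg, where the
# trailing 12 digits are the zero-padded image id.
COCO_NAME = re.compile(r"COCO_(?:train|val|test)2014_(\d{12})")


def coco_image_id(filename):
    """Return the integer COCO image id encoded in a 2014 file name."""
    match = COCO_NAME.search(Path(filename).name)
    if match is None:
        raise ValueError(f"not a COCO 2014 file name: {filename}")
    return int(match.group(1))


def target_name(feature_file):
    """Hypothetical target name '<image_id>.npz' for a feature file."""
    return f"{coco_image_id(feature_file)}.npz"


print(target_name("/features/COCO_val2014_000000391895.npz"))  # 391895.npz
```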

Dependencies

conda env create -f environment.yml
conda create --name sp python=3.9
conda activate sp

python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
python3 -m pip install torch_geometric

python3 -m pip install git+https://github.com/openai/CLIP.git
python3 -m pip install salesforce-lavis

python3 -m pip install -U timm==1.0.14
python3 -m pip install -U transformers==4.48.1
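After installing, a quick sanity check can confirm that each package is importable and that the pinned timm and transformers versions are the ones in use. This is a hypothetical helper (check_environment is not part of the repository). It uses only the standard library and locates modules with importlib.util.find_spec rather than importing the heavy packages themselves.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Import names of the packages installed by the commands above
# (salesforce-lavis is imported as `lavis`, the CLIP repo as `clip`).
MODULES = ["torch", "torchvision", "torchaudio", "torch_geometric",
           "clip", "lavis", "timm", "transformers"]

# Versions pinned in the install commands above.
PINNED = {"timm": "1.0.14", "transformers": "4.48.1"}


def check_environment():
    """Return (missing modules, {dist: (installed, pinned)} for mismatches)."""
    missing = [m for m in MODULES if importlib.util.find_spec(m) is None]
    mismatched = {}
    for dist, pinned in PINNED.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            continue  # already reported in `missing`
        if installed != pinned:
            mismatched[dist] = (installed, pinned)
    return missing, mismatched


missing, mismatched = check_environment()
print("missing:", missing or "none")
print("version mismatches:", mismatched or "none")
```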

Parameters

| Name | Description |
|---|---|
| --image_dir | The directory containing the input images |
| --save_dir | The directory to save the .npz files to |
| --feature_extractor | Which model to use [BLIP / CLIP / RESNET / SIGLIP / VIT] |
| --num_segments | The number of superpixels to generate per image (not compatible with --whole_img) |
| --segmenter | Which superpixel algorithm to use [SLIC / WATERSHED] |
| --whole_img | (Flag) Generate a single feature for the whole image (not compatible with --rag) |
| --patches | (Flag) Generate patch features instead of superpixel features (not compatible with --rag) |
| --rag | (Flag) Generate the Region Adjacency Graph edges between superpixels |
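For --rag, two superpixels are assumed to be adjacent when they share a pixel border. The minimal NumPy sketch below works under that assumption, using 4-connectivity, and returns each undirected edge once as a column of a (2, E) array. The helper rag_edges is hypothetical and may differ from how the repository builds its graph.

```python
import numpy as np


def rag_edges(labels):
    """Return a (2, E) array of undirected superpixel adjacency edges.

    Two superpixels are treated as adjacent when they share a horizontal or
    vertical pixel border (4-connectivity). Each edge appears once, with the
    smaller label in row 0.
    """
    right = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()])
    down = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()])
    pairs = np.concatenate([right, down], axis=1)
    pairs = pairs[:, pairs[0] != pairs[1]]  # drop same-label neighbours
    pairs = np.sort(pairs, axis=0)          # put (min, max) in each column
    return np.unique(pairs, axis=1)         # deduplicate edges


labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [3, 3, 1]])
print(rag_edges(labels).tolist())  # [[0, 0, 0, 1, 1, 2], [1, 2, 3, 2, 3, 3]]
```

A (2, E) integer array matches the layout torch_geometric uses for edge_index. Because PyG models usually expect both directions of an undirected edge, torch_geometric.utils.to_undirected can add the reverse edges.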

Warning

The --segmenter PATCHES option tries to generate the number of segments given by --num_segments. However, this number must result in a whole-number patching kernel size, i.e. $k = \sqrt{\frac{224 \times 224}{N}}$ must be an integer (where $N$ is --num_segments). See patcher.py for implementation details.
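Since $k = 224 / \sqrt{N}$, $k$ is an integer exactly when $N = n^2$ is a perfect square and $n$ (patches per side) divides 224. For example, $N = 196$ gives $n = 14$ and $k = 16$, while $N = 25$ gives $k = 44.8$ and is rejected. The sketch below checks this and splits a 224×224 image into non-overlapping patches. It is an illustration under the 224-pixel assumption of the formula above, not the code in patcher.py, and the helper names are hypothetical.

```python
import math

import numpy as np

IMAGE_SIZE = 224  # input resolution assumed by the formula above


def patch_kernel_size(num_segments, image_size=IMAGE_SIZE):
    """Return k = image_size / sqrt(N), raising if k is not an integer."""
    per_side = math.isqrt(num_segments)
    if per_side * per_side != num_segments or image_size % per_side:
        raise ValueError(f"N={num_segments} gives a non-integer kernel size")
    return image_size // per_side


def valid_num_segments(image_size=IMAGE_SIZE):
    """All N for which the kernel size is a whole number."""
    return [n * n for n in range(1, image_size + 1) if image_size % n == 0]


def to_patches(image, k):
    """Split an (H, W, C) image into (N, k, k, C) non-overlapping patches."""
    h, w, c = image.shape
    if h % k or w % k:
        raise ValueError("image size must be a multiple of k")
    return (image.reshape(h // k, k, w // k, k, c)
                 .transpose(0, 2, 1, 3, 4)   # group the two grid axes first
                 .reshape(-1, k, k, c))      # row-major patch order


print(valid_num_segments())
# [1, 4, 16, 49, 64, 196, 256, 784, 1024, 3136, 12544, 50176]
image = np.arange(IMAGE_SIZE * IMAGE_SIZE * 3).reshape(IMAGE_SIZE, IMAGE_SIZE, 3)
k = patch_kernel_size(196)
patches = to_patches(image, k)
print(k, patches.shape)  # 16 (196, 16, 16, 3)
```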

Feature Sizes

| Model | Feature Dim |
|---|---|
| ResNet | 2048 |
| CLIP | 512 |
| BLIPv2 | 768 |
| SigLIP | 1152 |
| ViT (vit_b_16) | 1000 |
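When consuming the .npz files downstream, the dimensions above can double as a cheap sanity check. The sketch below is hypothetical (FEATURE_DIMS and check_feats are not from the repository). It keys the dimensions by the --feature_extractor choices and assumes 'feats' is (N, D), treating a 1-D array, such as a possible whole-image feature, as N = 1.

```python
import numpy as np

# Feature dimensionality per --feature_extractor choice, from the table above.
FEATURE_DIMS = {"RESNET": 2048, "CLIP": 512, "BLIP": 768, "SIGLIP": 1152, "VIT": 1000}


def check_feats(feats, model):
    """Return N for an (N, D) feature array, raising if D does not match `model`."""
    expected = FEATURE_DIMS[model.upper()]
    feats = np.atleast_2d(feats)  # assumed: a single 1-D feature counts as N = 1
    if feats.ndim != 2 or feats.shape[1] != expected:
        raise ValueError(f"expected (N, {expected}) for {model}, got {feats.shape}")
    return feats.shape[0]


print(check_feats(np.zeros((25, 512)), "clip"))  # 25
```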

Examples

Generate 25 Watershed superpixel CLIP features for the Karpathy Test Set with RAG edges

python3 main.py --image_dir "/home/hsenior/coco/img/test2014/" \
    --save_dir "/home/hsenior/coco/superpixel_features/" \
    --model_id "CLIP" \
    --num_segments 25 \
    --segmenter "WATERSHED" \
    --rag

Generate whole image ResNet features for the Karpathy Validation set

python3 main.py --image_dir "/home/hsenior/coco/img/val2014/" \
    --save_dir "/home/hsenior/coco/superpixel_features/" \
    --model_id "RESNET" \
    --whole_img

Generate 196 patch BLIP features for the Karpathy Train Set

python3 main.py --image_dir "/home/hsenior/coco/img/train2014/" \
    --save_dir "/home/hsenior/coco/superpixel_features/" \
    --model_id "BLIP" \
    --num_segments 196 \
    --segmenter "PATCHER"

