USD-AI-ResearchLab/MFil-Mamba

MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models

Official PyTorch implementation of MFil-Mamba.

Paper

Abstract

State Space Models (SSMs), especially the recent Mamba architecture, have achieved remarkable success in sequence modeling tasks. However, extending SSMs to computer vision remains challenging due to the non-sequential structure of visual data and its complex 2D spatial dependencies. Although several early studies have explored adapting selective SSMs for vision applications, most approaches primarily depend on employing various traversal strategies over the same input. This introduces redundancy and distorts the intricate spatial relationships within images. To address these challenges, we propose MFil-Mamba, a novel visual state space architecture built on a multi-filter scanning backbone. Unlike fixed multi-directional traversal methods, our design enables each scan to capture unique and contextually relevant spatial information while minimizing redundancy. Furthermore, we incorporate an adaptive weighting mechanism to effectively fuse outputs from multiple scans, in addition to architectural enhancements. MFil-Mamba achieves superior performance over existing state-of-the-art models across various benchmarks, including image classification, object detection, instance segmentation, and semantic segmentation. For example, our tiny variant attains 83.2% top-1 accuracy on ImageNet-1K, 47.3% box AP and 42.7% mask AP on MS COCO, and 48.5% mIoU on the ADE20K dataset.
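The abstract mentions an adaptive weighting mechanism that fuses the outputs of multiple scans. As a conceptual NumPy sketch only (not the released implementation; the function name, the per-scan score vector, and the tensor shapes are all assumptions), softmax-normalized fusion of K scan outputs could look like this:

```python
import numpy as np

def fuse_scans(scan_outputs, scores):
    """Fuse K scan outputs with softmax-normalized adaptive weights.

    scan_outputs: array of shape (K, N, D) -- K scan paths over N tokens.
    scores: array of shape (K,) -- learned per-scan scores (assumed).
    """
    w = np.exp(scores - scores.max())
    w = w / w.sum()                               # softmax: weights sum to 1
    # Contract the K axis: weighted sum of the K scan outputs -> (N, D)
    return np.tensordot(w, scan_outputs, axes=1)
```

In the actual model the scores would be produced by a learned module conditioned on the input; here they are a fixed vector purely for illustration.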

Results

Classification on ImageNet-1K

| Name | Pretrain | Resolution | #Params | FLOPs | Acc@1 | Models |
|------|----------|------------|---------|-------|-------|--------|
| MFil-Mamba-Tiny | ImageNet-1K | 224x224 | 33.5M | 5.6G | 83.2 | ckpt |
| MFil-Mamba-Small | ImageNet-1K | 224x224 | 50.6M | 9.1G | 83.9 | ckpt |
| MFil-Mamba-Base | ImageNet-1K | 224x224 | 93.1M | 16.8G | 84.2 | ckpt |

Usage

Installation

Clone the MFil-Mamba Github repo:

git clone https://github.com/puskal-khadka/MFil-Mamba
cd MFil-Mamba

Setup Environment

conda create -n mfilmamba python=3.11
conda activate mfilmamba

Install dependencies

pip install -r requirements.txt
pip install kernels/.

Dataset Preparation

We use the ImageNet-1K dataset and organize the downloaded files in the following directory structure:

imagenet1k
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img2.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img3.jpeg
    │   └── ...
    ├── class2
    │   ├── img4.jpeg
    │   └── ...
    └── ...
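Before launching training, it can help to confirm the dataset matches the layout above. The following stdlib-only helper is a sketch (the function name is ours, not part of the repo); it checks for the `train`/`val` splits and one non-empty subdirectory per class:

```python
import os

def check_imagenet_layout(root):
    """Return a list of problems found in an ImageNet-style directory tree:
    root/{train,val}/<class>/<images...>. An empty list means the layout
    matches the expected structure."""
    problems = []
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            problems.append(f"missing split directory: {split_dir}")
            continue
        for cls in sorted(os.listdir(split_dir)):
            cls_dir = os.path.join(split_dir, cls)
            if not os.path.isdir(cls_dir):
                problems.append(f"unexpected file in {split_dir}: {cls}")
            elif not os.listdir(cls_dir):
                problems.append(f"empty class directory: {cls_dir}")
    return problems
```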

Training

To train MFil-Mamba on the ImageNet-1K classification task:

  bash dist_train.sh <model_variant> <total_gpus> /path/to/output --data-path /path/to/imagenet --input-size 224 --batch-size <batch_per_gpu> --epochs 300

<model_variant> is the name of the model variant, i.e., mfil_tiny, mfil_small, or mfil_base

Evaluation

To evaluate a pre-trained model on the ImageNet-1K validation set:

bash dist_train.sh <model_variant> 1 /path/to/output --resume /path/to/checkpoint_file --data-path /path/to/imagenet --input-size 224 --eval
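Evaluation reports top-1 accuracy, the Acc@1 column in the results table above. For reference, this metric is simply the fraction of validation samples whose top predicted class matches the ground-truth label; a minimal sketch (illustrative only, not the repo's evaluation code):

```python
def top1_accuracy(pred_labels, true_labels):
    """Top-1 accuracy (Acc@1), in percent: share of samples whose
    highest-scoring predicted class equals the ground-truth label."""
    assert len(pred_labels) == len(true_labels) and true_labels
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return 100.0 * correct / len(true_labels)
```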

Acknowledgment

Our code is based on Mamba, VMamba, and OpenMMLab. Thanks for their amazing works.

Citation

@article{Khadka2026mfil,
  title={MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models},
  author={Puskal Khadka and KC Santosh},
  journal={arXiv preprint arXiv:2603.20074},
  year={2026}
}
