Official PyTorch implementation of MFil-Mamba.
State Space Models (SSMs), especially the recent Mamba architecture, have achieved remarkable success in sequence modeling tasks. However, extending SSMs to computer vision remains challenging due to the non-sequential structure of visual data and its complex 2D spatial dependencies. Although several early studies have explored adapting selective SSMs for vision applications, most approaches primarily depend on employing various traversal strategies over the same input. This introduces redundancy and distorts the intricate spatial relationships within images. To address these challenges, we propose MFil-Mamba, a novel visual state space architecture built on a multi-filter scanning backbone. Unlike fixed multi-directional traversal methods, our design enables each scan to capture unique and contextually relevant spatial information while minimizing redundancy. Furthermore, we incorporate an adaptive weighting mechanism to effectively fuse the outputs of multiple scans, alongside further architectural enhancements. MFil-Mamba achieves superior performance over existing state-of-the-art models across various benchmarks, including image classification, object detection, instance segmentation, and semantic segmentation. For example, our tiny variant attains 83.2% top-1 accuracy on ImageNet-1K, 47.3% box AP and 42.7% mask AP on MS COCO, and 48.5% mIoU on the ADE20K dataset.
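The adaptive weighting idea can be illustrated with a toy sketch. This is pure NumPy and not the repository's actual implementation: it simply assumes each scan path yields its own feature sequence and that hypothetical learned gate scores are turned into per-scan weights (summing to one) before fusion.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_scans(scan_outputs, gate_logits):
    """Fuse K scan outputs with adaptive per-scan weights (illustrative only).

    scan_outputs: array of shape (K, L, D) -- one sequence per scan path
    gate_logits:  array of shape (K,)      -- hypothetical learned gate scores
    """
    weights = softmax(gate_logits)                    # (K,), sums to 1
    fused = np.einsum('k,kld->ld', weights, scan_outputs)
    return fused, weights

# Toy example: 3 scan paths, sequence length 4, feature dim 2
rng = np.random.default_rng(0)
outs = rng.standard_normal((3, 4, 2))
fused, w = fuse_scans(outs, np.array([0.5, -1.0, 2.0]))
```

The fused output keeps the per-token shape `(L, D)`; only the mixing weights across scan paths are data-dependent in this sketch.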
| Name | Pretrain | Resolution | #Params | FLOPs | Acc@1 | Models |
|---|---|---|---|---|---|---|
| MFil-Mamba-Tiny | ImageNet-1K | 224x224 | 33.5M | 5.6G | 83.2 | ckpt |
| MFil-Mamba-Small | ImageNet-1K | 224x224 | 50.6M | 9.1G | 83.9 | ckpt |
| MFil-Mamba-Base | ImageNet-1K | 224x224 | 93.1M | 16.8G | 84.2 | ckpt |
Clone the MFil-Mamba GitHub repo:

```bash
git clone https://github.com/puskal-khadka/MFil-Mamba
cd MFil-Mamba
```

## Setup Environment

```bash
conda create -n mfilmamba python=3.11
conda activate mfilmamba
```

## Install dependencies

```bash
pip install -r requirements.txt
pip install kernels/.
```
We use the ImageNet-1K dataset and organize the downloaded files in the following directory structure:

```
imagenet1k
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img2.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img3.jpeg
    │   └── ...
    ├── class2
    │   ├── img4.jpeg
    │   └── ...
    └── ...
```
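A quick way to sanity-check this layout before launching training is a small script. It is a minimal sketch that only assumes the one-sub-directory-per-class convention shown in the tree above; it is not part of the repository.

```python
import os
import tempfile

def check_imagenet_layout(root):
    """Return {split: num_classes} for a train/val class-folder layout.

    Raises ValueError if a split directory is missing.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise ValueError(f"missing split directory: {split_dir}")
        # Each class is a sub-directory containing its images.
        classes = [d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d))]
        counts[split] = len(classes)
    return counts

# Toy example with a temporary two-class layout
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("class1", "class2"):
        os.makedirs(os.path.join(root, split, cls))
layout = check_imagenet_layout(root)
```

For the real dataset, both splits should report 1000 classes.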
To train MFil-Mamba for the ImageNet-1K classification task:

```bash
bash dist_train.sh <model_variant> <total_gpus> /path/to/output --data-path /path/to/imagenet --input-size 224 --batch-size <batch_per_gpu> --epochs 300
```

`<model_variant>` is the name of the model variant, i.e., `mfil_tiny`, `mfil_small`, or `mfil_base`.
To evaluate a pre-trained model on the ImageNet-1K val set:

```bash
bash dist_train.sh <model_variant> 1 /path/to/output --resume /path/to/checkpoint_file --data-path /path/to/imagenet --input-size 224 --eval
```

Our code is based on Mamba, VMamba, and OpenMMLab. Thanks for their amazing works.
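For reference, the Acc@1 numbers reported in the table above are plain top-1 accuracy. A minimal NumPy sketch of the metric (not the repository's evaluation code):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction matches the label."""
    preds = np.argmax(logits, axis=1)
    return float((preds == labels).mean())

# Toy example: 4 samples, 3 classes
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.1, 0.2, 3.0],
                   [1.0, 0.9, 0.8]])
labels = np.array([0, 1, 2, 1])
acc = top1_accuracy(logits, labels)  # 3 of 4 correct -> 0.75
```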
```bibtex
@article{Khadka2026mfil,
  title={MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models},
  author={Puskal Khadka and KC Santosh},
  journal={arXiv preprint arXiv:2603.20074},
  year={2026}
}
```