Official PyTorch implementation of MFil-Mamba.
State Space Models (SSMs), especially the recent Mamba architecture, have achieved remarkable success in sequence modeling tasks. However, extending SSMs to computer vision remains challenging due to the non-sequential structure of visual data and its complex 2D spatial dependencies. Although several early studies have explored adapting selective SSMs for vision applications, most approaches primarily depend on employing various traversal strategies over the same input. This introduces redundancy and distorts the intricate spatial relationships within images. To address these challenges, we propose MFil-Mamba, a novel visual state space architecture built on a multi-filter scanning backbone. Unlike fixed multi-directional traversal methods, our design enables each scan to capture unique and contextually relevant spatial information while minimizing redundancy. Furthermore, we incorporate an adaptive weighting mechanism to effectively fuse the outputs of multiple scans, alongside further architectural enhancements. MFil-Mamba achieves superior performance over existing state-of-the-art models across various benchmarks, including image classification, object detection, instance segmentation, and semantic segmentation. For example, our tiny variant attains 83.2% top-1 accuracy on ImageNet-1K, 47.3% box AP and 42.7% mask AP on MS COCO, and 48.5% mIoU on the ADE20K dataset.
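The adaptive weighting idea can be illustrated with a toy sketch. This is pure NumPy and not the repository's actual implementation: it simply assumes each scan path yields its own feature sequence and that hypothetical learned gate scores are turned into per-scan weights (summing to one) before fusion.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_scans(scan_outputs, gate_logits):
    """Fuse K scan outputs with adaptive per-scan weights (illustrative only).

    scan_outputs: array of shape (K, L, D) -- one sequence per scan path
    gate_logits:  array of shape (K,)      -- hypothetical learned gate scores
    """
    weights = softmax(gate_logits)                    # (K,), sums to 1
    fused = np.einsum('k,kld->ld', weights, scan_outputs)
    return fused, weights

# Toy example: 3 scan paths, sequence length 4, feature dim 2
rng = np.random.default_rng(0)
outs = rng.standard_normal((3, 4, 2))
fused, w = fuse_scans(outs, np.array([0.5, -1.0, 2.0]))
```

The fused output keeps the per-token shape `(L, D)`; only the mixing weights across scan paths are data-dependent in this sketch.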
| Name | Pretrain | Resolution | #Params | FLOPs | Acc@1 | Models |
|---|---|---|---|---|---|---|
| MFil-Mamba-Tiny | ImageNet-1K | 224x224 | 33.5M | 5.6G | 83.2 | ckpt |
| MFil-Mamba-Small | ImageNet-1K | 224x224 | 50.6M | 9.1G | 83.9 | ckpt |
| MFil-Mamba-Base | ImageNet-1K | 224x224 | 93.1M | 16.8G | 84.2 | ckpt |
Clone the MFil-Mamba GitHub repo:

```bash
git clone https://github.com/puskal-khadka/MFil-Mamba
cd MFil-Mamba
```

## Setup Environment

```bash
conda create -n mfilmamba python=3.11
conda activate mfilmamba
```

## Install dependencies

```bash
pip install -r requirements.txt
pip install kernels/.
```
We use the ImageNet-1K dataset and organize the downloaded files in the following directory structure:

```
imagenet1k
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img2.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img3.jpeg
    │   └── ...
    ├── class2
    │   ├── img4.jpeg
    │   └── ...
    └── ...
```
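A quick way to sanity-check this layout before launching training is a small script. It is a minimal sketch that only assumes the one-sub-directory-per-class convention shown in the tree above; it is not part of the repository.

```python
import os
import tempfile

def check_imagenet_layout(root):
    """Return {split: num_classes} for a train/val class-folder layout.

    Raises ValueError if a split directory is missing.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise ValueError(f"missing split directory: {split_dir}")
        # Each class is a sub-directory containing its images.
        classes = [d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d))]
        counts[split] = len(classes)
    return counts

# Toy example with a temporary two-class layout
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("class1", "class2"):
        os.makedirs(os.path.join(root, split, cls))
layout = check_imagenet_layout(root)
```

For the real dataset, both splits should report 1000 classes.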
To train MFil-Mamba for the ImageNet-1K classification task:

```bash
bash dist_train.sh <model_variant> <total_gpus> /path/to/output --data-path /path/to/imagenet --input-size 224 --batch-size <batch_per_gpu> --epochs 300
```

`<model_variant>` is the name of the model variant, i.e., `mfil_tiny`, `mfil_small`, or `mfil_base`.
To evaluate a pre-trained model on the ImageNet-1K val set:

```bash
bash dist_train.sh <model_variant> 1 /path/to/output --resume /path/to/checkpoint_file --data-path /path/to/imagenet --input-size 224 --eval
```

Our code is based on Mamba, VMamba, and OpenMMLab. Thanks for their amazing works.
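For reference, the Acc@1 numbers reported in the table above are plain top-1 accuracy. A minimal NumPy sketch of the metric (not the repository's evaluation code):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction matches the label."""
    preds = np.argmax(logits, axis=1)
    return float((preds == labels).mean())

# Toy example: 4 samples, 3 classes
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.1, 0.2, 3.0],
                   [1.0, 0.9, 0.8]])
labels = np.array([0, 1, 2, 1])
acc = top1_accuracy(logits, labels)  # 3 of 4 correct -> 0.75
```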
```bibtex
@article{Khadka2026mfil,
  title={MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models},
  author={Puskal Khadka and KC Santosh},
  journal={arXiv preprint arXiv:2603.20074},
  year={2026}
}
```