
Commit 5eb4e7f

jpablomch and Yuan0320 committed: Mamba-Shedder release
Co-authored-by: Yuan0320 <[email protected]>
1 parent 766dd35


49 files changed: +5503 −6 lines

MS/README.md

+138
@@ -0,0 +1,138 @@
# Mamba-Shedder

Official implementation of [Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models]().

This repo contains the code for Mamba-Shedder, which explores the compression of the new Mamba-series architectures (and their hybrids).
We study the sensitivity of these models to the removal of selected components at different granularities to reduce model size and computational overhead, thereby improving their efficiency while maintaining accuracy.
Please refer to our paper for more details.

## News
- **[2025.01.23]** Added support for the new hybrid architecture model **Hymba**; please refer to [Hymba-Pruning](./hybrid/Hymba-Pruning).
- **[2025.01.23]** Added support for Zamba2 ([Zamba2-Pruning](./hybrid/Zamba2-Pruning)).
- **[2025.01.22]** Released the code for **Mamba-Shedder**. :tada:
## Released Pruned Models 🤗

Compressed models by Mamba-Shedder:

| Source Model | Components Removed | Recovery Tuning | Relative Acc. (vs. source) | Pruned Model Link | Inference Speedup |
|--------------|--------------------|-----------------|----------------------------|-------------------|-------------------|
| [Hymba-1.5B-Base](https://huggingface.co/nvidia/Hymba-1.5B-Base) | 7 Hymba Blocks | | 97% | [Link]() | ~1.2x |
| [Hymba-1.5B-Base](https://huggingface.co/nvidia/Hymba-1.5B-Base) | 7 Hymba Blocks | | 99% | [Link]() | ~1.2x |
| [mamba-2.8b](https://huggingface.co/state-spaces/mamba-2.8b) | 14 Mamba Blocks | | 90% | [Link]() | ~1.3x |
| [mamba2-2.7b](https://huggingface.co/state-spaces/mamba2-2.7b) | 22 SSMs | | 96% | [Link]() | ~1.2x |
| [mamba2-2.7b](https://huggingface.co/state-spaces/mamba2-2.7b) | 22 SSMs | | 99% | [Link]() | ~1.2x |
## Setup

Use the following instructions to create a virtual environment with the required dependencies.

```bash
# install dependencies
bash install.sh
```
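
If you prefer a manual setup, a minimal alternative (inferred from the imports used by the scripts in this repo, not an official requirements list) is:

```bash
# Packages imported by eval.py and extract/extract_mamba.py:
# torch, transformers, mamba_ssm, lm_eval
pip install torch transformers mamba-ssm lm-eval
```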

## Run

### Evaluation before Pruning

```bash
python eval.py --model_path <path to mamba model>
```

### Prune

#### Mamba Block Pruning

An example command for [mamba-2.8b](https://huggingface.co/state-spaces/mamba-2.8b) with Mamba Block Pruning:

```bash
python prune.py \
  --model_path state-spaces/mamba-2.8b \
  --do_prune \
  --output_path <path to pruning results> \
  --prune_target mamba_block \
  --target_pruning_steps 10 \
  --importance_metric ppl \
  --calibration_dataset alpaca \
  --num_calibration_samples 256 \
  --do_eval
```

- `model_path`: Path to the pre-trained Mamba model.
- `do_prune`: Flag to indicate whether to perform pruning.
- `output_path`: Directory to save the pruning and evaluation results.
- `prune_target`: Pruning granularity, either `mamba_block` or `ssm`.
- `target_pruning_steps`: Number of target modules (Mamba blocks or SSMs) to prune.
- `importance_metric`: Metric for calculating block importance; currently only PPL (perplexity) is supported (see the sketch after this list).
- `calibration_dataset`: Calibration dataset name (`alpaca`, `c4`, `ptb`, or `wikitext2`).
- `num_calibration_samples`: Number of calibration samples used for pruning.
- `do_eval`: Flag to indicate whether to perform evaluation.
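
The iterative search that `prune.py` performs can be pictured with the following minimal sketch of PPL-based block importance. The helper names and the model/block interfaces here are simplified assumptions; `prune.py` remains the reference implementation.

```python
import torch


@torch.no_grad()
def perplexity(model, input_ids):
    """Perplexity of the model on a batch of calibration token ids."""
    logits = model(input_ids).logits
    loss = torch.nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
    return loss.exp().item()


def least_important_block(model, blocks, calib_ids):
    """Return the index of the block whose removal raises PPL the least.

    `blocks` is the list of residual blocks (e.g. model.backbone.layers);
    each block is temporarily bypassed by an identity on the hidden states.
    """
    scores = {}
    for idx, block in enumerate(blocks):
        original_forward = block.forward
        # Skip the block: pass hidden states through unchanged.
        block.forward = lambda hidden_states, *args, **kwargs: hidden_states
        scores[idx] = perplexity(model, calib_ids)
        block.forward = original_forward
    return min(scores, key=scores.get)
```

Removing one module per step and repeating this scan for `target_pruning_steps` iterations yields the pruning configuration that the extraction step below consumes.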
#### SSM Pruning

An example command for [mamba2-2.7b](https://huggingface.co/state-spaces/mamba2-2.7b) with SSM Pruning:

```bash
python prune.py \
  --model_path state-spaces/mamba2-2.7b \
  --do_prune \
  --output_path <path to pruning results> \
  --prune_target ssm \
  --target_pruning_steps 20 \
  --importance_metric ppl \
  --calibration_dataset alpaca \
  --num_calibration_samples 256 \
  --do_eval
```
### Extract the Pruned Model

Extract the pruned model based on the optimal pruning configuration obtained from Mamba-Shedder.
For more details, please refer to [here](./extract).
Here is an example to extract a pruned [mamba2-2.7b](https://huggingface.co/state-spaces/mamba2-2.7b):

```bash
python extract/extract_mamba.py \
  --model_path state-spaces/mamba2-2.7b \
  --pruned_model_config_file <path to pruning results>/pruning_config.json \
  --output_path <path to compressed model>
```
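
For reference, `extract/extract_mamba.py` reads two optional keys from this configuration file, `pruned_mamba_block_idx` and `pruned_ssm_idx`, so a `pruning_config.json` has the following shape (the indices below are illustrative, not real results):

```json
{
    "pruned_mamba_block_idx": [12, 17, 23],
    "pruned_ssm_idx": [4, 9, 30]
}
```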
### Recovery Fine-tuning

After obtaining the pruned model, we can use the [Alpaca](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset for recovery fine-tuning:

```bash
# Finetune the compressed Mamba-2
python recovery/finetune_mamba.py \
  --model_path <path to compressed model> \
  --do_train \
  --batch_size 32 \
  --gradient_accumulation_steps 1 \
  --num_train_epochs 1 \
  --learning_rate 5e-5 \
  --output_path <path to trained model> \
  --do_eval
```
## Results

All run commands and pruning results can be found [here](./results).

### Loading the compressed model for evaluation

```bash
python eval.py --model_path <path to compressed model>
```
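
To load the compressed checkpoint in your own code rather than through `eval.py`, the same loading path that `eval.py` uses applies (a sketch; the tokenizer choice mirrors the repo's scripts):

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import AutoTokenizer

# Mamba checkpoints reuse the GPT-NeoX tokenizer, as in eval.py.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "<path to compressed model>", device="cuda", dtype=torch.float16
)
```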

## Citation

If you find Mamba-Shedder's code and paper helpful, please kindly cite:

```bibtex
@article{munoz2025mambashedder,
  title   = {Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models},
  author  = {J. Pablo Munoz and Jinjie Yuan and Nilesh Jain},
  journal = {},
  year    = {2025}
}
```

MS/eval.py

+45
@@ -0,0 +1,45 @@
import argparse
import json
import logging

import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

from lm_eval import evaluator
from lm_eval.models.mamba_lm import MambaLMWrapper

TASKS = ["lambada_openai", "hellaswag", "piqa", "arc_easy", "arc_challenge", "winogrande", "openbookqa"]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path",
        type=str,
        help="Path to the (possibly compressed) Mamba model.",
    )
    args = parser.parse_args()
    model_path = args.model_path

    # Ensure the INFO-level logs below are actually emitted.
    logging.basicConfig(level=logging.INFO)

    # Mamba checkpoints reuse the GPT-NeoX tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    model = MambaLMHeadModel.from_pretrained(model_path, device="cuda", dtype=torch.float16)
    # MambaLMWrapper expects the model to expose a `device` attribute.
    model.device = model.lm_head.weight.device
    lm = MambaLMWrapper(pretrained=model, tokenizer=tokenizer, batch_size=64)

    # Evaluate on selected tasks
    logging.info(f"Selected Tasks: {TASKS}")
    results = evaluator.simple_evaluate(lm, tasks=TASKS, log_samples=False)['results']

    metric_vals = {}
    for task, result in results.items():
        # TODO: fix (all are `acc_norm,none`)
        res = result['acc,none'] if task == 'arc_easy' else result.get('acc_norm,none', result['acc,none'])
        # Scale before rounding so the reported percentage has no float noise.
        metric_vals[task] = round(res * 100, 1)
        if task == "lambada_openai":
            metric_vals[task + "_ppl"] = result['perplexity,none']

    logging.info(json.dumps(metric_vals, indent=4))


if __name__ == "__main__":
    main()

MS/extract/README.md

+21
@@ -0,0 +1,21 @@
## Extract the Compressed Model from Mamba-Shedder

The final compressed model can be extracted based on the optimal pruning configuration obtained from Mamba-Shedder.

```bash
# Mamba-1 (Mamba Block Pruning)
python extract/extract_mamba.py \
  --model_path state-spaces/mamba-2.8b \
  --output_path <path to pruned model> \
  --pruned_model_config_file <path to pruning result>/pruning_config.json # Or specify the config file of a pruning step from the `pruned_model_configs` folder, e.g., <path to pruning result>/pruned_model_configs/config.mamba_block.${eval_step}.json

# Mamba-2 (SSM Pruning)
python extract/extract_mamba.py \
  --model_path state-spaces/mamba2-2.7b \
  --output_path <path to pruned model> \
  --pruned_model_config_file <path to pruning result>/pruning_config.json # Or specify the config file of a pruning step from the `pruned_model_configs` folder, e.g., <path to pruning result>/pruned_model_configs/config.ssm.${eval_step}.json
```

- `model_path`: Path to the pre-trained model.
- `pruned_model_config_file`: JSON file for the pruned model configuration.
- `output_path`: Directory to save the compressed model.

MS/extract/extract_mamba.py

+88
@@ -0,0 +1,88 @@
import argparse
import json
import logging
import os
import torch

from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import AutoTokenizer


# Parameters that make up one Mamba block; `*` is replaced by the layer index.
MAMBA_MODULES = [
    "backbone.layers.*.mixer.dt_bias",
    "backbone.layers.*.mixer.A_log",
    "backbone.layers.*.mixer.D",
    "backbone.layers.*.mixer.in_proj.weight",
    "backbone.layers.*.mixer.conv1d.weight",
    "backbone.layers.*.mixer.conv1d.bias",
    "backbone.layers.*.mixer.norm.weight",
    "backbone.layers.*.mixer.out_proj.weight",
    "backbone.layers.*.mixer.dt_proj.weight",  # Mamba-1
    "backbone.layers.*.mixer.dt_proj.bias",  # Mamba-1
    "backbone.layers.*.mixer.x_proj.weight",  # Mamba-1
    "backbone.layers.*.norm.weight",
]

# SSM-specific parameters; only for Mamba-2
SSM_MODULES = [
    "backbone.layers.*.mixer.D",
    "backbone.layers.*.mixer.dt_bias",
]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path",
        type=str,
        help="Path to the Mamba model."
    )
    parser.add_argument(
        "--output_path",
        type=str,
        help="Directory to save the compressed model."
    )
    parser.add_argument(
        "--pruned_model_config_file",
        type=str,
        help="Path to the pruned model configuration file."
    )

    args = parser.parse_args()
    model_path = args.model_path
    output_path = args.output_path
    # Create output directory if it doesn't exist
    os.makedirs(output_path, exist_ok=True)
    pruned_model_config_file = args.pruned_model_config_file

    # Ensure the INFO-level log below is actually emitted.
    logging.basicConfig(level=logging.INFO)

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    model = MambaLMHeadModel.from_pretrained(model_path, device="cuda", dtype=torch.float16)

    # Load pruning results
    with open(pruned_model_config_file, "r") as f:
        pruned_config = json.load(f)
    logging.info(f"Detected a pruned model config: {pruned_config}")
    state_dict = model.state_dict()

    def prune_modules(state_dict, idx, module_names):
        # Drop every parameter of the pruned module from the state dict.
        for module_name in module_names:
            module_name = module_name.replace("*", str(idx))
            if module_name in state_dict:
                del state_dict[module_name]

    if pruned_config.get("pruned_mamba_block_idx"):
        pruned_mamba_block_idx = pruned_config["pruned_mamba_block_idx"]
        for idx in pruned_mamba_block_idx:
            prune_modules(state_dict, idx, MAMBA_MODULES)
    if pruned_config.get("pruned_ssm_idx"):
        pruned_ssm_idx = pruned_config["pruned_ssm_idx"]
        for idx in pruned_ssm_idx:
            prune_modules(state_dict, idx, SSM_MODULES)

    # Save the compressed checkpoint and the tokenizer.
    model.save_pretrained(output_path, state_dict=state_dict)
    tokenizer.save_pretrained(output_path)


if __name__ == "__main__":
    main()
