
GraCeFul

Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining

This repository is the code implementation of our paper in COLING 2025:

Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining

Dependencies

  • Install requirements. The code implementation of GraCeFul is partially based on MuScleLoRA and OpenBackdoor. After cloning this repository, you can install the requirements by:
    pip3 install -r requirements.txt

Reproduce the results

To reproduce the results on LLMs, set --config_path to a configuration file in ./genConfigs, then run python casualDefense.py for the baselines and GraCeFul, or python casualCleanTuning.py for clean-tuning.

Detailed argument settings:

python casualDefense.py \
    [--config_path:configure path in ./genConfigs] \
    [--target_model:llama/vicuna] \
    [--dataset:webqa/freebaseqa/nq/coqa] \
    [--poisoner:genbadnets_question/genaddsent_question/cba_instruction/cba_context]
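
For example, a concrete invocation might look like the sketch below. The argument values are taken from the choices listed above; the config file name under ./genConfigs is a placeholder, not an actual file name, and should be replaced with the config shipped in the repository:

python casualDefense.py \
    --config_path ./genConfigs/<config_file> \
    --target_model llama \
    --dataset webqa \
    --poisoner genbadnets_question

Clean-tuning is presumably launched the same way, e.g. python casualCleanTuning.py with the corresponding --config_path.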

For CUBE and GraCeFul, the visualization results of the feature distributions will be saved in ./casualCube and ./graceful, respectively.

Acknowledgement

This work could not have been done without the help of the following repos:

Following MuScleLoRA, we continue to extend OpenBackdoor to generative LLMs.

We implement the generation and training processes for generative LLMs; details are presented in ./openbackdoor/victims/casualLLMs.py and ./openbackdoor/trainers/casual_trainers.py.

For baselines, CleanGen and DeCE are implemented in ./openbackdoor/trainers/casual_cleangen_trainer.py and ./openbackdoor/trainers/casual_dece_trainers.py, respectively. CUBE and MuScleLoRA for generation tasks are implemented in ./openbackdoor/defenders/cube_defender.py and ./openbackdoor/trainers/casual_ga_trainer.py, respectively.

The main part of GraCeFul is implemented in ./openbackdoor/defenders/graceful_defender.py.

Citation

@inproceedings{wu2025gracefully,
  title   = {Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining},
  author  = {Wu, Zongru and Cheng, Pengzhou and Fang, Lingyong and Zhang, Zhuosheng and Liu, Gongshen},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025)},
  year    = {2025},
  pages = {3267--3282}
}
