This repository is the code implementation of our COLING 2025 paper:
Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
- Install requirements. The code implementation of GraCeFul is partially based on MuScleLoRA and OpenBackdoor. After cloning this repository, you can install the requirements with:
```bash
pip3 install -r requirements.txt
```
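If you prefer an isolated environment, you can create one before installing (a minimal sketch; the environment name is illustrative):
```bash
# Create and activate a virtual environment (name is illustrative), then install the requirements
python3 -m venv graceful-env
source graceful-env/bin/activate
pip3 install -r requirements.txt
```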
- Training Data. We provide the backdoored training data in ./poison_data and raw datasets in ./datasets/QuestionAnswering.
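To sanity-check the data layout after cloning, you can list the provided directories:
```bash
# List the backdoored training data and the raw QA datasets shipped with the repository
ls ./poison_data
ls ./datasets/QuestionAnswering
```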
To reproduce the results on LLMs, configure `--config_path`, then run `python casualDefense.py` for the baselines and GraCeFul, and `python casualCleanTuning.py` for clean-tuning.
Detailed argument settings:
```bash
python casualDefense.py \
    [--config_path: config path in ./genConfigs] \
    [--target_model: llama/vicuna] \
    [--dataset: webqa/freebaseqa/nq/coqa] \
    [--poisoner: genbadnets_question/genaddsent_question/cba_instruction/cba_context]
```
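For example, a typical invocation might look like the following sketch (the config file name under ./genConfigs is an assumption; check that directory for the actual file names):
```bash
# Illustrative example: defend a LLaMA target on WebQA against the genbadnets_question poisoner.
# The config file name ./genConfigs/graceful.json is assumed; use the actual files in ./genConfigs.
python casualDefense.py \
    --config_path ./genConfigs/graceful.json \
    --target_model llama \
    --dataset webqa \
    --poisoner genbadnets_question
```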
For CUBE and GraCeFul, the visualization results of the feature distributions will be saved in ./casualCube and ./graceful, respectively.
This work could not have been done without the help of the following repositories:
- MuScleLoRA: https://github.com/ZrW00/MuScleLoRA
- OpenBackdoor: https://github.com/thunlp/OpenBackdoor
- PEFT: https://github.com/huggingface/peft
Following MuScleLoRA, we continue to extend OpenBackdoor to generative LLMs.
We implement the generation and training processes for generative LLMs; details are presented in ./openbackdoor/victims/casualLLMs.py and ./openbackdoor/trainers/casual_trainers.py.
For the baselines, CleanGen and DeCE are implemented in ./openbackdoor/trainers/casual_cleangen_trainer.py and ./openbackdoor/trainers/casual_dece_trainers.py, respectively. CUBE and MuScleLoRA for generation tasks are implemented in ./openbackdoor/defenders/cube_defender.py and ./openbackdoor/trainers/casual_ga_trainer.py, respectively.
The major part of GraCeFul is implemented in ./openbackdoor/defenders/graceful_defender.py.
```bibtex
@inproceedings{wu2025gracefully,
    title = {Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining},
    author = {Wu, Zongru and Cheng, Pengzhou and Fang, Lingyong and Zhang, Zhuosheng and Liu, Gongshen},
    booktitle = {Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025)},
    year = {2025},
    pages = {3267--3282}
}
```