
GraCeFul

Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining

This repository is the code implementation of our paper in COLING 2025:

Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining

Dependencies

  • Install requirements. The code implementation of GraCeFul is partially based on MuScleLoRA and OpenBackdoor. After cloning this repository, you can install the requirements by:
    pip3 install -r requirements.txt

Reproduce the results

To reproduce the results on LLMs, set --config_path to a configuration file in ./genConfigs, then run python casualDefense.py for the baselines and GraCeFul, or python casualCleanTuning.py for clean-tuning.

Detailed argument settings:

python casualDefense.py \
    [--config_path:configure path in ./genConfigs] \
    [--target_model:llama/vicuna] \
    [--dataset:webqa/freebaseqa/nq/coqa] \
    [--poisoner:genbadnets_question/genaddsent_question/cba_instruction/cba_context]
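
For example, a concrete invocation might look like the sketch below. The argument values are taken from the choices listed above; the config file name under ./genConfigs is a placeholder, not an actual file name, and should be replaced with the config shipped in the repository:

python casualDefense.py \
    --config_path ./genConfigs/<config_file> \
    --target_model llama \
    --dataset webqa \
    --poisoner genbadnets_question

Clean-tuning is presumably launched the same way, e.g. python casualCleanTuning.py with the corresponding --config_path.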

For CUBE and GraCeFul, the visualization results of the feature distributions will be saved in ./casualCube and ./graceful, respectively.

Acknowledgement

This work could not have been done without the help of the following repos:

Following MuScleLoRA, we continue to extend OpenBackdoor to generative LLMs.

We implement the generation and training processes for generative LLMs; details are presented in ./openbackdoor/victims/casualLLMs.py and ./openbackdoor/trainers/casual_trainers.py.

For baselines, CleanGen and DeCE are implemented in ./openbackdoor/trainers/casual_cleangen_trainer.py and ./openbackdoor/trainers/casual_dece_trainers.py, respectively. CUBE and MuScleLoRA for generation tasks are implemented in ./openbackdoor/defenders/cube_defender.py and ./openbackdoor/trainers/casual_ga_trainer.py, respectively.

The main part of GraCeFul is implemented in ./openbackdoor/defenders/graceful_defender.py.

Citation

@inproceedings{wu2025gracefully,
  title   = {Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining},
  author  = {Wu, Zongru and Cheng, Pengzhou and Fang, Lingyong and Zhang, Zhuosheng and Liu, Gongshen},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025)},
  year    = {2025},
  pages = {3267--3282}
}
