This repository was archived by the owner on May 16, 2025. It is now read-only.

Description
Hello Rebuff team,
I have developed a small open-source tool called Puppetry Detector. It detects policy puppetry and prompt injection attempts in LLM prompts using regular expressions. The tool is modular and already includes integration with Rebuff, so it can be adapted as an optional heuristic module.
If somebody gives it a look an confirms possibility of integration, I'd be happy.
I would be happy to prepare a pull request to add this as an optional feature, if you think it could be useful. Please let me know if you are open to this idea or if you have any suggestions before I make a PR.
Thank you for your time and for your great work on Rebuff!
The tool itself is here: https://github.com/metawake/puppetry-detector