Secrets-plugin: redact fails when redact-mode=all and more than 1 secret in prompt causing leak of secret #305

@mariovitale1979

Describe the bug

When the Secrets plugin runs in 'all' redact-mode and the prompt contains more than 1 secret, the sanitized prompt still contains cleartext secrets.

To Reproduce

Using the LLM Guard playground (https://huggingface.co/spaces/protectai/llm-guard-playground), the llm-guard library, or the Swagger API, call the "scan_prompt" function or the /analyze/prompt endpoint with a prompt like this:
"This is a prompt example containing fake 2 secrets, ex sk-ZYX987654321abcABC12T3BlbkFJqwertyyiop1234567890 and ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O"

The sanitized prompt will look like this:
This is a prompt example containing fake 2 secrets, ex ****** and ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O************ and ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O

The 1st secret is correctly redacted, but the 2nd is not. Worse, it remains in cleartext and appears twice in the sanitized prompt.
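
To make the leak easy to check mechanically, here is a minimal sketch in plain Python that does not call llm-guard at all. It takes the two fake secrets and the sanitized prompt exactly as reported above and lists which secrets survive. The helper name `leaked_secrets` is hypothetical, not part of the library.

```python
# Strings copied verbatim from this report; nothing here calls llm-guard.
SECRETS = [
    "sk-ZYX987654321abcABC12T3BlbkFJqwertyyiop1234567890",
    "ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O",
]

REPORTED_OUTPUT = (
    "This is a prompt example containing fake 2 secrets, ex ****** and "
    "ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O************ and "
    "ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O"
)


def leaked_secrets(sanitized, secrets):
    """Return every known secret that still appears verbatim in `sanitized`.

    Hypothetical helper for this report, not an llm-guard function.
    """
    return [secret for secret in secrets if secret in sanitized]


leaks = leaked_secrets(REPORTED_OUTPUT, SECRETS)
print(leaks)                                # secrets still in cleartext
print(REPORTED_OUTPUT.count(SECRETS[1]))    # how often the 2nd one appears
```

The same check could be run against the real scanner output once a fix is available; an empty list would indicate no leak for these two secrets.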

Expected behavior

All secrets should be redacted, whether a prompt or output contains 1 secret or several.
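
For illustration only, the expected result for the example prompt can be sketched with a single-pass regex substitution. This is not how the Secrets plugin detects or redacts secrets. The two patterns and the `redact_all` name are assumptions made for this sketch, chosen only so that they match the two fake secrets above.

```python
import re

# Assumed, simplified patterns for this sketch only; the real plugin uses
# its own detectors, which are not reproduced here.
PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{36}")


def redact_all(text, mask="******"):
    """Replace every match in one pass so earlier edits cannot shift later ones."""
    return PATTERN.sub(mask, text)


prompt = (
    "This is a prompt example containing fake 2 secrets, ex "
    "sk-ZYX987654321abcABC12T3BlbkFJqwertyyiop1234567890 and "
    "ghp_d2PHL8sMWUfBPZVNvHtAr4co4Zfy2Y3RB42O"
)

expected = redact_all(prompt)
print(expected)
# This is a prompt example containing fake 2 secrets, ex ****** and ******
```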
