Description
Guardrails does not prevent prompt injection attacks. A malicious prompt can override the safety guidelines and cause the model to generate harmful content.
Expected Behavior
Implement prompt structure validation so that injection attempts are detected and rejected.
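One possible shape for the detection step, as a minimal sketch only. The pattern list and the names `INJECTION_PATTERNS`, `detect_injection`, `validate_prompt`, and `PromptInjectionError` are hypothetical and do not come from the project. Regex matching alone is easy to evade, so this should be read as a starting point rather than a complete defense.

```python
import re
from typing import List

# Hypothetical patterns for illustration; a real list would need tuning and review.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?(system|safety)\s+(prompt|guidelines)", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.IGNORECASE),
]


class PromptInjectionError(ValueError):
    """Raised when a prompt matches a known injection pattern."""


def detect_injection(prompt: str) -> List[str]:
    """Return the source of every pattern that matches (empty list if none)."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(prompt)]


def validate_prompt(prompt: str) -> str:
    """Return the prompt unchanged, or raise if it looks like an injection attempt."""
    matches = detect_injection(prompt)
    if matches:
        raise PromptInjectionError(
            f"prompt rejected: matched {len(matches)} injection pattern(s)"
        )
    return prompt
```

Raising a dedicated exception, rather than returning a boolean, makes it harder for a caller to ignore a failed check by accident.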
Actual Behavior
File: src/rails/guardrails.py
No injection detection: the user prompt is passed directly to the model via llm.generate(user_prompt).
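For comparison, a guarded call could look roughly like the sketch below. It assumes only that the model object exposes a `generate(prompt)` method, as in the call above. `guarded_generate`, `SupportsGenerate`, `EchoLLM`, and `naive_check` are hypothetical names introduced for illustration, and `naive_check` is a deliberately simple placeholder for whatever validation is eventually implemented.

```python
from typing import Callable, Protocol


class SupportsGenerate(Protocol):
    """Anything with a generate(prompt) method; the real LLM class is assumed to fit."""

    def generate(self, prompt: str) -> str: ...


def guarded_generate(
    llm: SupportsGenerate,
    user_prompt: str,
    is_safe: Callable[[str], bool],
) -> str:
    """Call llm.generate only if the prompt passes the safety check."""
    if not is_safe(user_prompt):
        raise ValueError("prompt rejected by injection check")
    return llm.generate(user_prompt)


class EchoLLM:
    """Stand-in model used only to make this sketch runnable."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, prompt: str) -> str:
        self.calls += 1
        return f"echo: {prompt}"


def naive_check(prompt: str) -> bool:
    """Placeholder check: flags a single well-known injection phrase."""
    return "ignore previous instructions" not in prompt.lower()
```

The point of the wrapper is that a rejected prompt never reaches the model, which is what the `calls` counter on the stand-in class makes observable.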
GSSoC Points Estimate: Level 3 (Security/LLM)
Suggested Labels
- gssoc:approved
- type:bug
- severity:critical
- area:security