Commit 08569c8

feat(community): add Prompt Security integration (#920)
1 parent 2ede0f4 commit 08569c8

10 files changed, +403 -0 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -63,3 +63,5 @@ docs/**/config
 # Ignoring log files generated by tests
 firebase.json
 scratch.py
+
+.env
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# Prompt Security Integration
+
+[Prompt Security AI](https://prompt.security/?utm_medium=github&utm_campaign=nemo-guardrails) allows you to protect LLM interactions. This integration enables NeMo Guardrails to use Prompt Security to protect input and output flows.
+
+You'll need to set the following environment variables to work with Prompt Security:
+
+1. `PS_PROTECT_URL` - The URL of the protect endpoint given by Prompt Security. It looks like `https://[REGION].prompt.security/api/protect`, where REGION is eu, useast or apac.
+2. `PS_APP_ID` - The application ID given by Prompt Security (similar to an API key). You can get it from the admin portal at `https://[REGION].prompt.security/`, where REGION is eu, useast or apac.
+
+## Setup
+
+1. Ensure that you have access to a Prompt Security API server (SaaS or on-prem).
+2. Update your `config.yml` file to include the Prompt Security settings:
+
+```yaml
+rails:
+  input:
+    flows:
+      - protect prompt
+  output:
+    flows:
+      - protect response
+```
+
+Don't forget to set the `PS_PROTECT_URL` and `PS_APP_ID` environment variables.
+
+## Usage
+
+Once configured, the Prompt Security integration will automatically:
+
+1. Protect prompts before they are processed by the LLM.
+2. Protect LLM outputs before they are sent back to the user.
+
+The `protect_text` action in `nemoguardrails/library/prompt_security/actions.py` handles the protection process.
+
+## Error Handling
+
+If the Prompt Security API request fails, the integration operates in fail-open mode: the error is logged and the prompt or response is not blocked.
+
+## Notes
+
+For more information on Prompt Security and its capabilities, please refer to the [Prompt Security documentation](https://prompt.security/?utm_medium=github&utm_campaign=nemo-guardrails).
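
The fail-open behavior described under Error Handling can be sketched in isolation. This is a minimal illustration rather than the integration's own code: `call_fail_open`, `FAIL_OPEN_RESULT` and the simulated outage are hypothetical names introduced here, and only the three result keys are taken from the commit.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical constant: the pass-through result used when the API call fails.
FAIL_OPEN_RESULT: Dict[str, Any] = {
    "is_blocked": False,
    "is_modified": False,
    "modified_text": None,
}


async def call_fail_open(
    call: Callable[[], Awaitable[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Await `call`; on any exception, return a result that blocks nothing."""
    try:
        return await call()
    except Exception:
        # Fail open: the text continues through the rail unchanged.
        return dict(FAIL_OPEN_RESULT)


async def _simulated_outage() -> Dict[str, Any]:
    # Stand-in for a failed HTTP request to the protect endpoint.
    raise ConnectionError("simulated outage")


result = asyncio.run(call_fail_open(_simulated_outage))
print(result["is_blocked"])  # False
```

The trade-off of this design is availability over strictness: an outage of the protection service never stops the conversation, but text is also not screened while the outage lasts.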

docs/user-guides/guardrails-library.md

Lines changed: 22 additions & 0 deletions
@@ -23,6 +23,7 @@ NeMo Guardrails comes with a library of built-in guardrails that you can easily
 - [Cleanlab Trustworthiness Score](#cleanlab)
 - [GCP Text Moderation](#gcp-text-moderation)
 - [Private AI PII detection](#private-ai-pii-detection)
+- [Prompt Security Protection](#prompt-security-protection)
 - OpenAI Moderation API - *[COMING SOON]*
 
 4. Other
@@ -805,6 +806,27 @@ rails:
 
 For more details, check out the [Private AI Integration](./community/privateai.md) page.
 
+
+### Prompt Security Protection
+
+NeMo Guardrails supports using the [Prompt Security API](https://prompt.security/?utm_medium=github&utm_campaign=nemo-guardrails) to protect input and output flows.
+
+To activate the protection, you need to set the `PS_PROTECT_URL` and `PS_APP_ID` environment variables.
+
+#### Example usage
+
+```yaml
+rails:
+  input:
+    flows:
+      - protect prompt
+  output:
+    flows:
+      - protect response
+```
+
+For more details, check out the [Prompt Security Integration](./community/prompt_security.md) page.
+
 ## Other
 
 ### Jailbreak Detection Heuristics
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Prompt Security Configuration Example
+
+This example contains configuration files for using Prompt Security in your NeMo Guardrails project.
+
+For more details on the Prompt Security integration, see [Prompt Security Integration User Guide](../../../docs/user-guides/community/prompt-security.md).
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+models:
+  - type: main
+    engine: openai
+    model: gpt-4o
+
+rails:
+  input:
+    flows:
+      - protect prompt
+
+  output:
+    flows:
+      - protect response
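
Because this configuration only works when both environment variables are set, a pre-flight check before loading it can save a failed first request. The sketch below is an illustration under assumptions: `missing_prompt_security_vars` and `protect_url_for` are hypothetical helpers, not part of NeMo Guardrails, and the URL pattern is the one quoted in the integration README.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the integration README.
REQUIRED_VARS = ("PS_PROTECT_URL", "PS_APP_ID")


def missing_prompt_security_vars(
    env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the required variable names that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]


def protect_url_for(region: str) -> str:
    """Build the protect endpoint URL for a region such as eu, useast or apac."""
    return f"https://{region}.prompt.security/api/protect"


# Example with an explicit mapping instead of the real process environment.
example_env = {"PS_PROTECT_URL": protect_url_for("eu")}
print(missing_prompt_security_vars(example_env))  # ['PS_APP_ID']
```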
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Prompt/Response protection using Prompt Security."""
+
+import logging
+import os
+from typing import Optional
+
+import httpx
+
+from nemoguardrails.actions import action
+
+log = logging.getLogger(__name__)
+
+
+async def ps_protect_api_async(
+    ps_protect_url: str,
+    ps_app_id: str,
+    prompt: Optional[str] = None,
+    system_prompt: Optional[str] = None,
+    response: Optional[str] = None,
+    user: Optional[str] = None,
+):
+    """Calls Prompt Security Protect API asynchronously.
+
+    Args:
+        ps_protect_url: the URL of the protect endpoint given by Prompt Security.
+            URL is https://[REGION].prompt.security/api/protect where REGION is eu, useast or apac
+
+        ps_app_id: the application ID given by Prompt Security (similar to an API key).
+            Get it from the admin portal at https://[REGION].prompt.security/ where REGION is eu, useast or apac
+
+        prompt: the user message to protect.
+
+        system_prompt: the system message for context.
+
+        response: the bot message to protect.
+
+        user: the user ID or username for context.
+
+    Returns:
+        A dictionary with the following items:
+        - is_blocked: True if the text should be blocked, False otherwise.
+        - is_modified: True if the text should be modified, False otherwise.
+        - modified_text: The modified text if is_modified is True, None otherwise.
+    """
+
+    headers = {
+        "APP-ID": ps_app_id,
+        "Content-Type": "application/json",
+    }
+    payload = {
+        "prompt": prompt,
+        "system_prompt": system_prompt,
+        "response": response,
+        "user": user,
+    }
+    async with httpx.AsyncClient() as client:
+        modified_text = None
+        ps_action = "log"
+        try:
+            ret = await client.post(ps_protect_url, headers=headers, json=payload)
+            res = ret.json()
+            ps_action = res.get("result", {}).get("action", "log")
+            if ps_action == "modify":
+                key = "response" if response else "prompt"
+                modified_text = res.get("result", {}).get(key, {}).get("modified_text")
+        except Exception as e:
+            log.error("Error calling Prompt Security Protect API: %s", e)
+        return {
+            "is_blocked": ps_action == "block",
+            "is_modified": ps_action == "modify",
+            "modified_text": modified_text,
+        }
+
+
+@action(is_system_action=True)
+async def protect_text(
+    user_prompt: Optional[str] = None, bot_response: Optional[str] = None
+):
+    """Protects the given user_prompt or bot_response.
+    Args:
+        user_prompt: The user message to protect.
+        bot_response: The bot message to protect.
+    Returns:
+        A dictionary with the following items:
+        - is_blocked: True if the text should be blocked, False otherwise.
+        - is_modified: True if the text should be modified, False otherwise.
+        - modified_text: The modified text if is_modified is True, None otherwise.
+    Raises:
+        ValueError: in one of the following cases:
+            1. If PS_PROTECT_URL env variable is not set.
+            2. If PS_APP_ID env variable is not set.
+            3. If no user_prompt and no bot_response is provided.
+    """
+
+    ps_protect_url = os.getenv("PS_PROTECT_URL")
+    if not ps_protect_url:
+        raise ValueError("PS_PROTECT_URL env variable is required for Prompt Security.")
+
+    ps_app_id = os.getenv("PS_APP_ID")
+    if not ps_app_id:
+        raise ValueError("PS_APP_ID env variable is required for Prompt Security.")
+
+    if bot_response:
+        return await ps_protect_api_async(
+            ps_protect_url, ps_app_id, None, None, bot_response
+        )
+
+    if user_prompt:
+        return await ps_protect_api_async(ps_protect_url, ps_app_id, user_prompt)
+
+    raise ValueError("Neither user_prompt nor bot_response was provided")
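
The way `ps_protect_api_async` turns the API reply into its result dictionary can be exercised without a network call. The sketch below restates that mapping as a pure function; `interpret_protect_result` is a hypothetical name, and the sample payloads are assumptions inferred from the `.get()` chain in the action, not a documented Prompt Security schema.

```python
from typing import Any, Dict, Optional


def interpret_protect_result(res: Dict[str, Any], is_response: bool) -> Dict[str, Any]:
    """Map a protect API reply onto the is_blocked/is_modified/modified_text dict."""
    result = res.get("result", {})
    # A missing action defaults to "log", which neither blocks nor modifies.
    ps_action = result.get("action", "log")
    modified_text: Optional[str] = None
    if ps_action == "modify":
        # The modified text sits under "response" for bot output, "prompt" otherwise.
        key = "response" if is_response else "prompt"
        modified_text = result.get(key, {}).get("modified_text")
    return {
        "is_blocked": ps_action == "block",
        "is_modified": ps_action == "modify",
        "modified_text": modified_text,
    }


# Assumed reply shape, for illustration only.
sample = {"result": {"action": "modify", "prompt": {"modified_text": "[redacted]"}}}
print(interpret_protect_result(sample, is_response=False)["modified_text"])  # [redacted]
```

Note that a reply with `action` set to `modify` but no `modified_text` yields `is_modified` as True with `modified_text` as None, so callers may want to guard against assigning None to the message.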
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# INPUT RAILS
+
+@active
+flow protect prompt
+  """Check if the prompt is valid according to Prompt Security."""
+  $result = await protect_text(user_prompt=$user_message)
+  if $result["is_blocked"]
+    bot inform answer unknown
+    stop
+  else if $result["is_modified"]
+    $user_message = $result["modified_text"]
+
+
+# OUTPUT RAILS
+
+@active
+flow protect response
+  """Check if the response is valid according to Prompt Security."""
+  $result = await protect_text(bot_response=$bot_message)
+  if $result["is_blocked"]
+    bot inform answer unknown
+    stop
+  else if $result["is_modified"]
+    $bot_message = $result["modified_text"]
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# INPUT RAILS
+
+define subflow protect prompt
+  """Check if the prompt is valid according to Prompt Security."""
+  $result = execute protect_text(user_prompt=$user_message)
+  if $result["is_blocked"]
+    bot inform answer unknown
+    stop
+  else if $result["is_modified"]
+    $user_message = $result["modified_text"]
+
+
+# OUTPUT RAILS
+
+define subflow protect response
+  """Check if the response is valid according to Prompt Security."""
+  $result = execute protect_text(bot_response=$bot_message)
+  if $result["is_blocked"]
+    bot inform answer unknown
+    stop
+  else if $result["is_modified"]
+    $bot_message = $result["modified_text"]
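
Both flow files apply the same three-way decision to the result of `protect_text`. As a rough Python analogue of that branching (an illustration only, since the real logic lives in the Colang flows), the hypothetical helper `apply_rail` returns None where the flow would refuse and stop:

```python
from typing import Any, Dict, Optional


def apply_rail(result: Dict[str, Any], text: str) -> Optional[str]:
    """Mirror the flow branches: block, replace with modified text, or pass through."""
    if result["is_blocked"]:
        # Corresponds to `bot inform answer unknown` followed by `stop`.
        return None
    if result["is_modified"]:
        # Corresponds to assigning modified_text to the message variable.
        return result["modified_text"]
    return text


passthrough = {"is_blocked": False, "is_modified": False, "modified_text": None}
print(apply_rail(passthrough, "hello"))  # hello
```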
