diff --git a/articles/how_to_guardrail_in_realtime_api.md b/articles/how_to_guardrail_in_realtime_api.md
new file mode 100644
index 0000000000..f26abc5dc8
--- /dev/null
+++ b/articles/how_to_guardrail_in_realtime_api.md
@@ -0,0 +1,48 @@
+# Recommended architecture for guardrails in Realtime API
+
+## Overview of Approach:
+
+One of the core challenges in Speech-to-Speech (S2S) implementations is that audio is inferred and streamed in real time. This limits the applicability of traditional guardrail techniques, where input guardrails can typically be evaluated in parallel and output guardrails can be applied before a response is delivered. In S2S systems, inference runs directly on the audio input, and the model begins streaming its response immediately afterward. This document describes an approach for integrating guardrails into real-time S2S applications.
+
+We recommend thinking about guardrails in three categories: Prompt, Output, and Input. Each plays a distinct role in keeping agent behavior aligned with policy and user expectations.
+
+![Overview of Guardrails](../images/Realtime_highlevel_guardrails.png)
+
+### Prompt Guardrails
+**Definition:** Instructions (also known as the system prompt or developer prompt) that constrain and guide the agent’s behavior. Ideally, these instructions are written in the affirmative, explicitly stating what the agent should do for the user.
+
+**Example:**
+>You are a friendly weather assistant. You only provide the user with temperature and visibility for a specified zip code. This is your only function.
+
+**Application:**
+Prompt guardrails should be carefully engineered to define and restrict the agent’s scope. In most cases, they serve as the primary enforcement mechanism for desired agent behavior.
+
+**Frequency:**
+These are applied on every turn, as part of the model’s system or developer prompt.
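As an illustration of the prompt guardrail above, the instructions can be sent to the session before any audio is exchanged. The following is a minimal sketch, assuming the client talks to the Realtime API over a WebSocket and configures the session with a `session.update` event; the helper name `build_session_update` and the constant `WEATHER_INSTRUCTIONS` are hypothetical and not part of the article.

```python
import json

# The example prompt guardrail from the article, stated in the affirmative.
WEATHER_INSTRUCTIONS = (
    "You are a friendly weather assistant. "
    "You only provide the user with temperature and visibility "
    "for a specified zip code. This is your only function."
)


def build_session_update(instructions: str) -> str:
    """Serialize a `session.update` client event that carries the instructions.

    Hypothetical helper: it only builds the JSON text; sending it over the
    WebSocket is left to whichever client library the application uses.
    """
    event = {
        "type": "session.update",
        "session": {"instructions": instructions},
    }
    return json.dumps(event)


payload = build_session_update(WEATHER_INSTRUCTIONS)
```

The resulting `payload` string would be sent as a single text frame on the open connection. Keeping the instructions in one constant makes it easier to review the guardrail wording separately from the transport code.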
+
+### Output Guardrails
+**Definition:** Post-response checks that evaluate whether the model's output adheres to policy and avoids hallucination. These checks are implemented as separate model calls that use the transcript of the agent’s spoken response.
+
+**Application:**
+Because the S2S model emits audio in real time, the transcript must be streamed and buffered in small batches. Transcript segments (e.g., `response.audio_transcript.delta` events) can be collected in groups of 5–10 and submitted to another model (e.g., `gpt-4o` or `gpt-4o-mini`) for evaluation.
+
+**Frequency:**
+Guardrails should be applied continuously during the response, evaluating the output in near real time as it is streamed back to the client.
+
+### Input Guardrails (aka Jailbreak Guardrails)
+**Definition:** Checks focused on user intent and behavior over multiple conversational turns. These guardrails aim to detect whether a user is attempting to exploit or jailbreak the agent.
+
+**Application:**
+These guardrails aggregate user input transcripts across 3–4 turns (e.g., from `conversation.item.input_audio_transcription.completed` events) to assess whether the user's behavior indicates abuse. This helps distinguish accidental provocations from malicious attempts. When abuse is detected, escalation or termination may be appropriate.
+
+**Frequency:**
+Evaluate every 3–4 turns to observe broader behavioral trends. Submit the accumulated user input transcripts to a model (e.g., `gpt-4o` or `gpt-4o-mini`) for review.
+
+## Implementation details
+For those engineering realtime guardrails, we have also provided ladder diagrams that show the events to look for.
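The batching described for output and input guardrails can be sketched in a few lines. This is a minimal illustration rather than the article's implementation: the `TranscriptBatcher` class, the `handle_event` function, and the `check` callback are hypothetical names, and `check` stands in for the separate model call (e.g., to `gpt-4o-mini`) that returns `True` when the text passes. The sketch assumes that delta events carry their text in a `delta` field, that completed input transcriptions carry it in a `transcript` field, and that a `response.audio_transcript.done` event marks the end of the spoken response.

```python
from typing import Callable, List, Optional


class TranscriptBatcher:
    """Buffers transcript pieces and evaluates them in fixed-size batches."""

    def __init__(self, batch_size: int, check: Callable[[str], bool],
                 separator: str = "") -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.check = check          # hypothetical guardrail call; True means "passes"
        self.separator = separator  # "" for streamed deltas, "\n" for whole turns
        self._pending: List[str] = []

    def add(self, piece: str) -> Optional[bool]:
        """Buffer one piece; return a verdict only when a full batch was checked."""
        self._pending.append(piece)
        if len(self._pending) < self.batch_size:
            return None
        return self.flush()

    def flush(self) -> Optional[bool]:
        """Check whatever is buffered, e.g. when the response ends early."""
        if not self._pending:
            return None
        text = self.separator.join(self._pending)
        self._pending.clear()
        return self.check(text)


def handle_event(event: dict, output_batcher: TranscriptBatcher,
                 input_batcher: TranscriptBatcher) -> Optional[bool]:
    """Route one server event to the matching batcher (field names are assumptions)."""
    event_type = event.get("type")
    if event_type == "response.audio_transcript.delta":
        return output_batcher.add(event["delta"])
    if event_type == "response.audio_transcript.done":
        # Evaluate the final partial batch so the tail of the response is covered.
        return output_batcher.flush()
    if event_type == "conversation.item.input_audio_transcription.completed":
        return input_batcher.add(event["transcript"])
    return None
```

With these pieces, an output batcher might be created as `TranscriptBatcher(batch_size=5, check=my_check)` and an input batcher as `TranscriptBatcher(batch_size=3, check=my_check, separator="\n")`, where `my_check` is the application's own guardrail call. A `False` verdict is the point at which the application would stop playback, escalate, or end the session.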
+
+### Output Guardrails:
+![Output Guardrails Ladder Diagram](../images/Realtime_output_guardrails.png)
+
+### Input Guardrails:
+![Input Guardrails Ladder Diagram](../images/Realtime_input_guardrails_post_response.png)
\ No newline at end of file
diff --git a/authors.yaml b/authors.yaml
index 0978111801..5422498c0c 100644
--- a/authors.yaml
+++ b/authors.yaml
@@ -267,3 +267,8 @@ erikakettleson-openai:
   name: "Erika Kettleson"
   website: "https://www.linkedin.com/in/erika-kettleson-85763196/"
   avatar: "https://avatars.githubusercontent.com/u/186107044?v=4"
+
+phundal-openai:
+  name: "Patrick Hundal"
+  website: "https://www.linkedin.com/in/phundal/"
+  avatar: "https://avatars.githubusercontent.com/u/189161955?v=4"
diff --git a/images/Realtime_highlevel_guardrails.png b/images/Realtime_highlevel_guardrails.png
new file mode 100644
index 0000000000..34906680af
Binary files /dev/null and b/images/Realtime_highlevel_guardrails.png differ
diff --git a/images/Realtime_input_guardrails_post_response.png b/images/Realtime_input_guardrails_post_response.png
new file mode 100644
index 0000000000..cda2c9631e
Binary files /dev/null and b/images/Realtime_input_guardrails_post_response.png differ
diff --git a/images/Realtime_output_guardrails.png b/images/Realtime_output_guardrails.png
new file mode 100644
index 0000000000..2d46855097
Binary files /dev/null and b/images/Realtime_output_guardrails.png differ
diff --git a/registry.yaml b/registry.yaml
index d33fa06818..48f5abdd9d 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -1868,3 +1868,15 @@
     - audio
     - speech
 
+- title: Recommended approach to guardrails in Realtime API
+  path: articles/how_to_guardrail_in_realtime_api.md
+  date: 2025-04-02
+  authors:
+    - phundal-openai
+  tags:
+    - guardrails
+    - realtime-api
+    - speech-to-speech
+    - s2s
+    - audio
+    - speech