The RAG server exposes an OpenAI-compatible API that lets developers customize LLM parameters at runtime. For full details, see APIs for RAG Server.
Use the `/generate` endpoint of the RAG server in a RAG pipeline to generate responses to prompts.
To configure the behavior of the LLM dynamically at runtime, include or change the following parameters in the request body when you call the generate API from the notebook.
Parameter | Description | Type | Valid Values | Default | Optional? |
---|---|---|---|---|---|
`max_tokens` | The maximum number of tokens to generate during inference. This limits the length of the generated text. | Integer | — | 1024 | Yes |
`stop` | A list of strings used as stop tokens during text generation. The returned text does not include the stop tokens. | Array | — | [] | Yes |
`temperature` | Adjusts the randomness of token selection. Higher values increase randomness and creativity; lower values produce more deterministic, conservative output. | Number | 0.1 - 1.0 | 0.2 | Yes |
`top_p` | A threshold for nucleus sampling: tokens are selected from most to least probable until their cumulative probability exceeds `top_p`. | Number | 0.1 - 1.0 | 0.7 | Yes |
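The `temperature` and `top_p` parameters interact during sampling. The following sketch is purely illustrative, using a toy token distribution rather than the RAG server's actual sampler; it shows how a low temperature sharpens the distribution and how `top_p` then limits the candidate set by cumulative probability.

```python
import math

# Toy next-token distribution (illustrative only; not the RAG server's sampler).
probs = {"the": 0.45, "a": 0.25, "an": 0.15, "this": 0.10, "that": 0.05}

def apply_temperature(probs, temperature):
    # Lower temperature sharpens the distribution toward the most likely tokens.
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: v / total for tok, v in scaled.items()}

def nucleus(probs, top_p):
    # Keep the most probable tokens until their cumulative probability reaches top_p,
    # then renormalize; sampling happens only among these remaining candidates.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

candidates = nucleus(apply_temperature(probs, temperature=0.2), top_p=0.7)
print(candidates)  # with a low temperature, only the top token(s) survive the cutoff
```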
For example, the following request body sets these values:
- `max_tokens=150` limits the response length to 150 tokens.
- `stop=["\n"]` stops generation at the first newline character.
- `temperature=0.3` applies moderate randomness.
- `top_p=0.8` considers tokens with cumulative probability up to 0.8.
```json
{
  "messages": [
    {
      "role": "user",
      "content": "Explain the key features of FastAPI."
    }
  ],
  "max_tokens": 150,
  "stop": ["\n"],
  "temperature": 0.3,
  "top_p": 0.8,
  "use_knowledge_base": true
}
```
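If you are not working in the notebook, you can also send the request body directly over HTTP. The sketch below assumes the RAG server is reachable at `http://localhost:8081` and that `/generate` returns a JSON body; adjust the host, port, and response handling to match your deployment.

```python
import requests

# Same request body as the example above.
payload = {
    "messages": [
        {"role": "user", "content": "Explain the key features of FastAPI."}
    ],
    "max_tokens": 150,
    "stop": ["\n"],
    "temperature": 0.3,
    "top_p": 0.8,
    "use_knowledge_base": True,
}

# The host and port are assumptions; point this at your RAG server deployment.
response = requests.post("http://localhost:8081/generate", json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # inspect the generated answer returned by the server
```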