1. Improved support for NIMs, now in GA! Sign up [here](https://catalog.ngc.nvidia.com/orgs/nim/teams/meta/containers/llama3-8b-instruct/tags) for access.
- Replaced the model name field with a full container image/tag field, making it easier to switch local NIMs: users can copy and paste their NIM container path directly into the chat UI (a parsing sketch follows this section).
- Reworked the local NIM flow, replacing the model-repo-generate step with a NIM sidecar container pull to better align with the new NIM release.
- Fixed an issue where remote NIM support returned a null token for vLLM-backend NIMs.
- Set project-settings defaults to match the quickstart in the NIM documentation (now uses the vLLM backend).
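For illustration only, here is a minimal sketch of how a pasted container path could be split into its parts. The function name, return shape, and the `1.0.0` tag are assumptions, not the project's actual code:

```python
# Hypothetical sketch: splitting a pasted NIM container path into its parts.
def parse_container_path(path: str) -> dict:
    """Split 'registry/org/image:tag' into registry, image, and tag."""
    image, _, tag = path.partition(":")          # separate the optional tag
    registry, _, name = image.partition("/")     # first path segment is the registry
    return {"registry": registry, "image": name, "tag": tag or "latest"}

print(parse_container_path("nvcr.io/nim/meta/llama3-8b-instruct:1.0.0"))
# -> {'registry': 'nvcr.io', 'image': 'nim/meta/llama3-8b-instruct', 'tag': '1.0.0'}
```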
2. Improved Metrics Tracking
- Removed "clear query" button to accommodate for Show Metrics panel functionality.
- Added support for new metrics:
- retrieval time (ms)
- TTFT (ms)
- generation time (ms)
- E2E (ms)
- approx. tokens in response
- approx. tokens generated per second
- approx. inter-token latency (ITL)
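A minimal sketch of how these metrics can be derived from timestamps around a streaming RAG call. `retrieve()` and `stream_tokens()` are stubs standing in for the real retrieval and inference calls, not the project's API; the formulas are the standard definitions:

```python
import time

def retrieve(query):
    time.sleep(0.05)                      # stand-in for a vector-store lookup
    return "retrieved context"

def stream_tokens(query, context):
    for tok in "a short streamed answer".split():
        time.sleep(0.02)                  # stand-in for per-token model latency
        yield tok

t_start = time.perf_counter()
context = retrieve("example question")
t_retrieved = time.perf_counter()

first_token_time, num_tokens = None, 0
for _ in stream_tokens("example question", context):
    if first_token_time is None:
        first_token_time = time.perf_counter()
    num_tokens += 1
t_end = time.perf_counter()

retrieval_ms = (t_retrieved - t_start) * 1000
ttft_ms = (first_token_time - t_retrieved) * 1000       # time to first token
generation_ms = (t_end - t_retrieved) * 1000
e2e_ms = (t_end - t_start) * 1000                       # end-to-end latency
tokens_per_sec = num_tokens / (generation_ms / 1000)
itl_ms = (t_end - first_token_time) * 1000 / max(num_tokens - 1, 1)  # inter-token latency
```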
3. Expanded cloud-supported models (12 → 18)
- Added support for IBM's Granite Code models to better align with NVIDIA's API Catalog
- Granite 8B Code Instruct
- Granite 34B Code Instruct
- Widened support for Microsoft's Phi-3 models to better align with NVIDIA's API Catalog
- Phi-3 Mini (4k)
- Phi-3 Small (8k)
- Phi-3 Small (128k)
- Phi-3 Medium (4k)
- Implemented a temporary workaround for Microsoft's Phi-3 models not supporting penalty parameters (see the sketch after this section).
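A minimal sketch of the kind of workaround described above, assuming the request parameters travel as a plain dict; the function name, the model-ID check, and the key names are illustrative assumptions:

```python
def sanitize_params(model_id: str, params: dict) -> dict:
    """Drop penalty parameters for Phi-3 models, which reject them."""
    if "phi-3" in model_id.lower():
        params = {k: v for k, v in params.items()
                  if k not in ("frequency_penalty", "presence_penalty")}
    return params

print(sanitize_params("microsoft/phi-3-mini-128k-instruct",
                      {"temperature": 0.7, "frequency_penalty": 0.5}))
# -> {'temperature': 0.7}
```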
4. Expanded model selection for locally running RAG
- Added an ungated model for the local Hugging Face TGI backend: microsoft/Phi-3-mini-128k-instruct
- Added an option to filter the local models dropdown by gated vs. ungated models (see the sketch after this section).
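For illustration, a sketch of such a filter over a hypothetical model list; the entries and `gated` flags are stand-in data, not the app's real catalog:

```python
MODELS = [
    {"id": "microsoft/Phi-3-mini-128k-instruct", "gated": False},
    {"id": "meta-llama/Meta-Llama-3-8B-Instruct", "gated": True},
]

def dropdown_choices(show_gated: bool) -> list[str]:
    """Return only the models matching the gated/ungated toggle."""
    return [m["id"] for m in MODELS if m["gated"] == show_gated]

print(dropdown_choices(show_gated=False))   # ungated models only
```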
5. Additional Output Customization
- Added support for new Output Settings parameters:
- top_p
- frequency penalty
- presence penalty
- Increased the maximum number of new tokens to generate from 512 to 2048.
- The max-new-tokens limit is now set dynamically based on automatic system introspection (see the sketch after this section).
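A sketch of what an introspection-based limit could look like; the psutil RAM check and the 32 GB threshold are invented stand-ins for whatever heuristic the app actually uses:

```python
import psutil

def max_new_tokens_limit() -> int:
    """Allow up to 2048 new tokens on larger systems, 512 otherwise."""
    ram_gb = psutil.virtual_memory().total / 2**30   # total system RAM in GiB
    return 2048 if ram_gb >= 32 else 512
```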
6. General Usability
- Reduced UI clutter by making some major UI components collapsible.
- The right-hand inference settings panel can now be collapsed and expanded.
- Output parameter sliders are now hidden by default but can be expanded.
- Improved error messaging and forwarding of issues to the frontend UI.
- Increased timeouts to accommodate a broader range of user setups.
- Ongoing improvements to code documentation.
"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, dangerous, or illegal content. If you don't know the answer to a question, please don't share false information. Please ensure that your responses are positive in nature.\n"
104
+
"The user's question is: {context_str} {query_str} <|end|> \n"
105
+
"<|assistant|>"
106
+
)
107
+
108
+
GENERIC_CHAT_TEMPLATE= (
109
+
"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, dangerous, or illegal content. If you don't know the answer to a question, please don't share false information. Please ensure that your responses are positive in nature.\n"
110
+
"The user's question is: {context_str} {query_str} <|end|> \n"
111
+
)
112
+
101
113
MISTRAL_RAG_TEMPLATE= (
102
114
"<s>[INST] <<SYS>>"
103
115
"Use the following context to answer the user's question. If you don't know the answer,"
@@ -132,6 +144,22 @@
132
144
"Assistant: "
133
145
)
134
146
147
+
MICROSOFT_RAG_TEMPLATE= (
148
+
"<|user|>\n"
149
+
"Use the following context to answer the question. If you don't know the answer,"
150
+
"just say that you don't know, don't try to make up an answer.\n"
151
+
"Context: {context_str} Question: {query_str} Only return the helpful"
152
+
" answer below and nothing else. <|end|> \n"
153
+
"<|assistant|>"
154
+
)
155
+
156
+
GENERIC_RAG_TEMPLATE= (
157
+
"Use the following context to answer the question. If you don't know the answer,"
158
+
"just say that you don't know, don't try to make up an answer.\n"
159
+
"Context: {context_str} Question: {query_str} Only return the helpful"
A separate fragment from the same diff sets `num_nodes` based on the inference mode and model; with its spacing restored it reads:

```python
num_nodes = 1 if ((inference_mode == "cloud" and nvcf_model_id == "playground_llama2_13b")
                  or (inference_mode == "cloud" and nvcf_model_id == "playground_llama2_70b")) else 2
```