Commit 591c877

Restructured, improved docs on LLMs in Vespa (#3746)
Added local LLMs to sidebar. Improved docs on external LLMs. Added section on structured output and custom language model to LLMs in Vespa.
1 parent f7a59e2 commit 591c877

3 files changed: +257 -96 lines changed

_data/sidebar.yml

+8 -2
```
@@ -228,8 +228,14 @@ docs:
       url: /en/reference/developing-server-providers.html
     - page: Server Tutorial
       url: /en/jdisc/server-tutorial.html
-    - page: LLMs in Vespa
-      url: /en/llms-in-vespa.html
+
+    - title: LLMs in Vespa
+      url: /en/llms-in-vespa.html
+      documents:
+        - page: Local LLMs in Vespa
+          url: /en/llms-local.html
+        - page: External LLMs in Vespa
+          url: /en/llms-external.html
     - page: RAG in Vespa
       url: /en/llms-rag.html
```

en/llms-external.md

+97
@@ -0,0 +1,97 @@
---
# Copyright Vespa.ai. All rights reserved.
title: "External LLMs in Vespa"
---

Please refer to [Large Language Models in Vespa](llms-in-vespa.html) for an
introduction to using LLMs in Vespa.

Vespa provides a client for integration with OpenAI-compatible APIs.
This includes, but is not limited to,
[OpenAI](https://platform.openai.com/docs/overview),
[Google Gemini](https://ai.google.dev/),
[Anthropic](https://www.anthropic.com/api),
[Cohere](https://docs.cohere.com/docs/compatibility-api)
and [Together.ai](https://docs.together.ai/docs/openai-api-compatibility).
You can also host your own OpenAI-compatible server using, for example,
[vLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#quickstart-online) or
[llama-cpp-server](https://llama-cpp-python.readthedocs.io/en/latest/server/).

{% include note.html content='Note that this is currently a Beta feature, so changes can be expected.' %}

### Configuring the OpenAI client

To set up a connection to an LLM service such as OpenAI's ChatGPT, you need to
define a component in your application's
[services.xml](reference/services.html):

```
<services version="1.0">
  <container id="default" version="1.0">

    ...

    <component id="openai" class="ai.vespa.llm.clients.OpenAI">

      <!-- Optional configuration: -->
      <config name="ai.vespa.llm.clients.llm-client">
        <apiKeySecretName> ... </apiKeySecretName>
        <endpoint> ... </endpoint>
      </config>

    </component>

    ...

  </container>
</services>
```

To see the full list of available configuration parameters, refer to the [llm-client config definition file](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/resources/configdefinitions/llm-client.def).

This sets up a client component that can be used in a
[searcher](glossary.html#searcher) or a [document processor](glossary.html#document-processor).
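
To illustrate, here is a minimal sketch of a searcher that uses the injected client
through Vespa's `LanguageModel` interface. The class, package and property names are
illustrative assumptions, and the exact API should be verified against the Vespa Javadoc:

```
package ai.vespa.example;

import ai.vespa.llm.InferenceParameters;
import ai.vespa.llm.LanguageModel;
import ai.vespa.llm.completion.Completion;
import ai.vespa.llm.completion.StringPrompt;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;

import java.util.List;

// Illustrative sketch: the OpenAI component from services.xml is injected
// into this searcher as a LanguageModel.
public class LlmAnswerSearcher extends Searcher {

    private final LanguageModel languageModel;

    public LlmAnswerSearcher(LanguageModel languageModel) {
        this.languageModel = languageModel;
    }

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);

        // "question" is a hypothetical query property used for this example
        String question = query.properties().getString("question");
        if (question == null || question.isEmpty()) return result;

        // Inference parameters (e.g. maxTokens) are read from query properties
        List<Completion> completions = languageModel.complete(
                StringPrompt.from("Answer briefly: " + question),
                new InferenceParameters(s -> query.properties().getString(s)));

        if (!completions.isEmpty()) {
            Hit answer = new Hit("llm-answer");
            answer.setField("answer", completions.get(0).text());
            result.hits().add(answer);
        }
        return result;
    }
}
```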

### API key configuration

Vespa provides several options to configure the API key used by the client.

1. Using the [Vespa Cloud secret store](https://cloud.vespa.ai/en/security/secret-store.html) to store the API key. Set the `apiKeySecretName` configuration parameter to the name of the secret in the secret store. This is the recommended approach for Vespa Cloud users.
2. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query, as shown in the sketch after this list.
3. Configuring the API key in a custom component. For example, [this](https://github.com/vespa-engine/system-test/tree/master/tests/docproc/generate_field_openai) system test shows how to retrieve the API key from a local file deployed with your Vespa application. Note that this is NOT recommended for production use, as it is less secure than the secret store, but it can be adapted to suit your needs.
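
For option 2, the key is passed on the regular Vespa query request. Here is a minimal
sketch using Java's built-in HTTP client, where the endpoint URL and query parameters
are placeholders for your own application:

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: supplying the LLM API key per request via the X-LLM-API-KEY header.
public class QueryWithApiKey {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/search/?query=what+is+vespa"))
                .header("X-LLM-API-KEY", System.getenv("OPENAI_API_KEY"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```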

You can set up multiple connections with different settings. For instance, you
might want to run different LLMs for different tasks. To distinguish between the
connections, modify the `id` attribute in the component specification, as in the
sketch after this paragraph. We will see below how this is used to control which
LLM is used for which task.
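
As a sketch, two connections against different providers could look like this inside
the container element; the `id` values, secret names and the Together.ai endpoint are
illustrative and should be checked against your own setup:

```
<component id="openai" class="ai.vespa.llm.clients.OpenAI">
  <config name="ai.vespa.llm.clients.llm-client">
    <apiKeySecretName>openai-api-key</apiKeySecretName>
  </config>
</component>

<component id="together" class="ai.vespa.llm.clients.OpenAI">
  <config name="ai.vespa.llm.clients.llm-client">
    <apiKeySecretName>together-api-key</apiKeySecretName>
    <endpoint>https://api.together.xyz/v1/</endpoint>
  </config>
</component>
```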

As a reminder, Vespa also has the option of running custom LLMs locally. Please refer to
[running LLMs in your application](llms-local.html) for more information.

### Inference parameters

Please refer to the general discussion in [LLM parameters](llms-in-vespa.html#llm-parameters) for setting inference
parameters.

The OpenAI client also supports the following inference parameters, which can be sent
along with the query, as in the example after this list:

- model
- maxTokens
- temperature
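
Assuming the default `llm` property prefix described in
[LLM parameters](llms-in-vespa.html#llm-parameters), such a query could look like the
following; the prefix, model name and values are illustrative and depend on your
searcher configuration:

```
http://localhost:8080/search/?query=what+is+vespa&llm.model=gpt-4o-mini&llm.maxTokens=256&llm.temperature=0.2
```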

### Connecting to other OpenAI-compatible providers

By default, this client connects to the OpenAI service, but it can be used against any
<a href="https://platform.openai.com/docs/guides/text-generation/chat-completions-api" data-proofer-ignore>OpenAI chat completion compatible API</a>
by changing the `endpoint` configuration parameter.
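
For instance, here is a sketch pointing the client at a self-hosted vLLM server, which
serves its OpenAI-compatible API on port 8000 by default (the component `id` and
endpoint value are illustrative):

```
<component id="local-openai-compatible" class="ai.vespa.llm.clients.OpenAI">
  <config name="ai.vespa.llm.clients.llm-client">
    <!-- Self-hosted OpenAI-compatible server -->
    <endpoint>http://localhost:8000/v1/</endpoint>
  </config>
</component>
```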

### FAQ

- **Q: How do I know if my LLM is compatible with the OpenAI client?**
  - A: The OpenAI client is compatible with any LLM that implements the OpenAI chat completion API. Check the documentation of your LLM provider to see if they support this API.
- **Q: Can I use the [Responses API](https://platform.openai.com/docs/api-reference/responses/create) provided by OpenAI?**
  - A: No, currently only the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) is supported.
- **Q: Can I use the OpenAI client for reranking?**
  - A: Yes, but currently you need to implement a [custom searcher](/en/searcher-development.html) that uses the OpenAI client to rerank the results.
- **Q: Can I use the OpenAI client for retrieving embeddings?**
  - A: No, currently only the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) is supported.
