Commit 7faa7e1

Milvus-doc-bot authored and committed
Release new docs to master
1 parent 4f56440 commit 7faa7e1

14 files changed: +1377 -171 lines changed

-859 Bytes

v2.6.x/assets/exp-decay.png

-27.3 KB

v2.6.x/assets/gaussian-decay.png

22.1 KB

v2.6.x/site/en/userGuide/embeddings-reranking/embedding-function/embedding-function-overview.md

Lines changed: 173 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -194,6 +194,14 @@ To use an embedding function, create a collection with a specific schema. This s
194194

195195
The following example defines a schema with one scalar field `"document"` for storing textual data and one vector field `"dense"` for storing embeddings to be generated by the Function module. Remember to set the vector dimension (`dim`) to match the output of your chosen embedding model.
196196

197+
<div class="multipleCode">
198+
<a href="#python">Python</a>
199+
<a href="#java">Java</a>
200+
<a href="#javascript">NodeJS</a>
201+
<a href="#go">Go</a>
202+
<a href="#bash">cURL</a>
203+
</div>
204+
197205
```python
198206
from pymilvus import MilvusClient, DataType, Function, FunctionType
199207
@@ -212,18 +220,42 @@ schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)
212220
schema.add_field("document", DataType.VARCHAR, max_length=9000)
213221
214222
# Add vector field "dense" for storing embeddings.
215-
# IMPORTANT: Set `dim` to match the exact output dimension of the embedding model.
223+
# IMPORTANT: Set dim to match the exact output dimension of the embedding model.
216224
# For instance, OpenAI's text-embedding-3-small model outputs 1536-dimensional vectors.
217225
# For a dense vector field, the data type can be FLOAT_VECTOR or INT8_VECTOR.
218226
schema.add_field("dense", DataType.FLOAT_VECTOR, dim=1536)
219227
```
220228

229+
```java
230+
// java
231+
```
232+
233+
```javascript
234+
// nodejs
235+
```
236+
237+
```go
238+
// go
239+
```
240+
241+
```bash
242+
# restful
243+
```
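
A mismatch between the vector field's `dim` and the model's output dimension is easy to introduce when switching models. The following is a minimal sketch of a pre-flight check, not part of the Milvus API: `KNOWN_OUTPUT_DIMS` and `check_field_dim` are hypothetical names, and the only dimension taken from this guide is 1536 for `text-embedding-3-small`.

```python
from typing import Optional

# Hypothetical lookup table. Only the text-embedding-3-small entry comes
# from this guide; extend it from your provider's documentation.
KNOWN_OUTPUT_DIMS = {"text-embedding-3-small": 1536}


def check_field_dim(model_name: str, field_dim: int,
                    requested_dim: Optional[int] = None) -> int:
    """Return the expected embedding dimension, or raise if the field differs.

    requested_dim models a shortened output dimension; when it is None,
    the model's full output dimension is expected.
    """
    full_dim = KNOWN_OUTPUT_DIMS[model_name]
    expected = full_dim if requested_dim is None else requested_dim
    if not 0 < expected <= full_dim:
        raise ValueError(f"requested dim {expected} is outside 1..{full_dim}")
    if field_dim != expected:
        raise ValueError(
            f"vector field has dim={field_dim}, but the function outputs {expected}"
        )
    return expected


# The schema in this step uses dim=1536 with the full-size model output.
print(check_field_dim("text-embedding-3-small", field_dim=1536))
```

Running a check like this before creating the collection surfaces the mismatch locally rather than at insert or search time.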
244+
221245
### Step 2: Add embedding function to schema
222246

223247
The Function module in Milvus automatically converts raw data stored in a scalar field into embeddings and stores them into the explicitly defined vector field.
224248

225249
The example below adds a Function module (`openai_embedding`) that converts the scalar field `"document"` into embeddings, storing the resulting vectors in the `"dense"` vector field defined earlier.
226250

251+
<div class="multipleCode">
252+
<a href="#python">Python</a>
253+
<a href="#java">Java</a>
254+
<a href="#javascript">NodeJS</a>
255+
<a href="#go">Go</a>
256+
<a href="#bash">cURL</a>
257+
</div>
258+
227259
```python
228260
# Define embedding function (example: OpenAI provider)
229261
text_embedding_function = Function(
@@ -245,6 +277,22 @@ text_embedding_function = Function(
245277
schema.add_function(text_embedding_function)
246278
```
247279

280+
```java
281+
// java
282+
```
283+
284+
```javascript
285+
// nodejs
286+
```
287+
288+
```go
289+
// go
290+
```
291+
292+
```bash
293+
# restful
294+
```
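
The lookup rules for the `credential` parameter, listed in the parameter table, can be pictured with a small stand-alone sketch. This is an illustration of the described behavior only, not Milvus server code: `resolve_credential`, the dictionary layout, and the `apikey` key name are assumptions made for the example.

```python
from typing import Dict, Optional


def resolve_credential(label: Optional[str],
                       credentials: Dict[str, Dict[str, str]],
                       provider_default: Optional[str]) -> Dict[str, str]:
    """Pick the credential entry a request would be signed with.

    credentials stands in for the top-level `credential:` section of
    milvus.yaml; provider_default stands in for the label configured
    for the target model provider.
    """
    # When the label is omitted, fall back to the provider's configured label.
    effective = label if label is not None else provider_default
    if effective is None or effective not in credentials:
        raise KeyError(f"unknown credential label: {effective!r}")
    entry = credentials[effective]
    # "apikey" is an assumed key name for this sketch.
    if not entry.get("apikey"):
        raise KeyError(f"credential {effective!r} has no key configured")
    return entry


config = {"apikey1": {"apikey": "sk-example"}}
print(resolve_credential("apikey1", config, provider_default=None))
```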
295+
248296
<table>
249297
<tr>
250298
<th><p>Parameter</p></th>
@@ -258,7 +306,7 @@ schema.add_function(text_embedding_function)
258306
</tr>
259307
<tr>
260308
<td><p><code>function_type</code></p></td>
261-
<td><p>Type of function used. For text embedding, set the value to <code>FunctionType.TEXTEMBEDDING</code>.<br><strong>Note:</strong> Milvus accepts <code>FunctionType.BM25</code> (for sparse-embedding transformation) and <code>FunctionType.RERANK</code> (for reranking) for this parameter. Refer to <a href="full-text-search.md">Full Text Search</a> and <a href="decay-ranker-overview.md">Decay Ranker Overview</a> for details.</p></td>
309+
<td><p>Type of function used. For text embedding, set the value to <code>FunctionType.TEXTEMBEDDING</code>.</p><p><strong>Note</strong>: Milvus accepts <code>FunctionType.BM25</code> (for sparse-embedding transformation) and <code>FunctionType.RERANK</code> (for reranking) for this parameter. Refer to <a href="full-text-search.md">Full Text Search</a> and <a href="decay-ranker-overview.md">Decay Ranker Overview</a> for details.</p></td>
262310
<td><p><code>FunctionType.TEXTEMBEDDING</code></p></td>
263311
</tr>
264312
<tr>
@@ -288,18 +336,12 @@ schema.add_function(text_embedding_function)
288336
</tr>
289337
<tr>
290338
<td><p><code>credential</code></p></td>
291-
<td><p>The label of a credential defined in the top-level <code>credential:</code> section of <code>milvus.yaml</code>. </p>
292-
<ul>
293-
<li><p>When provided, Milvus retrieves the matching key pair or API token and signs the request on the server side.</p></li>
294-
<li><p>When omitted (<code>None</code>), Milvus falls back to the credential explicitly configured for the target model provider in <code>milvus.yaml</code>.</p></li>
295-
<li><p>If the label is unknown or the referenced key is missing, the call fails.</p></li>
296-
</ul></td>
339+
<td><p>The label of a credential defined in the top-level <code>credential:</code> section of <code>milvus.yaml</code>. </p><ul><li><p>When provided, Milvus retrieves the matching key pair or API token and signs the request on the server side.</p></li><li><p>When omitted (<code>None</code>), Milvus falls back to the credential explicitly configured for the target model provider in <code>milvus.yaml</code>.</p></li><li><p>If the label is unknown or the referenced key is missing, the call fails.</p></li></ul></td>
297340
<td><p><code>"apikey1"</code></p></td>
298341
</tr>
299342
<tr>
300343
<td><p><code>dim</code></p></td>
301-
<td><p>The number of dimensions for the output embeddings. For OpenAI's third-generation models, you can shorten the full vector to reduce cost and latency without a significant loss of semantic information. For more information, refer to <a href="https://openai.com/blog/new-embedding-models-and-api-updates">OpenAI announcement blog post</a>.<br>
302-
<strong>Note:</strong> If you shorten the vector dimension, ensure the <code>dim</code> value specified in the schema's <code>add_field</code> method for the vector field matches the final output dimension of your embedding function.</p></td>
344+
<td><p>The number of dimensions for the output embeddings. For OpenAI's third-generation models, you can shorten the full vector to reduce cost and latency without a significant loss of semantic information. For more information, refer to <a href="https://openai.com/blog/new-embedding-models-and-api-updates">OpenAI announcement blog post</a>.</p><p><strong>Note:</strong> If you shorten the vector dimension, ensure the <code>dim</code> value specified in the schema's <code>add_field</code> method for the vector field matches the final output dimension of your embedding function.</p></td>
303345
<td><p><code>"1536"</code></p></td>
304346
</tr>
305347
<tr>
@@ -319,6 +361,14 @@ For collections with multiple scalar fields requiring text-to-vector conversion,
319361

320362
After defining the schema with necessary fields and the built-in function, set up the index for your collection. To simplify this process, use `AUTOINDEX` as the `index_type`, an option that allows Milvus to choose and configure the most suitable index type based on the structure of your data.
321363

364+
<div class="multipleCode">
365+
<a href="#python">Python</a>
366+
<a href="#java">Java</a>
367+
<a href="#javascript">NodeJS</a>
368+
<a href="#go">Go</a>
369+
<a href="#bash">cURL</a>
370+
</div>
371+
322372
```python
323373
# Prepare index parameters
324374
index_params = client.prepare_index_params()
@@ -331,10 +381,34 @@ index_params.add_index(
331381
)
332382
```
333383

384+
```java
385+
// java
386+
```
387+
388+
```javascript
389+
// nodejs
390+
```
391+
392+
```go
393+
// go
394+
```
395+
396+
```bash
397+
# restful
398+
```
399+
334400
### Step 4: Create collection
335401

336402
Now create the collection using the schema and index parameters defined.
337403

404+
<div class="multipleCode">
405+
<a href="#python">Python</a>
406+
<a href="#java">Java</a>
407+
<a href="#javascript">NodeJS</a>
408+
<a href="#go">Go</a>
409+
<a href="#bash">cURL</a>
410+
</div>
411+
338412
```python
339413
# Create collection named "demo"
340414
client.create_collection(
@@ -344,10 +418,34 @@ client.create_collection(
344418
)
345419
```
346420

421+
```java
422+
// java
423+
```
424+
425+
```javascript
426+
// nodejs
427+
```
428+
429+
```go
430+
// go
431+
```
432+
433+
```bash
434+
# restful
435+
```
436+
347437
### Step 5: Insert data
348438

349439
After setting up your collection and index, you're ready to insert your raw data. In this process, you only need to provide the raw text. The Function module defined earlier automatically generates the corresponding dense vector (embedding) for each text entry and stores it in the `"dense"` field.
350440

441+
<div class="multipleCode">
442+
<a href="#python">Python</a>
443+
<a href="#java">Java</a>
444+
<a href="#javascript">NodeJS</a>
445+
<a href="#go">Go</a>
446+
<a href="#bash">cURL</a>
447+
</div>
448+
351449
```python
352450
# Insert sample documents
353451
client.insert('demo', [
@@ -357,10 +455,34 @@ client.insert('demo', [
357455
])
358456
```
359457

458+
```java
459+
// java
460+
```
461+
462+
```javascript
463+
// nodejs
464+
```
465+
466+
```go
467+
// go
468+
```
469+
470+
```bash
471+
# restful
472+
```
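
Because the embedding is generated on the server side, each row carries only the primary key and the raw text. The sketch below prepares such rows in plain Python; `build_rows` and `MAX_DOCUMENT_BYTES` are hypothetical names, and the assumption that `max_length` is measured in UTF-8 bytes should be checked against the Milvus documentation for your version.

```python
from typing import Dict, List, Union

# Mirrors max_length=9000 on the "document" field defined in Step 1.
MAX_DOCUMENT_BYTES = 9000


def build_rows(texts: List[str],
               first_id: int = 1) -> List[Dict[str, Union[int, str]]]:
    """Build insert rows with explicit ids (the schema uses auto_id=False).

    No "dense" key is included: that field is filled by the embedding function.
    """
    rows = []
    for offset, text in enumerate(texts):
        # Assumption: the VARCHAR limit is counted in UTF-8 bytes.
        if len(text.encode("utf-8")) > MAX_DOCUMENT_BYTES:
            raise ValueError(f"document {first_id + offset} exceeds the field limit")
        rows.append({"id": first_id + offset, "document": text})
    return rows


rows = build_rows(["Milvus simplifies semantic search through embeddings."])
print(rows)
```

The resulting list can be passed as the second argument to `client.insert('demo', rows)`.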
473+
360474
### Step 6: Perform vector search
361475

362476
After data insertion, perform a semantic search using raw query text. Milvus automatically converts your query into an embedding vector, retrieves relevant documents based on similarity, and returns the top-matching results.
363477

478+
<div class="multipleCode">
479+
<a href="#python">Python</a>
480+
<a href="#java">Java</a>
481+
<a href="#javascript">NodeJS</a>
482+
<a href="#go">Go</a>
483+
<a href="#bash">cURL</a>
484+
</div>
485+
364486
```python
365487
# Perform semantic search
366488
results = client.search(
@@ -377,6 +499,22 @@ print(results)
377499
# data: ["[{'id': 1, 'distance': 0.8821347951889038, 'entity': {'document': 'Milvus simplifies semantic search through embeddings.'}}]"]
378500
```
379501

502+
```java
503+
// java
504+
```
505+
506+
```javascript
507+
// nodejs
508+
```
509+
510+
```go
511+
// go
512+
```
513+
514+
```bash
515+
# restful
516+
```
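
The printed output suggests that the search result behaves like a list with one entry per query, each holding a list of hits with `id`, `distance`, and `entity` keys. Under that assumption, the sketch below pulls out the first hit per query from a hand-written sample; `top_documents` and `sample_results` are illustrative names, and the sample stands in for a real `client.search` return value.

```python
# Sample shaped like the printed output: one list of hits per query.
sample_results = [
    [
        {
            "id": 1,
            "distance": 0.8821347951889038,
            "entity": {
                "document": "Milvus simplifies semantic search through embeddings."
            },
        }
    ]
]


def top_documents(results):
    """Return (id, distance, document) for the first hit of each query.

    A query with no hits yields None in the corresponding position.
    """
    first_hits = []
    for hits in results:
        if not hits:
            first_hits.append(None)
            continue
        hit = hits[0]
        first_hits.append((hit["id"], hit["distance"], hit["entity"]["document"]))
    return first_hits


print(top_documents(sample_results))
```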
517+
380518
For more information about search and query operations, refer to [Basic Vector Search](single-vector-search.md) and [Query](get-and-scalar-query.md).
381519

382520
## FAQ
@@ -403,10 +541,18 @@ You can check by:
403541

404542
### When I perform a similarity search, can I use a query vector rather than raw text?
405543

406-
Yes, you can use pre-computed query vectors instead of raw text for similarity search. While the Function module automatically converts raw text queries to embeddings, you can also directly provide vector data to the data parameter in your search operation. Note: The dimension size of your provided query vector must be consistent with the dimension size of the vector embeddings generated by your Function module.
544+
Yes, you can use pre-computed query vectors instead of raw text for similarity search. While the Function module automatically converts raw text queries to embeddings, you can also directly provide vector data to the `data` parameter in your search operation. **Note**: The dimension size of your provided query vector must be consistent with the dimension size of the vector embeddings generated by your Function module.
407545

408546
**Example**:
409547

548+
<div class="multipleCode">
549+
<a href="#python">Python</a>
550+
<a href="#java">Java</a>
551+
<a href="#javascript">NodeJS</a>
552+
<a href="#go">Go</a>
553+
<a href="#bash">cURL</a>
554+
</div>
555+
410556
```python
411557
# Using raw text (Function module converts automatically)
412558
results = client.search(
@@ -424,4 +570,20 @@ results = client.search(
424570
anns_field='dense',
425571
limit=1
426572
)
573+
```
574+
575+
```java
576+
// java
577+
```
578+
579+
```javascript
580+
// nodejs
581+
```
582+
583+
```go
584+
// go
585+
```
586+
587+
```bash
588+
# restful
427589
```
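
Since a pre-computed query vector must have the same dimension as the embeddings produced by the Function module, it can help to validate the vector before searching. The following is a minimal client-side sketch; `validate_query_vector` is a hypothetical helper, not a PyMilvus function, and 1536 is the dimension used throughout this guide.

```python
import math
from typing import List, Sequence


def validate_query_vector(vector: Sequence[float], expected_dim: int) -> List[float]:
    """Return the vector as a list of floats, or raise if it is unusable."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, expected {expected_dim}"
        )
    values = [float(v) for v in vector]
    # NaN or infinite components cannot produce a meaningful distance.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("query vector contains NaN or infinite values")
    return values


query = validate_query_vector([0.1] * 1536, expected_dim=1536)
print(len(query))
```

The validated list can then be passed as an element of the `data` parameter, as in the second search call of the example above.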
