@@ -194,6 +194,14 @@ To use an embedding function, create a collection with a specific schema. This s
 
 The following example defines a schema with one scalar field `"document"` for storing textual data and one vector field `"dense"` for storing embeddings to be generated by the Function module. Remember to set the vector dimension (`dim`) to match the output of your chosen embedding model.
 
+<div class="multipleCode">
+<a href="#python">Python</a>
+<a href="#java">Java</a>
+<a href="#javascript">NodeJS</a>
+<a href="#go">Go</a>
+<a href="#bash">cURL</a>
+</div>
+
 ```python
 from pymilvus import MilvusClient, DataType, Function, FunctionType
 The Function module in Milvus automatically converts raw data stored in a scalar field into embeddings and stores them into the explicitly defined vector field.
 
 The example below adds a Function module (`openai_embedding`) that converts the scalar field `"document"` into embeddings, storing the resulting vectors in the `"dense"` vector field defined earlier.
 
+<div class="multipleCode">
+<a href="#python">Python</a>
+<a href="#java">Java</a>
+<a href="#javascript">NodeJS</a>
+<a href="#go">Go</a>
+<a href="#bash">cURL</a>
+</div>
+
 ```python
 # Define embedding function (example: OpenAI provider)
-<td><p>Type of function used. For text embedding, set the value to <code>FunctionType.TEXTEMBEDDING</code>.<br><strong>Note:</strong> Milvus accepts <code>FunctionType.BM25</code> (for sparse-embedding transformation) and <code>FunctionType.RERANK</code> (for reranking) for this parameter. Refer to <a href="full-text-search.md">Full Text Search</a> and <a href="decay-ranker-overview.md">Decay Ranker Overview</a> for details.</p></td>
+<td><p>Type of function used. For text embedding, set the value to <code>FunctionType.TEXTEMBEDDING</code>.</p><p><strong>Note:</strong> Milvus accepts <code>FunctionType.BM25</code> (for sparse-embedding transformation) and <code>FunctionType.RERANK</code> (for reranking) for this parameter. Refer to <a href="full-text-search.md">Full Text Search</a> and <a href="decay-ranker-overview.md">Decay Ranker Overview</a> for details.</p></td>
-<td><p>The label of a credential defined in the top-level <code>credential:</code> section of <code>milvus.yaml</code>. </p>
-<ul>
-<li><p>When provided, Milvus retrieves the matching key pair or API token and signs the request on the server side.</p></li>
-<li><p>When omitted (<code>None</code>), Milvus falls back to the credential explicitly configured for the target model provider in <code>milvus.yaml</code>.</p></li>
-<li><p>If the label is unknown or the referenced key is missing, the call fails.</p></li>
-</ul></td>
+<td><p>The label of a credential defined in the top-level <code>credential:</code> section of <code>milvus.yaml</code>. </p><ul><li><p>When provided, Milvus retrieves the matching key pair or API token and signs the request on the server side.</p></li><li><p>When omitted (<code>None</code>), Milvus falls back to the credential explicitly configured for the target model provider in <code>milvus.yaml</code>.</p></li><li><p>If the label is unknown or the referenced key is missing, the call fails.</p></li></ul></td>
 <td><p><code>"apikey1"</code></p></td>
 </tr>
 <tr>
 <td><p><code>dim</code></p></td>
-<td><p>The number of dimensions for the output embeddings. For OpenAI's third-generation models, you can shorten the full vector to reduce cost and latency without a significant loss of semantic information. For more information, refer to <a href="https://openai.com/blog/new-embedding-models-and-api-updates">OpenAI announcement blog post</a>.<br>
-<strong>Note:</strong> If you shorten the vector dimension, ensure the <code>dim</code> value specified in the schema's <code>add_field</code> method for the vector field matches the final output dimension of your embedding function.</p></td>
+<td><p>The number of dimensions for the output embeddings. For OpenAI's third-generation models, you can shorten the full vector to reduce cost and latency without a significant loss of semantic information. For more information, refer to <a href="https://openai.com/blog/new-embedding-models-and-api-updates">OpenAI announcement blog post</a>.</p><p><strong>Note:</strong> If you shorten the vector dimension, ensure the <code>dim</code> value specified in the schema's <code>add_field</code> method for the vector field matches the final output dimension of your embedding function.</p></td>
 <td><p><code>"1536"</code></p></td>
 </tr>
 <tr>
@@ -319,6 +361,14 @@ For collections with multiple scalar fields requiring text-to-vector conversion,
 
 After defining the schema with necessary fields and the built-in function, set up the index for your collection. To simplify this process, use `AUTOINDEX` as the `index_type`, an option that allows Milvus to choose and configure the most suitable index type based on the structure of your data.
 
+<div class="multipleCode">
+<a href="#python">Python</a>
+<a href="#java">Java</a>
+<a href="#javascript">NodeJS</a>
+<a href="#go">Go</a>
+<a href="#bash">cURL</a>
+</div>
+
 ```python
 # Prepare index parameters
 index_params = client.prepare_index_params()
@@ -331,10 +381,34 @@ index_params.add_index(
 )
 ```
 
+```java
+// java
+```
+
+```javascript
+// nodejs
+```
+
+```go
+// go
+```
+
+```bash
+# restful
+```
+
 ### Step 4: Create collection
 
 Now create the collection using the schema and index parameters defined.
 
+<div class="multipleCode">
+<a href="#python">Python</a>
+<a href="#java">Java</a>
+<a href="#javascript">NodeJS</a>
+<a href="#go">Go</a>
+<a href="#bash">cURL</a>
+</div>
+
 ```python
 # Create collection named "demo"
 client.create_collection(
@@ -344,10 +418,34 @@ client.create_collection(
 )
 ```
 
+```java
+// java
+```
+
+```javascript
+// nodejs
+```
+
+```go
+// go
+```
+
+```bash
+# restful
+```
+
 ### Step 5: Insert data
 
 After setting up your collection and index, you're ready to insert your raw data. You only need to provide the raw text; the Function module defined earlier automatically generates the corresponding dense vector embedding for each text entry.
 
+<div class="multipleCode">
+<a href="#python">Python</a>
+<a href="#java">Java</a>
+<a href="#javascript">NodeJS</a>
+<a href="#go">Go</a>
+<a href="#bash">cURL</a>
+</div>
+
 ```python
 # Insert sample documents
 client.insert('demo', [
@@ -357,10 +455,34 @@ client.insert('demo', [
 ])
 ```
 
+```java
+// java
+```
+
+```javascript
+// nodejs
+```
+
+```go
+// go
+```
+
+```bash
+# restful
+```
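
As a rough illustration of the insert step, the rows passed to `client.insert` carry only the `"document"` text and no `"dense"` key, because the Function module fills the vector field on the server. The helper name `build_rows`, the choice to skip blank strings, and the sample texts are hypothetical; only the field names come from the schema described earlier.

```python
# Hypothetical helper: wrap raw texts into insert rows for the "demo" collection.
# Only the scalar field "document" is supplied; the "dense" vector field is
# intentionally absent because the embedding function populates it server-side.
def build_rows(texts):
    rows = []
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            continue  # design choice of this sketch: skip blank entries
        rows.append({"document": cleaned})
    return rows


# Made-up sample texts, for illustration only.
sample_texts = [
    "Milvus simplifies semantic search through embeddings.",
    "   ",
    "Vector embeddings convert text into searchable numeric data.",
]

rows = build_rows(sample_texts)
print(rows)
```

The resulting list could then be passed to the insert call shown in this step, for example `client.insert('demo', rows)`.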
+
 ### Step 6: Perform vector search
 
 After data insertion, perform a semantic search using raw query text. Milvus automatically converts your query into an embedding vector, retrieves relevant documents based on similarity, and returns the top-matching results.
For more information about search and query operations, refer to [Basic Vector Search](single-vector-search.md) and [Query](get-and-scalar-query.md).
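
Conceptually, the retrieval step described above ranks stored embeddings by their similarity to the query embedding and keeps the best matches. The sketch below illustrates that idea with brute-force cosine similarity in plain Python. It is not how Milvus executes a search (Milvus relies on the index built in Step 3), cosine similarity is assumed here as the metric, and the function names and toy 3-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    # Assumes both vectors are non-zero and have the same length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the k documents whose vectors are most similar to query_vec.

    stored is a list of (document, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Toy data: real embeddings would have hundreds or thousands of dimensions.
stored = [
    ("doc about databases", [0.9, 0.1, 0.0]),
    ("doc about cooking", [0.0, 0.2, 0.9]),
    ("doc about search engines", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

print(top_k(query, stored, k=2))
```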
 
 ## FAQ
@@ -403,10 +541,18 @@ You can check by:
 
 ### When I perform a similarity search, can I use a query vector rather than raw text?
 
-Yes, you can use pre-computed query vectors instead of raw text for similarity search. While the Function module automatically converts raw text queries to embeddings, you can also directly provide vector data to the data parameter in your search operation. Note: The dimension size of your provided query vector must be consistent with the dimension size of the vector embeddings generated by your Function module.
+Yes, you can use pre-computed query vectors instead of raw text for similarity search. While the Function module automatically converts raw text queries to embeddings, you can also directly provide vector data to the `data` parameter in your search operation. **Note**: The dimension size of your provided query vector must be consistent with the dimension size of the vector embeddings generated by your Function module.
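
One way to guard against a dimension mismatch before searching is a small client-side check. This is a sketch only: `check_query_dim` is a hypothetical helper, and `EXPECTED_DIM = 1536` is an assumption taken from the example `dim` value in the parameter table; substitute the dimension your vector field actually declares.

```python
# Assumption: the vector field was declared with dim=1536.
EXPECTED_DIM = 1536


def check_query_dim(query_vector, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the pre-computed query vector has the wrong length."""
    actual = len(query_vector)
    if actual != expected_dim:
        raise ValueError(
            f"query vector has {actual} dimensions, expected {expected_dim}"
        )
    return query_vector


# A placeholder vector of the right length; a real one would come from
# the same embedding model that the Function module uses.
query_vector = check_query_dim([0.1] * EXPECTED_DIM)
print(len(query_vector))
```

A vector that passes the check can then be supplied through the `data` parameter of the search call, typically wrapped in a list such as `data=[query_vector]`.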
 
 **Example**:
 
+<div class="multipleCode">
+<a href="#python">Python</a>
+<a href="#java">Java</a>
+<a href="#javascript">NodeJS</a>
+<a href="#go">Go</a>
+<a href="#bash">cURL</a>
+</div>
+
 ```python
 # Using raw text (Function module converts automatically)