Commit 41c0cab

Merge branch 'main' into stefnestor-patch-2
2 parents 7248099 + 9d45193 commit 41c0cab

5 files changed

+75
-2
lines changed

explore-analyze/machine-learning/nlp/ml-nlp-built-in-models.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ products:
1313
There are {{nlp}} models that are available for use in every cluster out-of-the-box. These models are pre-trained, which means they don’t require fine-tuning on your own data, making them adaptable for various use cases. The following models are available:
1414

1515
* [ELSER](ml-nlp-elser.md) trained by Elastic
16+
* [Jina models](ml-nlp-jina.md)
1617
* [](ml-nlp-rerank.md)
1718
* [E5](ml-nlp-e5.md)
1819
* [{{lang-ident-cap}}](ml-nlp-lang-ident.md)
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
navigation_title: Jina
3+
applies_to:
4+
stack: preview 9.3
5+
serverless: preview
6+
products:
7+
- id: machine-learning
8+
---
9+
10+
# Jina models [ml-nlp-jina]
11+
12+
This page collects all Jina models you can use as part of the {{stack}}.
13+
Currently, the following models are available as built-in models:
14+
15+
* [`jina-embeddings-v3`](#jina-embeddings-v3)
16+
17+
## `jina-embeddings-v3` [jina-embeddings-v3]
18+
19+
[`jina-embeddings-v3`](https://jina.ai/models/jina-embeddings-v3/) is a multilingual dense vector embedding model that you can use through the [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md).
20+
It provides long-context embeddings across a wide range of languages without requiring you to configure, download, or deploy any model artifacts yourself.
21+
Because the model runs on EIS, Elastic's own infrastructure, you don't need to scale or configure ML nodes to use it.
22+
23+
The `jina-embeddings-v3` model supports input lengths of up to 8192 tokens and produces 1024-dimension embeddings by default. It uses task-specific adapters to optimize embeddings for different use cases (such as retrieval or classification), and includes support for Matryoshka Representation Learning, which allows you to truncate embeddings to fewer dimensions with minimal loss in quality.
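To make the truncation idea concrete, here is a minimal client-side sketch in Python. It assumes you already hold a full-length embedding as a list of floats. The helper name `truncate_embedding` and the stand-in vector are hypothetical and not part of any Elastic or Jina API. Rescaling to unit length after truncation is a common practice when vectors are compared by cosine similarity, not something this page prescribes.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length.

    Hypothetical helper for illustration only.
    """
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Stand-in for a 1024-dimension embedding (made-up values, not model output).
full = [math.sin(i + 1) for i in range(1024)]
short = truncate_embedding(full, 256)
```

The truncated vector keeps the direction of the leading 256 components, which is the part of the embedding that Matryoshka-trained models are designed to make useful on its own.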
24+
25+
### Dense vector embeddings
26+
27+
Dense vector embeddings are fixed-length numerical representations of text. When you send text to an EIS {{infer}} endpoint that uses `jina-embeddings-v3`, the model returns a vector of floating-point numbers (for example, 1024 values). Texts that are semantically similar have embeddings that are close to each other in this vector space. {{es}} stores these vectors in [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) fields or through the [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) type and uses vector similarity search to retrieve the most relevant documents for a given query. Unlike [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md), which expands text into sparse token-weight vectors, this model produces compact dense vectors that are well suited for multilingual and cross-domain use cases.
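As an illustration of "close to each other in this vector space", the following Python sketch ranks two toy documents against a query by cosine similarity, one common similarity measure for dense vectors. The 4-dimension vectors are invented for readability and are not output from `jina-embeddings-v3`. The function and variable names are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors (made up); real embeddings would have, for example, 1024 values.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "doc_close": [0.8, 0.2, 0.1, 0.3],
    "doc_far": [-0.1, 0.9, 0.7, 0.0],
}

# Sort document names from most to least similar to the query.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
```

In practice {{es}} performs this comparison for you during vector search; the sketch only shows what the score expresses.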
28+
29+
### Requirements [jina-embeddings-v3-req]
30+
31+
To use `jina-embeddings-v3`, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level or an active trial period.
32+
33+
### Getting started with `jina-embeddings-v3` via the Elastic {{infer-cap}} Service
34+
35+
Create an {{infer}} endpoint that references the `jina-embeddings-v3` model in the `model_id` field.
36+
37+
```console
PUT _inference/text_embedding/eis-jina-embeddings-v3
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v3"
  }
}
```
46+
47+
The created {{infer}} endpoint uses the model for {{infer}} operations on the Elastic {{infer-cap}} Service. You can reference the `inference_id` of the endpoint in `text_embedding` {{infer}} tasks or search queries.
48+
For example, the following API request ingests the input text and produces embeddings.
49+
50+
```console
POST _inference/text_embedding/eis-jina-embeddings-v3
{
  "input": "The sky above the port was the color of television tuned to a dead channel.",
  "input_type": "ingest"
}
```
57+
58+
### Performance considerations [jina-embeddings-v3-performance]
59+
60+
* `jina-embeddings-v3` works best on small-, medium-, or large-sized fields that contain natural language.
61+
For connector or web crawler use cases, this aligns best with fields like title, description, summary, or abstract.
62+
Although `jina-embeddings-v3` has a context window of 8192 tokens, it's best to limit the input to 2048-4096 tokens for optimal performance.
63+
For larger fields that exceed this limit (for example, `body_content` on web crawler documents), consider chunking the content into multiple values so that each chunk stays under 4096 tokens.
64+
* Larger documents take longer to ingest, and {{infer}} time per document increases with the number of fields that need to be processed.
65+
* The more fields your pipeline has to perform {{infer}} on, the longer it takes per document to ingest.
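The chunking advice above can be sketched in Python as follows. This is an illustration under a loud assumption: it counts whitespace-separated words as a rough stand-in for tokens, whereas the model's real tokenizer usually produces more tokens than words, so the limit here is set conservatively. The function name `chunk_words` and the sample `body_content` value are hypothetical.

```python
def chunk_words(text, max_tokens=2048):
    """Split text into chunks of at most `max_tokens` whitespace-separated words.

    Words are only a proxy for model tokens (assumption for illustration).
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


# Made-up stand-in for a long crawled page body of 5000 words.
body_content = "word " * 5000
chunks = chunk_words(body_content, max_tokens=2048)
```

Each element of `chunks` could then be indexed as a separate value of the field, so that no single input approaches the 8192-token context window.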

explore-analyze/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@ toc:
123123
- file: machine-learning/nlp/ml-nlp-built-in-models.md
124124
children:
125125
- file: machine-learning/nlp/ml-nlp-elser.md
126+
- file: machine-learning/nlp/ml-nlp-jina.md
126127
- file: machine-learning/nlp/ml-nlp-rerank.md
127128
- file: machine-learning/nlp/ml-nlp-e5.md
128129
- file: machine-learning/nlp/ml-nlp-lang-ident.md

solutions/security/get-started/_snippets/agentless-integrations-faq.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,5 @@
11
Frequently asked questions and troubleshooting steps for {{elastic-sec}}'s agentless CSPM integration.
22

3-
43
## When I make a new integration, when will I see the agent appear on the Integration Policies page? [_when_i_make_a_new_integration_when_will_i_see_the_agent_appear_on_the_integration_policies_page]
54

65
After you create a new agentless integration, the new integration policy may show a button that says **Add agent** instead of the associated agent for several minutes during agent enrollment. No action is needed other than refreshing the page once enrollment is complete.
@@ -76,4 +75,5 @@ When you create a new agentless CSPM integration, a new agent policy appears wit
7675

7776
## Can agentless integrations use a specific range of static IP addresses for configuring allow and deny rules for traffic?
7877

79-
No, agentless integrations can not use a specific range of static IP addresses for configuring ingress and egress allow and deny rules.
78+
No, agentless integrations can not use a specific range of static IP addresses for configuring ingress and egress allow and deny rules.
79+

solutions/security/get-started/agentless-integrations.md

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,12 @@ During technical preview, there are no additional costs associated with deployin
2020
There is a limit of 5 agentless integrations per project.
2121
::::
2222

23+
## Requirements
24+
25+
* Agentless integrations are supported only on {{ech}}, {{sec-serverless}}, and {{obs-serverless}} deployments.
26+
* On {{ech}}, agentless integrations require a working [{{fleet-server}}](/reference/fleet/fleet-server.md).
27+
* To set up a new agentless integration, you need the `Actions and connectors: all` [{{kib}} privilege](/deploy-manage/users-roles/cluster-or-deployment-auth/kibana-privileges.md).
28+
2329
## Generally available (GA) agentless integrations
2430

2531
Elastic fully supports agentless deployment for the Cloud Security Posture Management (CSPM) integration. Using this integration’s agentless deployment option, you can enable Elastic’s CSPM capabilities just by providing the necessary credentials. Agentless CSPM deployments support AWS, Azure, and GCP accounts.
