# [Blog fix] OpenSearch now supports DeepSeek chat models #3617

**Merged**

File: `_posts/2025-01-28-OpenSearch-Now-Supports-DeepSeek-Chat-Models.md`

We're excited to announce that OpenSearch now supports DeepSeek integration, providing powerful and cost-effective AI capabilities. DeepSeek-R1 is a recently released open-source large language model (LLM) that delivers **comparable benchmark performance** to leading LLMs like OpenAI o1 ([report](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf)) at a significantly **lower cost** ([DeepSeek API pricing](https://api-docs.deepseek.com/quick_start/pricing)). Because DeepSeek-R1 is open source, you can download it and deploy it to your preferred infrastructure. This enables you to build more cost-effective and sustainable retrieval-augmented generation (RAG) solutions in OpenSearch's vector database.

OpenSearch gives you the flexibility to connect to any inference service, such as DeepSeek or OpenAI, using machine learning (ML) connectors. You can use [prebuilt connector blueprints](https://github.com/opensearch-project/ml-commons/tree/main/docs/remote_inference_blueprints) or customize connectors based on your requirements. For more information about connector blueprints, see [Blueprints](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/).

We've added a new [connector blueprint](https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/deepseek_connector_chat_blueprint.md) for the DeepSeek-R1 model. This integration, combined with OpenSearch's built-in vector database capabilities, makes it easier and more cost effective to build [RAG applications](https://opensearch.org/docs/latest/search-plugins/conversational-search) in OpenSearch.
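For reference, creating the connector manually comes down to a single API call, shown below as it appears in the blueprint; supply your own DeepSeek API key:

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "DeepSeek Chat",
  "description": "Test connector for DeepSeek Chat",
  "version": "1",
  "protocol": "http",
  "parameters": {
    "endpoint": "api.deepseek.com",
    "model": "deepseek-chat"
  },
  "credential": {
    "deepSeek_key": "<PLEASE ADD YOUR DEEPSEEK API KEY HERE>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Content-Type": "application/json",
        "Authorization": "Bearer ${credential.deepSeek_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```

For more information, see [Connecting to externally hosted models](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/index/). Because DeepSeek-R1 is open source, you can also host it on your own infrastructure, for example on AWS (see [DeepSeek-R1 models now available on AWS](http://aws.amazon.com/blogs/aws/deepseek-r1-models-now-available-on-aws)); to connect to a self-hosted model, update the `endpoint` and `credential` values accordingly.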

The following example shows you how to implement RAG with DeepSeek in OpenSearch's vector database. This example guides you through creating a connector for the [DeepSeek chat model](https://api-docs.deepseek.com/api/create-chat-completion) and setting up a [RAG pipeline](https://opensearch.org/docs/latest/search-plugins/search-pipelines/rag-processor/) in OpenSearch.

### Setup

For a simplified setup, you can follow [this blog post](https://opensearch.org/blog/one-click-deepseek-integration/), which lets you create a connector for the DeepSeek model, create a model group, register the model, and create a search pipeline, all with a single API call.
> **Contributor:** The other blog post refers to this one as the manual, more complex setup. However, we're directing readers back to that same post for setup instructions here, which doesn't seem logical to me.

> **Contributor Author:** That makes sense. @dylan-tong-aws Do we want to update the blog post mentioned to modify this part since we're changing the setup in this blog, or do we want to keep the original setup in this blog post and give a link to that blog post for a simpler setup?
>
> cc: @minalsha


After completing the setup, follow these steps:

### 1. Create a vector database
Follow the [neural search tutorial](https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/) to create an embedding model and a k-NN index. Then ingest data into the index:
```json
POST _bulk
{"index": {"_index": "my_rag_test_data", "_id": "1"}}
{"text": "Abraham Lincoln was born on February 12, 1809, the second child of Thomas Lincoln and Nancy Hanks Lincoln, in a log cabin on Sinking Spring Farm near Hodgenville, Kentucky.[2] He was a descendant of Samuel Lincoln, an Englishman who migrated from Hingham, Norfolk, to its namesake, Hingham, Massachusetts, in 1638. The family then migrated west, passing through New Jersey, Pennsylvania, and Virginia.[3] Lincoln was also a descendant of the Harrison family of Virginia; his paternal grandfather and namesake, Captain Abraham Lincoln and wife Bathsheba (née Herring) moved the family from Virginia to Jefferson County, Kentucky.[b] The captain was killed in an Indian raid in 1786.[5] His children, including eight-year-old Thomas, Abraham's father, witnessed the attack.[6][c] Thomas then worked at odd jobs in Kentucky and Tennessee before the family settled in Hardin County, Kentucky, in the early 1800s."}
{"index": {"_index": "my_rag_test_data", "_id": "2"}}
{"text": "Chart and table of population level and growth rate for the New York City metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of New York City in 2023 is 18,937,000, a 0.37% increase from 2022.\\nThe metro area population of New York City in 2022 was 18,867,000, a 0.23% increase from 2021.\\nThe metro area population of New York City in 2021 was 18,823,000, a 0.1% increase from 2020.\\nThe metro area population of New York City in 2020 was 18,804,000, a 0.01% decline from 2019."}
```
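The bulk request above assumes that the embedding model, ingest pipeline, and k-NN index from the tutorial are already in place. The following is a minimal sketch of that setup; the pipeline name and the vector `dimension` (which must match your embedding model) are illustrative:

```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "Generates embeddings for the text field at ingestion time",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<YOUR EMBEDDING MODEL ID>",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}
```

```json
PUT /my_rag_test_data
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "name": "hnsw",
          "space_type": "l2"
        }
      }
    }
  }
}
```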


### 2. Create a conversation memory
Create a conversation memory to store all messages from a conversation; the memory name is an arbitrary label:

```json
POST /_plugins/_ml/memory
{
  "name": "Conversation about the NYC metro area population"
}
```

The response contains a memory ID for the created memory:

```json
{
  "memory_id": "znCqcI0BfUsSoeNTntd7"
}
```

### 3. Use the pipeline for RAG

Send a query to OpenSearch and provide additional parameters in the `ext.generative_qa_parameters` object:

```json
GET /my_rag_test_data/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "What's the population of NYC metro area in 2023?",
        "model_id": "USkHsZQBts7fa6bybx3G",
        "k": 5
      }
    }
  },
  "size": 2,
  "_source": [
    "text"
  ],
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "deepseek-chat",
      "llm_question": "What's the population of NYC metro area in 2023?",
      "memory_id": "znCqcI0BfUsSoeNTntd7",
      "context_size": 5,
      "message_size": 5
    }
  }
}
```
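The response returns the retrieved documents in `hits` and the generated answer in the `ext.retrieval_augmented_generation` object. A trimmed example of the response shape follows; the IDs, scores, and wording are illustrative:

```json
{
  "hits": {
    "hits": [
      {
        "_index": "my_rag_test_data",
        "_id": "2",
        "_score": 0.97,
        "_source": {
          "text": "Chart and table of population level and growth rate for the New York City metro area from 1950 to 2023. ..."
        }
      }
    ]
  },
  "ext": {
    "retrieval_augmented_generation": {
      "answer": "The population of the New York City metro area in 2023 is 18,937,000, a 0.37% increase from 2022.",
      "message_id": "x3CqcI0BfUsSoeNTnte1"
    }
  }
}
```

Because the request includes a `memory_id`, the question and the generated answer are also stored in the conversation memory and returned with a `message_id`, so follow-up questions can build on the conversation history.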