Blog Post: Advancing Search Quality and Inference Speed with v2 Series Neural Sparse Models #3169
Conversation
Signed-off-by: zhichao-aws <[email protected]>
The blog should be released after the update to the current documentation on supported sparse models. I'll give an update on that PR here.
Signed-off-by: zhichao-aws <[email protected]>
The PR link for documentation: opensearch-project/documentation-website#7987
@zhichao-aws The doc PR is merged.
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
@zhichao-aws @pajuric Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!
Neural sparse search is a novel and efficient method for semantic retrieval, [introduced in OpenSearch 2.11](https://opensearch.org/blog/improving-document-retrieval-with-sparse-semantic-encoders/). Sparse encoding models encode text into (token, weight) entries, allowing OpenSearch to build indexes and perform searches using Lucene's inverted index. Neural sparse search is efficient and generalizes well in out-of-domain (OOD) scenarios. We are excited to announce the release of our v2 series neural sparse models:
- **v2-distill model**: This model **reduces model parameters by 50%**, resulting in lower memory requirements and costs. It **increases ingestion throughput by 1.39x on GPU and 1.74x on CPU**. The v2-distill architecture supports both doc-only and bi-encoder modes.
- **v2-mini model**: This model **reduces model parameters by 75%**, also reducing memory requirements and costs. It **increases ingestion throughput by 1.74x on GPU and 4.18x on CPU**. The v2-mini architecture supports the doc-only mode.
Here and on the line above, should it be "GPUs" and "CPUs" (plural)?
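As a rough illustration of the (token, weight) representation described above, a sparse-encoded document can be stored in a `rank_features` field, which Lucene serves through its inverted index. The index name, field names, and weight values below are hypothetical:

```json
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Neural sparse search in OpenSearch",
  "passage_embedding": {
    "opensearch": 2.3,
    "neural": 1.6,
    "sparse": 1.9,
    "search": 1.2
  }
}
```

In practice, you would not write these weights by hand: an ingest pipeline with a sparse encoding processor generates them automatically from the document text at ingestion time.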
1. Register and deploy a tokenizer for search:
json?
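The registration call in the step above might look like the following sketch. The model name matches the pretrained neural sparse tokenizer published by OpenSearch, but the version string is an assumption; confirm the exact values in the pretrained models documentation. If your OpenSearch version does not support the `deploy=true` parameter, call `POST /_plugins/_ml/models/<model_id>/_deploy` separately after registration:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
```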
In these experiments, we ingested 1 million documents into an index and used 20 clients to perform concurrent searches. We recorded the p99 for both client-side search and model inference. We tested search performance for the **bi-encoder** mode.
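A search against a sparse index uses the `neural_sparse` query type. This sketch assumes a hypothetical index and field name; `<model_id>` is the deployed sparse encoder in bi-encoder mode, or the deployed tokenizer in doc-only mode:

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "what is neural sparse search",
        "model_id": "<model_id>"
      }
    }
  }
}
```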
#### Remote deployment using GPU |
Suggested change:
`#### Remote deployment using GPU` → `#### Remote deployment on a GPU`
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@pajuric @zhichao-aws Editorial comments are implemented and the blog is ready to publish. Thanks!
- technical-posts
has_science_table: true
meta_keywords: OpenSearch semantic search, neural sparse search, semantic sparse retrieval
meta_description: Accelerating inference and improving search with v2 neural sparse encoding models
Please update the blog with the following meta:
meta_keywords: neural sparse models, OpenSearch semantic search, semantic sparse retrieval, neural search
meta_description: OpenSearch announces the availability of v2 series neural sparse models that enhance the efficiency of semantic sparse retrieval while accelerating inference and improving search.
Done
- yych
- dylantong
- kolchfa
date: 2024-08-19 |
Please update the date to 2024-08-21
Done
Signed-off-by: Fanit Kolchina <[email protected]>
@nateynateynate @krisfreedain - Blog is ready to publish today! Let's ship it.
Please set `featured_blog_post: false`.
Setting `featured_blog_post: false` while we continue to promote OpenSearchCon.
Signed-off-by: Kris Freedain <[email protected]>
Description
Create blog: Advancing Search Quality and Inference Speed with v2 Series Neural Sparse Models
Issues Resolved
#3164
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.