[RFC] OpenSearch Neural Sentence Highlighting #1175
Labels: Features, neural-search, RFC, v3.0.0
OpenSearch Neural Sentence Highlighting
Introduction
This document outlines the design and implementation of the neural highlighting feature in OpenSearch. Neural highlighting aims to enhance search result highlighting by leveraging machine learning models to identify semantically relevant text fragments, going beyond traditional lexical matching approaches.
Problem Statement
Traditional highlighting in OpenSearch relies on lexical matching, which has several limitations. The current system cannot effectively capture semantically relevant content when there are no exact keyword matches. It struggles with identifying multiple relevant spans across a document and lacks the ability to provide context-aware highlighting based on query intent.
The need for neural highlighting stems from several key requirements in modern search systems. Users expect search results that understand semantic meaning, not just keyword matches. This requires highlighting that can identify contextually relevant passages even when exact terms don't match. Additionally, long documents often contain multiple relevant sections that need to be highlighted to provide comprehensive search results. The system should also be able to understand and highlight content based on the user's search intent rather than just matching words.
GitHub issue for this feature request: #145
Requirements
Easy User Integration
Highlighting Quality and Performance
Neural Highlighting Integration
Models
Out of Scope
Current State
The current highlighting system in OpenSearch:
Feature Touched Components
Model Support Considerations
We are evaluating three approaches for implementing neural highlighting: two types of QuestionAnswer (QA) models, and the TextEmbedding model type that the neural-search plugin already supports.
QuestionAnswer Model (preferred)
Sentence-level QA Model (e.g., MashQA, MultiSpanQA) (p0 release)
The system predicts entire sentences as answers. This approach is better suited for complete sentence highlighting and more appropriate for summary-like highlights.
The system aggregates token embeddings at sentence level and returns binary predictions per sentence. This approach is better for maintaining complete sentence context.
Example:
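A hypothetical input/output pair for a sentence-level QA model could look like the following (the field names and the example text are illustrative assumptions, not a final API):

```json
{
  "question": "What are the benefits of exercise?",
  "context": "Regular exercise improves cardiovascular health. It also helps with weight management. The weather was nice yesterday.",
  "highlights": [
    "Regular exercise improves cardiovascular health.",
    "It also helps with weight management."
  ]
}
```

Note that whole sentences are returned, including relevant sentences that share no keywords with the question, while the off-topic sentence is excluded.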
Token-level QA Model (POST p0)
The system predicts answer spans at the token level, providing granular phrase-level highlighting that is better for precise span identification. However, this approach may incur higher resource costs for training. It could be offered as an option for users who provide their own token-level QA models.
The system uses the model for token classification and returns predictions per token, supporting multiple non-contiguous spans. The token-level QA model uses a BIO tagging scheme for predictions, where 'B' marks the beginning of an answer span (1), 'I' indicates the inside/continuation of an answer span (2), and 'O' represents tokens outside of any answer span (0).
Example:
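The BIO decoding step described above can be sketched as follows. This is an illustration of the tagging scheme, not the plugin's implementation; the function name and data are assumptions:

```python
# Decode BIO tag predictions from a token-level QA model into highlight spans.
# Tag scheme from this RFC: 0 = O (outside), 1 = B (begin), 2 = I (inside).

def decode_bio_spans(tags):
    """Group tagged tokens into (start, end_inclusive) token-index spans,
    supporting multiple non-contiguous spans."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == 1:                           # B: a new span begins here
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == 2 and start is not None:   # I: continue the current span
            continue
        else:                                  # O (or stray I): close any open span
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

tokens = ["Neural", "search", "uses", "dense", "vectors", "for", "semantic", "matching"]
tags   = [0,         0,        0,      1,       2,         0,     1,          2]
print(decode_bio_spans(tags))  # → [(3, 4), (6, 7)]
```

Here the model would highlight the phrases "dense vectors" and "semantic matching" as two separate, non-contiguous spans.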
Text Embedding Approach (e.g., the existing text embedding models already supported)
This approach uses semantic similarity between query and text fragments.
Advantages of this method include that it requires no changes to the ml-commons plugin and offers flexibility in using existing pre-trained text embedding models from OpenSearch ml-commons.
However, a significant disadvantage is its inability to provide sentence context into embedding, which results in low precision when highlighting multiple sentences.
The system computes semantic similarity scores between the query and text fragments; the approach is language-agnostic and remains flexible across different domains.
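The scoring idea can be sketched as below: embed the query and each candidate sentence, then highlight sentences whose cosine similarity exceeds a threshold. This is an illustration with toy vectors (in OpenSearch the embeddings would come from a deployed text-embedding model, and the threshold value is an assumption):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_highlights(query_vec, sentence_vecs, threshold=0.7):
    """Return indices of sentences whose similarity to the query exceeds threshold."""
    return [i for i, v in enumerate(sentence_vecs) if cosine(query_vec, v) >= threshold]

query = [1.0, 0.0, 1.0]
sentences = [[0.9, 0.1, 0.8],   # semantically close to the query
             [0.0, 1.0, 0.0],   # unrelated
             [1.0, 0.2, 1.1]]   # close
print(pick_highlights(query, sentences))  # → [0, 2]
```

Because each sentence is embedded independently, cross-sentence context is lost, which is the precision limitation noted above.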
QA Model Benchmarks with small dataset
Note: the token-level QA model is not yet available, so the benchmark results below are based on the sentence-level QA model.
Query Latency Performance (Milliseconds)
Test Environments:
Accuracy Benchmarks
Benchmark Key Findings
Highlight Framework Considerations
When designing the neural highlighting feature, we evaluated two potential integration approaches:
OpenSearch Existing Highlighter Framework Approach (Preferred)
OpenSearch currently provides a robust Highlighter interface (as detailed in the highlighter documentation) that supports field-level access and control. The neural highlighting feature can naturally extend OpenSearch's existing highlighting capabilities through this framework.
This approach would maintain consistency with OpenSearch's architecture by following the same highlighting API pattern and integrating as a new highlight type alongside unified, plain, and fvh highlighters.
Example usage follows the familiar OpenSearch highlighting pattern:
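A hypothetical request could look like the following (the highlighter type name "neural", the field names, and the model_id placeholder are illustrative assumptions, not a final API):

```json
POST /my-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "intelligent document search",
        "model_id": "<your-embedding-model-id>",
        "k": 10
      }
    }
  },
  "highlight": {
    "fields": {
      "passage_text": {
        "type": "neural"
      }
    }
  }
}
```

The new type would sit alongside the existing unified, plain, and fvh types, so users opt in with a single field-level setting.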
Key Benefits:
The approach maintains full compatibility with OpenSearch's highlighting architecture while reusing existing infrastructure. It provides natural integration with highlighting options and superior fragment handling capabilities. This design choice also ensures easier maintenance and future extensibility of the feature.
Neural Query Builder Wrapper (based on Highlighter Framework Approach)
Traditional OpenSearch queries maintain their text representation through standard methods. However, neural search queries present a unique challenge - they transform natural language queries into vector representations for KNN search, which loses the original query text in the neural search plugin highlighter. This loss is particularly problematic for features like highlighting, which need to understand the user's original search intent.
For example, when a user searches:
The query is converted to vector format for KNN search, but the original text "intelligent document search" must be preserved for highlighting relevant passages. Without preserving this text, the highlighter cannot effectively identify which parts of the document are semantically relevant to the user's query.
This challenge is unique to neural search queries because:
We can use a wrapper pattern, NeuralKNNQueryBuilder, to specifically address the challenges of neural search queries; it also shields the neural-search plugin from breaking whenever the k-NN query structure changes. To preserve the original query text in neural search queries, we implement a Query Wrapper design pattern that encapsulates the neural KNN query while maintaining access to the original query text.
Implementation
Benefits
Usage Example
This pattern ensures that the original query text is available for highlighting while maintaining the efficiency of vector-based search operations.
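The wrapper idea above can be sketched as follows. The actual plugin code is Java; the class and method names here are assumptions used purely to illustrate the delegation pattern:

```python
class KNNQuery:
    """Stand-in for the vector (k-NN) query built from the user's text."""
    def __init__(self, vector, k):
        self.vector = vector
        self.k = k

class NeuralKNNQueryBuilderSketch:
    """Wraps the k-NN query while preserving the original query text,
    so the highlighter can still see what the user actually typed."""
    def __init__(self, original_query_text, knn_query):
        self.original_query_text = original_query_text
        self._knn_query = knn_query

    def build(self):
        # Delegate to the wrapped k-NN query for vector-search execution;
        # changes to the k-NN query structure stay behind this one method.
        return self._knn_query

query = NeuralKNNQueryBuilderSketch(
    "intelligent document search", KNNQuery(vector=[0.1, 0.7, 0.2], k=10))
print(query.original_query_text)  # → intelligent document search
print(query.build().k)            # → 10
```

The highlighter reads `original_query_text`, while the search execution path only ever touches the wrapped k-NN query.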
Search Processor Approach (Alternative)
While a search processor approach could modify search requests and responses as part of the search pipeline, it comes with significant limitations.
Key Limitations:
This approach would restrict access to field-level content and lack direct integration with the highlighting framework. It would necessitate duplicating existing highlighting logic and complicate fragment generation handling. Additionally, maintaining consistency with existing highlighters would become more challenging.
Solution HLD: Architectural and Component Design
Key Components
Neural Highlighter Component [neural search plugin]
This core component serves as the entry point for neural highlighting requests and orchestrates the entire highlighting process. During the query phase, it handles extracting original query text, validating configurations, and managing fragment settings including size and tag configurations.
The core highlighting logic focuses on text processing by segmenting input into processable chunks while maintaining document structure awareness. It implements sophisticated scoring and ranking algorithms for fragments and supports highlighting across multiple fields.
For model management, the component intelligently selects appropriate QA models based on configuration, handles versioning compatibility, and implements fallback strategies when needed. The fragment processing system carefully merges overlapping highlight spans while preserving natural sentence boundaries.
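The span-merging step mentioned above can be sketched like this (an assumed illustration, not the plugin's implementation):

```python
def merge_spans(spans):
    """Merge overlapping (start, end) character spans into non-overlapping
    fragments, keeping them in document order."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge_spans([(10, 25), (20, 40), (50, 60)]))  # → [(10, 40), (50, 60)]
```

In the real component the merged spans would additionally be snapped to sentence boundaries so highlights never cut a sentence in half.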
ML Commons InferenceQA Accessor Component [neural search plugin]
Acting as a bridge between the Neural Highlighter and ML Commons plugin, this component manages all model inference interactions. It maintains a connection pool to ML Commons, implements circuit breaker patterns, and handles batched inference requests with built-in caching.
The component implements fully asynchronous processing with non-blocking inference calls and careful thread pool management. Robust error handling includes retry logic with exponential backoff, graceful degradation options, and comprehensive error logging. Resource management ensures system stability through request throttling, memory monitoring, and proper cleanup procedures.
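The retry-with-exponential-backoff behavior described above can be sketched as follows (an assumption of how the accessor could behave, not the plugin's code):

```python
import time

def call_with_retry(infer, max_attempts=3, base_delay=0.1):
    """Invoke infer() up to max_attempts times, doubling the delay between tries."""
    for attempt in range(max_attempts):
        try:
            return infer()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up; the caller can degrade gracefully
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_inference():
    """Simulated ML Commons call that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("inference timeout")
    return "highlights"

print(call_with_retry(flaky_inference))  # → highlights
```

Combined with a circuit breaker and request throttling, this keeps transient inference failures from cascading into search failures.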
Highlighting QuestionAnswer Translator [ml-commons plugin]
This specialized translator handles the complexities of QA model interactions through multiple processing layers. At the token level, it implements BIO tagging schemes and manages token alignment with special case handling. The sentence-level processing includes boundary detection, scoring, and context window management.
The translator ensures clean output formatting by converting model outputs to highlight spans, validating formats, and providing proper output sanitization. This maintains consistency and reliability in the highlighting results.
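The output-formatting step can be sketched as below: wrapping scored character spans in highlight tags, inserting from right to left so earlier offsets stay valid. The `<em>`/`</em>` defaults are an assumption mirroring OpenSearch's usual highlight tags:

```python
def apply_tags(text, spans, pre="<em>", post="</em>"):
    """Insert pre/post tags around (start, end) character spans.
    Spans are applied right to left so earlier offsets remain valid."""
    out = text
    for start, end in sorted(spans, reverse=True):
        out = out[:start] + pre + out[start:end] + post + out[end:]
    return out

print(apply_tags("Neural search finds semantically relevant passages.",
                 [(0, 13), (20, 41)]))
# → <em>Neural search</em> finds <em>semantically relevant</em> passages.
```

Sanitization (e.g., clamping spans to the text length and rejecting malformed model output) would happen before this step.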
Interaction Flow
LLD Component Details
Neural Highlighter
The core component implements OpenSearch's Highlighter interface to provide neural-aware text highlighting. The NeuralHighlighter seamlessly integrates with OpenSearch's existing highlighting infrastructure while adding specialized support for machine learning-based highlighting. It preserves the original text from the query and uses this information to generate contextually relevant highlights through question-answering models.
The ML integration layer handles the interaction with OpenSearch's ML Commons framework. It manages asynchronous calls to question-answering models with proper timeout handling, transforms model responses into highlight spans, and ensures robust error handling. The highlighting results are then formatted and returned in OpenSearch's standard highlighting format, maintaining compatibility with existing search result rendering.
ML Commons Client
Interfaces with ML Commons for model inference. Responsible for:
QuestionAnsweringTranslator
Handles input/output processing for QA models in ML Commons. Responsible for:
Backward Compatibility
Security
Testability
Integration Testing Areas
Performance Testing
We will run the performance tests against both local and remote models.