[RFC] OpenSearch Neural Sentence Highlighting #1175
Labels: Features, neural-search, RFC, v3.0.0
OpenSearch Neural Sentence Highlighting
Introduction
This document outlines the design and implementation of the neural highlighting feature in OpenSearch. Neural highlighting aims to enhance search result highlighting by leveraging machine learning models to identify semantically relevant text fragments, going beyond traditional lexical matching approaches.
Problem Statement
Traditional highlighting in OpenSearch relies on lexical matching, which has several limitations. The current system cannot effectively capture semantically relevant content when there are no exact keyword matches. It struggles with identifying multiple relevant spans across a document and lacks the ability to provide context-aware highlighting based on query intent.
The need for neural highlighting stems from several key requirements in modern search systems. Users expect search results that understand semantic meaning, not just keyword matches. This requires highlighting that can identify contextually relevant passages even when exact terms don't match. Additionally, long documents often contain multiple relevant sections that need to be highlighted to provide comprehensive search results. The system should also be able to understand and highlight content based on the user's search intent rather than just matching words.
GitHub issue for this feature request: #145
Requirements
Easy User Integration
Highlighting Quality and Performance
Neural Highlighting Integration
Models
Out of Scope
Current State
The current highlighting system in OpenSearch:
Feature Touched Components
Model Support Considerations
We are evaluating three approaches for implementing neural highlighting: two types of QuestionAnswer (QA) models, and the TextEmbedding model type that the neural-search plugin already supports.
QuestionAnswer Model (preferred)
Sentence-level QA Model (e.g., MashQA, MultiSpanQA) (p0 release)
The system predicts entire sentences as answers. This approach is better suited for complete sentence highlighting and more appropriate for summary-like highlights.
The system aggregates token embeddings at sentence level and returns binary predictions per sentence. This approach is better for maintaining complete sentence context.
Example:
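A hypothetical input/output pair for a sentence-level QA model could look like the following (the field names and the example text are illustrative assumptions, not a final API):

```json
{
  "question": "What are the benefits of exercise?",
  "context": "Regular exercise improves cardiovascular health. It also helps with weight management. The weather was nice yesterday.",
  "highlights": [
    "Regular exercise improves cardiovascular health.",
    "It also helps with weight management."
  ]
}
```

Note that whole sentences are returned, including relevant sentences that share no keywords with the question, while the off-topic sentence is excluded.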
Token-level QA Model (POST p0)
The system predicts answer spans at the token level, providing granular phrase-level highlighting that is better for precise span identification. However, this approach may incur higher resource costs for training. It could be offered as an option for users who provide their own token-level QA models.
The system uses the model for token classification and returns predictions per token, supporting multiple non-contiguous spans. The token-level QA model uses a BIO tagging scheme for predictions, where 'B' marks the beginning of an answer span (1), 'I' indicates the inside/continuation of an answer span (2), and 'O' represents tokens outside of any answer span (0).
Example:
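The BIO decoding step described above can be sketched as follows. This is an illustration of the tagging scheme, not the plugin's implementation; the function name and data are assumptions:

```python
# Decode BIO tag predictions from a token-level QA model into highlight spans.
# Tag scheme from this RFC: 0 = O (outside), 1 = B (begin), 2 = I (inside).

def decode_bio_spans(tags):
    """Group tagged tokens into (start, end_inclusive) token-index spans,
    supporting multiple non-contiguous spans."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == 1:                           # B: a new span begins here
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == 2 and start is not None:   # I: continue the current span
            continue
        else:                                  # O (or stray I): close any open span
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

tokens = ["Neural", "search", "uses", "dense", "vectors", "for", "semantic", "matching"]
tags   = [0,         0,        0,      1,       2,         0,     1,          2]
print(decode_bio_spans(tags))  # → [(3, 4), (6, 7)]
```

Here the model would highlight the phrases "dense vectors" and "semantic matching" as two separate, non-contiguous spans.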
Text Embedding Approach (e.g., the existing text embedding models already supported)
This approach uses semantic similarity between query and text fragments.
Advantages of this method include that it requires no changes to the ml-commons plugin and offers flexibility in using existing pre-trained text embedding models from OpenSearch ml-commons.
However, a significant disadvantage is its inability to provide sentence context into embedding, which results in low precision when highlighting multiple sentences.
The system computes semantic similarity scores between the query and text fragments; the approach is language-agnostic and remains flexible across different domains.
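The scoring idea can be sketched as below: embed the query and each candidate sentence, then highlight sentences whose cosine similarity exceeds a threshold. This is an illustration with toy vectors (in OpenSearch the embeddings would come from a deployed text-embedding model, and the threshold value is an assumption):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_highlights(query_vec, sentence_vecs, threshold=0.7):
    """Return indices of sentences whose similarity to the query exceeds threshold."""
    return [i for i, v in enumerate(sentence_vecs) if cosine(query_vec, v) >= threshold]

query = [1.0, 0.0, 1.0]
sentences = [[0.9, 0.1, 0.8],   # semantically close to the query
             [0.0, 1.0, 0.0],   # unrelated
             [1.0, 0.2, 1.1]]   # close
print(pick_highlights(query, sentences))  # → [0, 2]
```

Because each sentence is embedded independently, cross-sentence context is lost, which is the precision limitation noted above.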
QA Model Benchmarks with small dataset
Note: the token-level QA model is not yet available, so the benchmark results below are based on the sentence-level QA model.
Query Latency Performance (Milliseconds)
Test Environments:
Accuracy Benchmarks
Benchmark Key Findings
Highlight Framework Considerations
When designing the neural highlighting feature, we evaluated two potential integration approaches:
OpenSearch Existing Highlighter Framework Approach (Preferred)
OpenSearch currently provides a robust Highlighter interface (as detailed in the highlighter documentation) that supports field-level access and control. The neural highlighting feature can naturally extend OpenSearch's existing highlighting capabilities through this framework.
This approach would maintain consistency with OpenSearch's architecture by following the same highlighting API pattern and integrating as a new highlight type alongside unified, plain, and fvh highlighters.
Example usage follows the familiar OpenSearch highlighting pattern:
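A hypothetical request could look like the following (the highlighter type name "neural", the field names, and the model_id placeholder are illustrative assumptions, not a final API):

```json
POST /my-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "intelligent document search",
        "model_id": "<your-embedding-model-id>",
        "k": 10
      }
    }
  },
  "highlight": {
    "fields": {
      "passage_text": {
        "type": "neural"
      }
    }
  }
}
```

The new type would sit alongside the existing unified, plain, and fvh types, so users opt in with a single field-level setting.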
Key Benefits:
The approach maintains full compatibility with OpenSearch's highlighting architecture while reusing existing infrastructure. It provides natural integration with highlighting options and superior fragment handling capabilities. This design choice also ensures easier maintenance and future extensibility of the feature.
Neural Query Builder Wrapper (based on Highlighter Framework Approach)
Traditional OpenSearch queries maintain their text representation through standard methods. However, neural search queries present a unique challenge - they transform natural language queries into vector representations for KNN search, which loses the original query text in the neural search plugin highlighter. This loss is particularly problematic for features like highlighting, which need to understand the user's original search intent.
For example, when a user searches:
The query is converted to vector format for KNN search, but the original text "intelligent document search" must be preserved for highlighting relevant passages. Without preserving this text, the highlighter cannot effectively identify which parts of the document are semantically relevant to the user's query.
This challenge is unique to neural search queries because:
We can use a wrapper pattern, NeuralKNNQueryBuilder, to specifically address the challenges of neural search queries; it also shields the neural-search plugin from breaking whenever the k-NN query structure changes. To preserve the original query text in neural search queries, we implement a Query Wrapper design pattern that encapsulates the neural KNN query while maintaining access to the original query text.
Implementation
Benefits
Usage Example
This pattern ensures that the original query text is available for highlighting while maintaining the efficiency of vector-based search operations.
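The wrapper idea above can be sketched as follows. The actual plugin code is Java; the class and method names here are assumptions used purely to illustrate the delegation pattern:

```python
class KNNQuery:
    """Stand-in for the vector (k-NN) query built from the user's text."""
    def __init__(self, vector, k):
        self.vector = vector
        self.k = k

class NeuralKNNQueryBuilderSketch:
    """Wraps the k-NN query while preserving the original query text,
    so the highlighter can still see what the user actually typed."""
    def __init__(self, original_query_text, knn_query):
        self.original_query_text = original_query_text
        self._knn_query = knn_query

    def build(self):
        # Delegate to the wrapped k-NN query for vector-search execution;
        # changes to the k-NN query structure stay behind this one method.
        return self._knn_query

query = NeuralKNNQueryBuilderSketch(
    "intelligent document search", KNNQuery(vector=[0.1, 0.7, 0.2], k=10))
print(query.original_query_text)  # → intelligent document search
print(query.build().k)            # → 10
```

The highlighter reads `original_query_text`, while the search execution path only ever touches the wrapped k-NN query.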
Search Processor Approach (Alternative)
While a search processor approach could modify search requests and responses as part of the search pipeline, it comes with significant limitations.
Key Limitations:
This approach would restrict access to field-level content and lack direct integration with the highlighting framework. It would necessitate duplicating existing highlighting logic and complicate fragment generation handling. Additionally, maintaining consistency with existing highlighters would become more challenging.
Solution HLD: Architectural and Component Design
Key Components
Neural Highlighter Component [neural search plugin]
This core component serves as the entry point for neural highlighting requests and orchestrates the entire highlighting process. During the query phase, it handles extracting original query text, validating configurations, and managing fragment settings including size and tag configurations.
The core highlighting logic focuses on text processing by segmenting input into processable chunks while maintaining document structure awareness. It implements sophisticated scoring and ranking algorithms for fragments and supports highlighting across multiple fields.
For model management, the component intelligently selects appropriate QA models based on configuration, handles versioning compatibility, and implements fallback strategies when needed. The fragment processing system carefully merges overlapping highlight spans while preserving natural sentence boundaries.
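The span-merging step mentioned above can be sketched like this (an assumed illustration, not the plugin's implementation):

```python
def merge_spans(spans):
    """Merge overlapping (start, end) character spans into non-overlapping
    fragments, keeping them in document order."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge_spans([(10, 25), (20, 40), (50, 60)]))  # → [(10, 40), (50, 60)]
```

In the real component the merged spans would additionally be snapped to sentence boundaries so highlights never cut a sentence in half.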
ML Commons InferenceQA Accessor Component [neural search plugin]
Acting as a bridge between the Neural Highlighter and ML Commons plugin, this component manages all model inference interactions. It maintains a connection pool to ML Commons, implements circuit breaker patterns, and handles batched inference requests with built-in caching.
The component implements fully asynchronous processing with non-blocking inference calls and careful thread pool management. Robust error handling includes retry logic with exponential backoff, graceful degradation options, and comprehensive error logging. Resource management ensures system stability through request throttling, memory monitoring, and proper cleanup procedures.
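The retry-with-exponential-backoff behavior described above can be sketched as follows (an assumption of how the accessor could behave, not the plugin's code):

```python
import time

def call_with_retry(infer, max_attempts=3, base_delay=0.1):
    """Invoke infer() up to max_attempts times, doubling the delay between tries."""
    for attempt in range(max_attempts):
        try:
            return infer()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up; the caller can degrade gracefully
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_inference():
    """Simulated ML Commons call that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("inference timeout")
    return "highlights"

print(call_with_retry(flaky_inference))  # → highlights
```

Combined with a circuit breaker and request throttling, this keeps transient inference failures from cascading into search failures.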
Highlighting QuestionAnswer Translator [ml-commons plugin]
This specialized translator handles the complexities of QA model interactions through multiple processing layers. At the token level, it implements BIO tagging schemes and manages token alignment with special case handling. The sentence-level processing includes boundary detection, scoring, and context window management.
The translator ensures clean output formatting by converting model outputs to highlight spans, validating formats, and providing proper output sanitization. This maintains consistency and reliability in the highlighting results.
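The output-formatting step can be sketched as below: wrapping scored character spans in highlight tags, inserting from right to left so earlier offsets stay valid. The `<em>`/`</em>` defaults are an assumption mirroring OpenSearch's usual highlight tags:

```python
def apply_tags(text, spans, pre="<em>", post="</em>"):
    """Insert pre/post tags around (start, end) character spans.
    Spans are applied right to left so earlier offsets remain valid."""
    out = text
    for start, end in sorted(spans, reverse=True):
        out = out[:start] + pre + out[start:end] + post + out[end:]
    return out

print(apply_tags("Neural search finds semantically relevant passages.",
                 [(0, 13), (20, 41)]))
# → <em>Neural search</em> finds <em>semantically relevant</em> passages.
```

Sanitization (e.g., clamping spans to the text length and rejecting malformed model output) would happen before this step.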
Interaction Flow
LLD Component Details
Neural Highlighter
The core component implements OpenSearch's Highlighter interface to provide neural-aware text highlighting. The NeuralHighlighter seamlessly integrates with OpenSearch's existing highlighting infrastructure while adding specialized support for machine learning-based highlighting. It preserves the original text from the query and uses this information to generate contextually relevant highlights through question-answering models.
The ML integration layer handles the interaction with OpenSearch's ML Commons framework. It manages asynchronous calls to question-answering models with proper timeout handling, transforms model responses into highlight spans, and ensures robust error handling. The highlighting results are then formatted and returned in OpenSearch's standard highlighting format, maintaining compatibility with existing search result rendering.
ML Commons Client
Interfaces with ML Commons for model inference. Responsible for:
QuestionAnsweringTranslator
Handles input/output processing for QA models in ML Commons. Responsible for:
Backward Compatibility
Security
Testability
Integration Testing Areas
Performance Testing
We will run the performance tests against both local and remote models.