
[RFC] OpenSearch Neural Sentence Highlighting #1175

Open · junqiu-lei opened this issue Feb 7, 2025 · 2 comments

Labels: Features (Introduces a new unit of functionality that satisfies a requirement), neural-search, RFC, v3.0.0

@junqiu-lei (Member)
OpenSearch Neural Sentence Highlighting

Introduction

This document outlines the design and implementation of the neural highlighting feature in OpenSearch. Neural highlighting aims to enhance search result highlighting by leveraging machine learning models to identify semantically relevant text fragments, going beyond traditional lexical matching approaches.

Problem Statement

Traditional highlighting in OpenSearch relies on lexical matching, which has several limitations. The current system cannot effectively capture semantically relevant content when there are no exact keyword matches. It struggles with identifying multiple relevant spans across a document and lacks the ability to provide context-aware highlighting based on query intent.

The need for neural highlighting stems from several key requirements in modern search systems. Users expect search results that understand semantic meaning, not just keyword matches. This requires highlighting that can identify contextually relevant passages even when exact terms don't match. Additionally, long documents often contain multiple relevant sections that need to be highlighted to provide comprehensive search results. The system should also be able to understand and highlight content based on the user's search intent rather than just matching words.

Github issue for this feature request: #145

Requirements

Easy User Integration

  • Simple configuration through existing highlight API
  • No additional setup beyond model deployment
  • Backward compatible with existing highlight syntax

Highlighting Quality and Performance

  • Semantically relevant highlights even without exact keyword matches
  • Context-aware highlighting that captures complete thoughts/sentences
  • Response times that remain reasonable once model inference time is included

Neural Highlighting Integration

  • Implement new "neural" highlighter type
  • Support configuration through search request options
  • Enable model specification in highlight requests
  • Support the following query types
    • Neural Query
      • Most critical for semantic search
      • Direct alignment with neural highlighting
    • Match Query
      • Most common text query type
      • Basic natural language support

Models

  • Provide a pre-trained model option at feature release
  • Support remote model integration and provide blueprint docs
  • Provide a default model option implementation (post-p0)

Out of Scope

  • Model training or fine-tuning
  • Image or multi-modal highlighting
  • Automatic model selection or optimization
  • Neural highlighting feature stats API (to be revisited after the neural-search plugin publishes its stats API framework)
  • Support for GPU-enabled sentence QA models

Current State

The current highlighting system in OpenSearch:

  1. Supports multiple highlighter types (unified, plain, fvh); see the OpenSearch highlighters documentation
  2. Uses lexical matching for highlight selection
  3. Lacks semantic understanding capabilities
  4. Has no integration with ML models

Feature Touched Components

  • OpenSearch core
    • no code changes needed
    • extend highlighter framework from core into neural search plugin
  • neural-search plugin
    • code changes needed
    • main feature component for neural highlighter
  • ml-commons plugin
    • code changes needed
    • provide question-answer model inference service
  • Model training and fine-tuning

Model Support Considerations

We are evaluating three approaches for implementing neural highlighting: two types of QuestionAnswering models, plus the TextEmbedding models already supported by the neural-search plugin.

QuestionAnswer Model (preferred)

Sentence-level QA Model (e.g., MashQA, MultiSpanQA) (p0 release)

The system predicts entire sentences as answers. This approach is better suited for complete sentence highlighting and more appropriate for summary-like highlights.

The system aggregates token embeddings at sentence level and returns binary predictions per sentence. This approach is better for maintaining complete sentence context.

Example:

Query: "What helps prevent heart disease?"
Text: "Regular physical activity strengthens your cardiovascular system. A balanced diet rich in omega-3 fatty acids supports heart health. Stress management techniques can also reduce cardiac risks. Getting enough sleep is also important for overall health. Smoking and excessive alcohol consumption should be avoided."
Processing:
Sentence 1: "Regular physical activity strengthens your cardiovascular system."
Sentence 2: "A balanced diet rich in omega-3 fatty acids supports heart health."
Sentence 3: "Stress management techniques can also reduce cardiac risks."
Sentence 4: "Getting enough sleep is also important for overall health."
Sentence 5: "Smoking and excessive alcohol consumption should be avoided."
Sentence-level predictions (1 means answer sentence, 0 means not):
[1, 1, 1, 0, 1]
 ^  ^  ^  ^  ^
 First three sentences directly address heart disease prevention
 Fourth sentence about sleep is too general
 Last sentence about smoking/alcohol is relevant for prevention
Result highlight: 
- "Regular physical activity strengthens your cardiovascular system."
- "A balanced diet rich in omega-3 fatty acids supports heart health."
- "Stress management techniques can also reduce cardiac risks."
- "Smoking and excessive alcohol consumption should be avoided."
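The selection step in the example above is simple to sketch: given the sentence split and the model's binary predictions, keep only the sentences predicted as answers. The class and method names below are illustrative, not the plugin's actual implementation, and the sentence splitter and model inference are assumed to happen upstream.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of turning sentence-level QA predictions into highlights.
public class SentenceHighlightSelector {
    /** Returns the sentences whose prediction is 1 (answer sentence). */
    public static List<String> select(List<String> sentences, int[] predictions) {
        List<String> highlights = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (predictions[i] == 1) {
                highlights.add(sentences.get(i));
            }
        }
        return highlights;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
            "Regular physical activity strengthens your cardiovascular system.",
            "A balanced diet rich in omega-3 fatty acids supports heart health.",
            "Stress management techniques can also reduce cardiac risks.",
            "Getting enough sleep is also important for overall health.",
            "Smoking and excessive alcohol consumption should be avoided.");
        int[] predictions = {1, 1, 1, 0, 1};
        for (String h : select(sentences, predictions)) {
            System.out.println(h);
        }
    }
}
```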

Token-level QA Model (POST p0)

The system predicts answer spans at the token level, providing granular phrase-level highlighting that is better for precise span identification. This approach may have higher resource costs for training, but it could be offered as an option for users who bring their own token-level QA models.

The system uses the model for token classification and returns predictions per token, supporting multiple non-contiguous spans. The token-level QA model uses a BIO tagging scheme for predictions, where 'B' marks the beginning of an answer span (1), 'I' indicates the inside/continuation of an answer span (2), and 'O' represents tokens outside of any answer span (0).

Example:

Query: "What are symptoms of dehydration?"
Text: "When you don't drink enough water, your mouth feels dry and your energy levels drop significantly. Your skin loses its elasticity."
Tokens with BIO labels (O = outside, B = begin answer span, I = inside answer span):
  [CLS]:O When:O you:O don't:O drink:O enough:O water:O your:O
  mouth:B feels:I dry:I
  and:O your:O
  energy:B levels:I drop:I significantly:I
  your:O
  skin:B loses:I its:I elasticity:I
  [SEP]:O
Result: The model identifies three symptom spans:
- "mouth feels dry"
- "energy levels drop significantly"
- "skin loses its elasticity"
Note how it correctly groups related tokens into meaningful phrases without relying on exact keyword matches.
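Decoding BIO labels into phrases is mechanical: a B (1) opens a span, an I (2) extends it, and an O (0) closes it. The decoder below is a sketch of that scheme, not ml-commons code; the token array and labels are assumed to come from the model's tokenizer and classifier.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative decoder for the BIO scheme (O=0, B=1, I=2).
public class BioSpanDecoder {
    /** Groups B/I runs of tokens into answer phrases. */
    public static List<String> decode(String[] tokens, int[] labels) {
        List<String> spans = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            if (labels[i] == 1) {                           // B: start a new span
                if (current != null) spans.add(current.toString());
                current = new StringBuilder(tokens[i]);
            } else if (labels[i] == 2 && current != null) { // I: continue the span
                current.append(' ').append(tokens[i]);
            } else {                                        // O: close any open span
                if (current != null) spans.add(current.toString());
                current = null;
            }
        }
        if (current != null) spans.add(current.toString());
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"your", "mouth", "feels", "dry", "and", "your",
            "energy", "levels", "drop", "significantly"};
        int[] labels =   {0, 1, 2, 2, 0, 0, 1, 2, 2, 2};
        // yields "mouth feels dry" and "energy levels drop significantly"
        System.out.println(decode(tokens, labels));
    }
}
```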

Text Embedding Approach (e.g., the text embedding models already supported)

This approach uses semantic similarity between query and text fragments.

Advantages of this method include that it requires no changes to the ml-commons plugin and offers flexibility in using existing pre-trained text embedding models from OpenSearch ml-commons.

However, a significant disadvantage is its inability to incorporate surrounding sentence context into each fragment's embedding, which results in low precision when highlighting multiple sentences.

The system computes semantic similarity scores between the query and text fragments; the approach is language-agnostic and remains flexible across different domains.

  • Example:
Query: "What causes fatigue?"
Query embedding: [0.1, 0.4, -0.3, ...] (768-dim vector)

Text: "Poor sleep patterns disrupt your body's natural rhythm. Nutrient deficiencies can leave you feeling drained throughout the day. Excessive screen time strains your eyes and mental energy."

1. Split into fragments:
   Fragment 1: "Poor sleep patterns disrupt your body's natural rhythm."
   Fragment 2: "Nutrient deficiencies can leave you feeling drained throughout the day."
   Fragment 3: "Excessive screen time strains your eyes and mental energy."

2. Compute fragment embeddings:
   Fragment 1: [0.15, 0.45, -0.25, ...] (768-dim vector)
   Fragment 2: [0.12, 0.38, -0.28, ...] (768-dim vector)
   Fragment 3: [0.05, 0.30, -0.15, ...] (768-dim vector)

3. Calculate cosine similarities:
   Fragment 1 score: 0.89 (high similarity - sleep affects fatigue)
   Fragment 2 score: 0.85 (high similarity - nutrition affects energy)
   Fragment 3 score: 0.76 (medium-high similarity - mental strain causes fatigue)

Result: The model highlights all three fragments as they semantically relate to causes of fatigue, even though none contain the exact word "fatigue":
- "Poor sleep patterns disrupt your body's natural rhythm."
- "Nutrient deficiencies can leave you feeling drained throughout the day."
- "Excessive screen time strains your eyes and mental energy."
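The scoring step above reduces to cosine similarity between the query embedding and each fragment embedding, with some cut-off deciding what gets highlighted. The sketch below uses toy 3-dimensional vectors rather than real 768-dimensional model output, and the 0.7 threshold is an assumption, not a documented default.

```java
// Minimal sketch of the similarity step for the text embedding approach.
public class FragmentScorer {
    /** Cosine similarity between two equal-length vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] query = {0.1, 0.4, -0.3};
        double[][] fragments = {
            {0.15, 0.45, -0.25},
            {0.12, 0.38, -0.28},
            {0.05, 0.30, -0.15}};
        double threshold = 0.7; // assumed cut-off, tunable per deployment
        for (double[] f : fragments) {
            double score = cosine(query, f);
            System.out.printf("score=%.2f highlighted=%b%n", score, score >= threshold);
        }
    }
}
```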

QA Model Benchmarks on a Small Dataset

Note: the token-level QA model is not yet available, so the benchmark results below are based on sentence-level QA models.

Query Latency Performance (Milliseconds)

Test Environments:

  • Single node dev cluster
  • Apple M2 pro
  • main branch OpenSearch 3.0.0
  • jvmArgs("-Xms1g", "-Xmx4g")
  • Local ML model on node
Dataset: val_webmd_squad_v2_consec.json (2686 queries)

| Metric       | Text Match Only | MultiSpanQA | MashQA | Unified | Plain | FVH   |
|--------------|-----------------|-------------|--------|---------|-------|-------|
| Mean         | 1.30            | 8.80        | 11.17  | 1.97    | 4.84  | 3.52  |
| P50 (Median) | 1.21            | 8.08        | 10.02  | 1.80    | 4.16  | 2.82  |
| P90          | 1.73            | 11.79       | 14.94  | 2.85    | 7.05  | 6.73  |
| P99          | 3.34            | 17.38       | 24.72  | 4.95    | 17.68 | 10.25 |

Dataset: MultiSpanQA_data_valid.json (653 queries)

| Metric       | Text Match Only | MultiSpanQA | MashQA | Unified | Plain | FVH  |
|--------------|-----------------|-------------|--------|---------|-------|------|
| Mean         | 1.05            | 9.60        | 11.19  | 1.79    | 2.24  | 2.20 |
| P50 (Median) | 0.96            | 8.90        | 10.72  | 1.55    | 2.07  | 2.05 |
| P90          | 1.33            | 12.02       | 12.86  | 2.36    | 2.84  | 2.91 |
| P99          | 2.34            | 20.15       | 20.76  | 6.30    | 4.83  | 5.33 |

Accuracy Benchmarks

  • Test Environments:
    • g4dn.4xlarge EC2 instance with GPU enabled
    • Python scripts run directly against the models
    • Accuracy results are for reference only; the comparison is not apples-to-apples because each model was evaluated on a different dataset. We will rerun on a unified dataset once one is available.
  • Evaluation Metrics Formulas
    • Precision = True Positives / (True Positives + False Positives)
    • Recall = True Positives / (True Positives + False Negatives)
    • F1 = 2 * (Precision * Recall) / (Precision + Recall)
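The metric formulas above, expressed as code. As a sanity check, plugging in the reported MultiSpanQA precision (76.2%) and recall (72%) reproduces the reported F1 of roughly 74%.

```java
// The evaluation metric formulas as small helper functions.
public class EvalMetrics {
    static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
    static double recall(int tp, int fn)    { return (double) tp / (tp + fn); }
    static double f1(double p, double r)    { return 2 * p * r / (p + r); }

    public static void main(String[] args) {
        double p = 0.762, r = 0.72;
        System.out.printf("F1 = %.3f%n", f1(p, r)); // ~0.740
    }
}
```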
| Metric    | MultiSpanQA Model                                            | MashQA Model                                                                   |
|-----------|--------------------------------------------------------------|--------------------------------------------------------------------------------|
| Dataset   | rag-exploration/MultiSpanQA/data/MultiSpanQA_data/valid.json | rag-exploration/MultiSpanQA/data/mashqa_dataset/val_webmd_squad_v2_consec.json |
| Precision | 76.20%                                                       | 35%                                                                            |
| Recall    | 72%                                                          | 74.50%                                                                         |
| F1 Score  | 74%                                                          | 47.60%                                                                         |

Benchmark Key Findings

  • Latency Performance:
    • MultiSpanQA averages 8.8-9.6 ms, while MashQA averages 11.1-11.2 ms
    • Both neural models maintain sub-25 ms P99 latencies, which is reasonable given the added model inference call
    • Performance is consistent across dataset sizes (2686 vs. 653 queries)
    • Traditional highlighting methods respond faster, but neural approaches offer enhanced semantic understanding
  • Accuracy Trade-offs:
    • MultiSpanQA shows significantly better precision (+41.20%)
    • MashQA has slightly better recall (+2.50%)
    • MultiSpanQA has substantially better overall F1 score (+26.40%)

Highlight Framework Considerations

When designing the neural highlighting feature, we evaluated two potential integration approaches:

OpenSearch Existing Highlighter Framework Approach (Preferred)

OpenSearch currently provides a robust Highlighter interface (as detailed in the highlighter documentation) that supports field-level access and control. The neural highlighting feature can naturally extend OpenSearch's existing highlighting capabilities through this framework.

This approach would maintain consistency with OpenSearch's architecture by following the same highlighting API pattern and integrating as a new highlight type alongside unified, plain, and fvh highlighters.

Example usage follows the familiar OpenSearch highlighting pattern:

{
  "highlight": {
    "fields": {
      "content": {
        "type": "neural"  // Introduce new highlight type of "neural"
      }
    },
    "options": { // model id can be passed in through options field
        "model_id": "adcdefg"
    }
  }
}

Key Benefits:
The approach maintains full compatibility with OpenSearch's highlighting architecture while reusing existing infrastructure. It provides natural integration with highlighting options and superior fragment handling capabilities. This design choice also ensures easier maintenance and future extensibility of the feature.

Neural Query Builder Wrapper (based on Highlighter Framework Approach)

Traditional OpenSearch queries maintain their text representation through standard methods. However, neural search queries present a unique challenge - they transform natural language queries into vector representations for KNN search, which loses the original query text in the neural search plugin highlighter. This loss is particularly problematic for features like highlighting, which need to understand the user's original search intent.

For example, when a user searches:

{
  "neural": {
    "text_vector": {
      "query_text": "intelligent document search",
      "k": 3
    }
  }
}

The query is converted to vector format for KNN search, but the original text "intelligent document search" must be preserved for highlighting relevant passages. Without preserving this text, the highlighter cannot effectively identify which parts of the document are semantically relevant to the user's query.

This challenge is unique to neural search queries because:

  1. They operate primarily on vector representations
  2. The semantic meaning needs to be preserved for highlighting
  3. Traditional lexical highlighting methods don't apply directly to vector searches

We can use a wrapper pattern, NeuralKNNQueryBuilder, to specifically address the challenges of neural search queries; it also insulates the neural-search plugin from breakage whenever the k-NN query structure changes.

To address the challenge of preserving original query text in neural search queries, we implement a Query Wrapper Design Pattern. This pattern encapsulates the neural KNN query while maintaining access to the original query text.

Implementation

public class NeuralKNNQuery extends Query {
    private final String originalQueryText;
    private final KNNQuery knnQuery;

    public NeuralKNNQuery(String queryText, KNNQuery knnQuery) {
        this.originalQueryText = queryText;
        this.knnQuery = knnQuery;
    }

    public String getOriginalQueryText() {
        return originalQueryText;
    }

    // Delegate KNN query methods
    @Override
    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
        return knnQuery.createWeight(searcher, scoreMode, boost);
    }

    @Override
    public void visit(QueryVisitor visitor) {
        knnQuery.visit(visitor);
    }

    @Override
    public String toString(String field) {
        return "NeuralKNN(" + originalQueryText + ")";
    }

    // Lucene's Query requires equals/hashCode; delegate to the wrapped query
    @Override
    public boolean equals(Object obj) {
        return obj instanceof NeuralKNNQuery
            && knnQuery.equals(((NeuralKNNQuery) obj).knnQuery);
    }

    @Override
    public int hashCode() {
        return knnQuery.hashCode();
    }
}

Benefits

  1. Query Text Preservation: Maintains original query text throughout the search process
  2. Clean Separation: Separates query text handling from vector search logic
  3. Extensibility: Easy to add additional metadata or functionality
  4. Compatibility: Works seamlessly with existing KNN query infrastructure
  5. Maintainability: Changes to KNN query structure won't break neural search plugin

Usage Example

// In the neural search query builder
public Query build() {
    KNNQuery knnQuery = new KNNQuery(
        field,
        vectorQuery,
        k,
        numCandidates
    );
    
    return new NeuralKNNQuery(originalQueryText, knnQuery);
}

// In the highlighter
public HighlightField highlight(FieldHighlightContext context) {
    String queryText;
    if (context.query instanceof NeuralKNNQuery) {
        queryText = ((NeuralKNNQuery) context.query).getOriginalQueryText();
    } else {
        queryText = context.query.toString();
    }
    // Use queryText for highlighting...
}

This pattern ensures that the original query text is available for highlighting while maintaining the efficiency of vector-based search operations.

Search Processor Approach (Alternative)

While a search processor approach could modify search requests and responses as part of the search pipeline, it comes with significant limitations.

Key Limitations:
This approach would restrict access to field-level content and lack direct integration with the highlighting framework. It would necessitate duplicating existing highlighting logic and complicate fragment generation handling. Additionally, maintaining consistency with existing highlighters would become more challenging.

Solution HLD: Architectural and Component Design

graph LR
    A[Search Request with neural highlight] --> B[Neural Highlighter]
    
    subgraph "Neural Search Plugin"
        B --> |1.validate config| C[Configuration Validator]
        C --> |2.prepare text| D[Text Segmenter]
        D --> |3.inference request| E[ML Commons Accessor]
    end
    
    subgraph "ML Commons Plugin"
        E --> |4.model inference| F[Highlight QA Translator]
        F --> |5.process text| G[QA Model]
        G --> |6.highlight spans| F
    end
    
    F --> |results| E
    E --> D
    D --> |7.format response| B
    B --> H[Final Response]

Key Components

Neural Highlighter Component [neural search plugin]

This core component serves as the entry point for neural highlighting requests and orchestrates the entire highlighting process. During the query phase, it handles extracting original query text, validating configurations, and managing fragment settings including size and tag configurations.

The core highlighting logic focuses on text processing by segmenting input into processable chunks while maintaining document structure awareness. It implements sophisticated scoring and ranking algorithms for fragments and supports highlighting across multiple fields.

For model management, the component intelligently selects appropriate QA models based on configuration, handles versioning compatibility, and implements fallback strategies when needed. The fragment processing system carefully merges overlapping highlight spans while preserving natural sentence boundaries.
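The span-merging step mentioned above can be sketched as classic interval merging: overlapping or touching [start, end) character spans are collapsed so no region is highlighted twice. The class and method names are illustrative only, under the assumption that spans arrive sorted by start offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of merging overlapping highlight spans.
public class SpanMerger {
    record Span(int start, int end) {}

    /** Merges overlapping or touching spans; input must be sorted by start. */
    static List<Span> merge(List<Span> sorted) {
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            if (!merged.isEmpty() && s.start() <= merged.get(merged.size() - 1).end()) {
                // Overlaps the previous span: extend it instead of adding a new one
                Span last = merged.remove(merged.size() - 1);
                merged.add(new Span(last.start(), Math.max(last.end(), s.end())));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(new Span(0, 10), new Span(8, 20), new Span(25, 30));
        System.out.println(merge(spans)); // first two overlap and merge into one
    }
}
```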

ML Commons InferenceQA Accessor Component [neural search plugin]

Acting as a bridge between the Neural Highlighter and ML Commons plugin, this component manages all model inference interactions. It maintains a connection pool to ML Commons, implements circuit breaker patterns, and handles batched inference requests with built-in caching.

The component implements fully asynchronous processing with non-blocking inference calls and careful thread pool management. Robust error handling includes retry logic with exponential backoff, graceful degradation options, and comprehensive error logging. Resource management ensures system stability through request throttling, memory monitoring, and proper cleanup procedures.
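The retry logic described above can be sketched as retry-with-exponential-backoff around an inference call. This is a minimal illustration, not the accessor's actual implementation; the Callable interface stands in for whatever the real inference client exposes, and the attempt count and base delay are assumed values.

```java
import java.util.concurrent.Callable;

// Sketch of retry with exponential backoff for inference calls.
public class RetryingCaller {
    /** Retries the call up to maxAttempts times, doubling the delay each time. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long baseDelayMs)
            throws Exception {
        long delay = baseDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before retrying
                    delay *= 2;          // exponential growth
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] tries = {0};
        // Simulate a call that fails twice before succeeding
        String result = callWithRetry(() -> {
            if (++tries[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + tries[0] + " attempts");
    }
}
```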

Highlighting QuestionAnswer Translator [ml-commons plugin]

This specialized translator handles the complexities of QA model interactions through multiple processing layers. At the token level, it implements BIO tagging schemes and manages token alignment with special case handling. The sentence-level processing includes boundary detection, scoring, and context window management.

The translator ensures clean output formatting by converting model outputs to highlight spans, validating formats, and providing proper output sanitization. This maintains consistency and reliability in the highlighting results.

Interaction Flow

  1. Search request arrives with neural highlighting configuration
  2. Neural Highlighter Component receives the request and validates configuration
  3. Text is segmented and prepared for model inference
  4. ML Commons InferenceQA Accessor sends inference requests to ML Commons
  5. Highlighting QA Translator processes the text through the QA model
  6. Results are translated back into highlight spans
  7. Neural Highlighter Component formats and returns the final response

LLD Component Details

Neural Highlighter

The core component implements OpenSearch's Highlighter interface to provide neural-aware text highlighting. The NeuralHighlighter integrates with OpenSearch's existing highlighting infrastructure while adding specialized support for machine learning-based highlighting. It preserves the original query text and uses it to generate contextually relevant highlights through question-answering models.

The ML integration layer handles the interaction with OpenSearch's ML Commons framework. It manages asynchronous calls to question-answering models with proper timeout handling, transforms model responses into highlight spans, and ensures robust error handling. The highlighting results are then formatted and returned in OpenSearch's standard highlighting format, maintaining compatibility with existing search result rendering.

public class NeuralHighlighter implements Highlighter {
    // Set once at plugin initialization; static so all highlighter instances share it
    private static MLCommonsClient mlCommonsClient;
    private final TextProcessor textProcessor = new TextProcessor();

    public static void initialize(MLCommonsClient mlClient) {
        NeuralHighlighter.mlCommonsClient = mlClient;
    }

    @Override
    public boolean canHighlight(MappedFieldType fieldType) {
        // check fieldType is the type we support
        return fieldType.isSearchable();
    }

    @Override
    public HighlightField highlight(FieldHighlightContext context) {
        // 1. Extract configuration and query
        HighlightConfig config = extractConfig(context);
        String originalQuery = extractOriginalQuery(context.query);
        
        // 2. Process text for highlighting
        String fieldText = context.hitContext.getFieldValue(context.fieldName);
        List<String> segments = textProcessor.segmentText(fieldText);
        
        // 3. Get highlights from ML model
        List<String> highlights = getHighlightsFromModel(
            config.modelId,
            originalQuery,
            segments,
            config.modelType,
            config.maxSnippets
        );
        
        // 4. Format and return highlights
        return formatHighlights(context.fieldName, highlights);
    }

    private String extractOriginalQuery(Query query) {
        if (query instanceof NeuralKNNQuery) {
            return ((NeuralKNNQuery) query).getOriginalQueryText();
        }
        return query.toString();
    }

    private List<String> getHighlightsFromModel(
        String modelId, 
        String query, 
        List<String> segments,
        String modelType,
        int maxSnippets
    ) {
        // Call ML model and process results
        Map<String, Object> response = mlCommonsClient.inferenceQA(
            modelId,
            query,
            String.join(" ", segments)
        );
        
        return getHighlightsBasedOnModelType(response, modelType);
    }

    private HighlightField formatHighlights(String fieldName, List<String> highlights) {
        Text[] fragments = highlights.stream()
            .map(h -> new Text("<em>" + h + "</em>"))
            .toArray(Text[]::new);
        return new HighlightField(fieldName, fragments);
    }
}

ML Commons Client

Interfaces with ML Commons for model inference. Responsible for:

  • Preparing input in the format expected by QA model
  • Managing model inference requests
  • Handling model responses
  • Converting between internal and ML Commons data formats
class MLCommonsClient {
    private final ModelService modelService;
    
    public Map<String, Object> inferenceQA(String modelId, String query, String text) {
        // Implementation details...
    }
}

QuestionAnsweringTranslator

Handles input/output processing for QA models in ML Commons. Responsible for:

  • Converting text and query to model input format
  • Processing model outputs into BIO labels
  • Managing tokenization and detokenization
  • Handling model-specific preprocessing
public class MultiSpanQuestionAnsweringTranslator extends SentenceTransformerTranslator {
    private List<String> tokens;

    @Override
    public NDList processInput(TranslatorContext ctx, Input input) {
        // Tokenize the query/context pair and keep tokens for later detokenization
        // Implementation details...
    }

    @Override
    public Output processOutput(TranslatorContext ctx, NDList list) {
        // Convert per-token logits to BIO labels and assemble highlight spans
        // Implementation details...
    }
}

Backward Compatibility

  1. New highlighting type doesn't affect existing highlighters
  2. Optional parameters maintain backward compatibility
  3. Default fallback to traditional highlighting on errors
  4. No breaking changes to existing APIs
  5. Existing highlighting configurations remain unchanged
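Point 3 above (default fallback on errors) can be sketched as a simple delegation: try the neural path and, on any failure, degrade gracefully to a traditional highlighter. The SimpleHighlighter interface and class names below are illustrative stand-ins, not the plugin's real types.

```java
// Functional stand-in for a highlighter; real OpenSearch interfaces differ.
interface SimpleHighlighter {
    String highlight(String query, String text) throws Exception;
}

// Sketch of falling back to traditional highlighting when the neural path fails.
public class FallbackHighlighter implements SimpleHighlighter {
    private final SimpleHighlighter neural;
    private final SimpleHighlighter traditional;

    FallbackHighlighter(SimpleHighlighter neural, SimpleHighlighter traditional) {
        this.neural = neural;
        this.traditional = traditional;
    }

    @Override
    public String highlight(String query, String text) throws Exception {
        try {
            return neural.highlight(query, text);
        } catch (Exception e) {
            // Any neural failure degrades gracefully to lexical highlighting
            return traditional.highlight(query, text);
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleHighlighter failing = (q, t) -> { throw new RuntimeException("model down"); };
        SimpleHighlighter plain = (q, t) -> "<em>" + q + "</em>";
        FallbackHighlighter h = new FallbackHighlighter(failing, plain);
        System.out.println(h.highlight("fatigue", "some text"));
    }
}
```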

Security

  1. Model access control through ML Commons
  2. Resource limits for highlighting requests
  3. Input validation for all parameters

Testability

Integration Testing Areas

  • Test neural highlighting with neural search queries
  • Test neural highlighting with match queries
  • Verify highlighting across multiple fields
  • Test highlighting with field boost settings
  • Test highlighting with filtered queries
  • Test highlighting with nested documents
  • Verify behavior with invalid configurations

Performance Testing

We will run performance tests against both local and remote models.

@q-andy commented Feb 20, 2025

Are there any other OpenSearch plugins that have implemented their own highlighters?

@junqiu-lei (Member, Author)

Are there any other OpenSearch plugins that have implemented their own highlighters?

As far as I know, no other plugins have implemented their own highlighters.
