[RFC] Search Request Processor for Semantic Search Query Rewrite #1184

chishui · 2025-02-12T06:52:24Z

Problem Statement

Users who want to adopt semantic search often need to modify their existing queries to use neural search capabilities. This creates friction in adoption and requires significant effort to update existing applications. We need a solution that allows users to leverage semantic search capabilities while maintaining their current query structure.

Proposed Solution

Create a new search request processor that automatically rewrites traditional match queries to semantic search queries (specifically neural sparse queries) based on configuration. This allows users to maintain their existing query structure while benefiting from semantic search capabilities.

Sample Configuration

{
  "request_processors": [
    {
      "semantic_search_rewrite_processor": {
        "tag": "semantic_rewrite",
        "description": "Processor to rewrite match queries to neural sparse queries",
        "model_id": "my-sparse-model",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}
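
For context, a configuration like this would normally be registered as a search pipeline through the existing `PUT /_search/pipeline/<name>` API. The sketch below only builds that request and does not send it. The host, the pipeline name `semantic_rewrite_pipeline`, and the helper `build_pipeline_request` are assumptions for illustration, and the processor type `semantic_search_rewrite_processor` is the one proposed in this RFC, so it does not exist in any released version.

```python
import json
from urllib.request import Request

# Processor definition copied from the sample configuration above.
PIPELINE_BODY = {
    "request_processors": [
        {
            "semantic_search_rewrite_processor": {
                "tag": "semantic_rewrite",
                "description": "Processor to rewrite match queries to neural sparse queries",
                "model_id": "my-sparse-model",
                "field_map": {"text": "text_embedding"},
            }
        }
    ]
}


def build_pipeline_request(base_url: str, pipeline_name: str) -> Request:
    """Build (but do not send) the request that registers the search pipeline."""
    return Request(
        url=f"{base_url}/_search/pipeline/{pipeline_name}",
        data=json.dumps(PIPELINE_BODY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local cluster and pipeline name.
req = build_pipeline_request("http://localhost:9200", "semantic_rewrite_pipeline")
print(req.get_method(), req.full_url)
```

Once such a pipeline exists, an unchanged search request could opt in with the `search_pipeline` query parameter, or the pipeline could be made the default for an index through the `index.search.default_pipeline` setting.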

Technical Design

Query Rewrite Process

The processor traverses the query builder tree depth-first.
When it encounters a match query, it checks whether the query's field is configured in field_map.
If the field is configured, it replaces the match query with a neural sparse query on the mapped field, using the configured model_id.
For compound queries, it creates new instances while preserving the query structure.
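
The steps above can be sketched over the JSON form of the query DSL. This is a minimal illustration in Python, not the plugin's implementation, which would operate on Java query builder objects. The function name `rewrite`, the `COMPOUND_CHILDREN` table, and the choice to return an unchanged subtree as the same object are all assumptions of the sketch.

```python
# Child-clause keys for a few compound queries in the query DSL.
COMPOUND_CHILDREN = {
    "bool": ("must", "should", "must_not", "filter"),
    "dis_max": ("queries",),
    "constant_score": ("filter",),
    "boosting": ("positive", "negative"),
}


def rewrite(query, field_map, model_id):
    """Return a query with mapped match clauses replaced by neural_sparse.

    The traversal is depth-first. The input is never mutated, and a subtree
    is returned as the same object when nothing inside it changed.
    """
    if not isinstance(query, dict) or len(query) != 1:
        return query  # not a single-clause query object; leave untouched
    (kind, body), = query.items()

    if kind == "match" and isinstance(body, dict) and len(body) == 1:
        (field, params), = body.items()
        if field not in field_map:
            return query
        # Accept both {"field": "text"} and {"field": {"query": "text"}}.
        text = params.get("query") if isinstance(params, dict) else params
        if text is None:
            return query
        target = field_map[field]
        return {"neural_sparse": {target: {"query_text": text, "model_id": model_id}}}

    if kind in COMPOUND_CHILDREN and isinstance(body, dict):
        new_body = dict(body)
        changed = False
        for key in COMPOUND_CHILDREN[kind]:
            if key not in body:
                continue
            old = body[key]
            if isinstance(old, list):
                new = [rewrite(q, field_map, model_id) for q in old]
                same = all(a is b for a, b in zip(old, new))
            else:
                new = rewrite(old, field_map, model_id)
                same = new is old
            if not same:
                new_body[key] = new
                changed = True
        return {kind: new_body} if changed else query

    return query  # any other leaf query (term, range, ...) is left as-is
```

For example, in a bool query with a match on `text` and a term filter, only the match clause would be replaced; the term clause and the surrounding bool structure stay as they were.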

Implementation Details

The processor will be implemented as part of the neural search plugin.
It will only transform match queries whose fields are configured in field_map.
Compound queries (bool, dis_max, etc.) will be preserved, with only the relevant match queries transformed.

Benefits

Zero-code migration path to semantic search
Preserves existing query structure and logic
Configurable field mapping for controlled adoption
Negligible overhead for queries with no mapped fields (only the tree traversal runs)

Limitations

Currently limited to match queries (not multi_match)
Requires field mapping configuration
Some match query attributes may not have equivalent features in neural sparse queries
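
To make the last limitation concrete, the sketch below converts the parameters of one match clause and reports which attributes would be lost. The RFC does not say which attributes carry over, so the `CARRIED_OVER` set (only `boost` here) is purely an assumption for illustration, as is the helper name `convert_match_params`.

```python
# Attributes this sketch assumes can be copied to neural_sparse unchanged.
# ASSUMPTION: the RFC does not define this set.
CARRIED_OVER = {"boost"}


def convert_match_params(params, model_id):
    """Return (neural_sparse_params, names_of_dropped_attributes)."""
    if not isinstance(params, dict):
        params = {"query": params}  # shorthand form: {"field": "text"}
    converted = {"query_text": params["query"], "model_id": model_id}
    dropped = []
    for name, value in params.items():
        if name == "query":
            continue
        if name in CARRIED_OVER:
            converted[name] = value
        else:
            dropped.append(name)  # e.g. operator, fuzziness
    return converted, sorted(dropped)


converted, dropped = convert_match_params(
    {"query": "hello", "operator": "and", "fuzziness": "AUTO", "boost": 2.0},
    "my-sparse-model",
)
print(converted)
print(dropped)
```

A processor could silently drop such attributes, log them, or skip the rewrite when any are present. Which behavior is appropriate is an open design question that this sketch does not settle.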

Performance Considerations

The query transformation runs in O(n) time, where n is the number of nodes in the query tree
No additional network calls or heavy computations during transformation
Minimal memory overhead, since new objects are created only for transformed queries and the compound queries that contain them

What alternatives have you considered?

Another option is to modify the existing ML Inference Search Request Processor (introduced in OpenSearch 2.16).

However, several limitations mean the ML inference processor is not a viable solution as it exists today.

With query_template, the entire query body is replaced by the template. So even when the match query is only one part of a compound query, the whole compound query is replaced, not just the match query. Without query_template, only the field configured in output_map is replaced with the model output; the query is still a match query, so no semantic search runs.
Although input_map supports wildcard matching through JsonPath and can therefore find multiple field values, those values are only used for batch inference. The batch inference results are handled as a whole, and there is no mechanism to restore them to their original locations in the query.

Addressing these limitations would require a dedicated design and non-trivial work.

chishui added enhancement untriaged labels Feb 12, 2025

heemin32 removed the untriaged label Feb 13, 2025

heemin32 added this to Neural Search RoadMap Feb 13, 2025

heemin32 moved this to Backlog in Neural Search RoadMap Feb 13, 2025
