[RFC] Search Request Processor for Semantic Search Query Rewrite #1184

Open
chishui opened this issue Feb 12, 2025 · 0 comments

Problem Statement

Users who want to adopt semantic search often need to modify their existing queries to use neural search capabilities. This creates friction in adoption and requires significant effort to update existing applications. We need a solution that allows users to leverage semantic search capabilities while maintaining their current query structure.

Proposed Solution

Create a new search request processor that automatically rewrites traditional match queries to semantic search queries (specifically neural sparse queries) based on configuration. This allows users to maintain their existing query structure while benefiting from semantic search capabilities.

Sample Configuration

{
  "request_processors": [
    {
      "semantic_search_rewrite_processor": {
        "tag": "semantic_rewrite",
        "description": "Processor to rewrite match queries to neural sparse queries",
        "model_id": "my-sparse-model",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}
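To make the shape of this configuration concrete, here is a minimal Python sketch that reads the processor entry out of a pipeline body shaped like the sample above. The `RewriteConfig` class and `parse_rewrite_configs` function are hypothetical names used only for illustration, and treating `model_id` and `field_map` as required is an assumption, not something the RFC specifies.

```python
from dataclasses import dataclass
from typing import Dict, List

# Processor name as written in the sample configuration above.
PROCESSOR_TYPE = "semantic_search_rewrite_processor"


@dataclass(frozen=True)
class RewriteConfig:
    """Hypothetical in-memory form of one processor entry."""
    model_id: str
    field_map: Dict[str, str]  # text field -> sparse embedding field
    tag: str = ""
    description: str = ""


def parse_rewrite_configs(pipeline: dict) -> List[RewriteConfig]:
    """Collect every rewrite-processor entry from a pipeline body."""
    configs = []
    for entry in pipeline.get("request_processors", []):
        body = entry.get(PROCESSOR_TYPE)
        if body is None:
            continue  # some other processor type; ignore it
        # Assumption: both settings are mandatory.
        if not body.get("model_id"):
            raise ValueError("model_id is required")
        if not body.get("field_map"):
            raise ValueError("field_map must map at least one field")
        configs.append(
            RewriteConfig(
                model_id=body["model_id"],
                field_map=dict(body["field_map"]),
                tag=body.get("tag", ""),
                description=body.get("description", ""),
            )
        )
    return configs
```

With the sample configuration, this would yield a single entry whose `field_map` sends `text` to `text_embedding`.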

Technical Design

Query Rewrite Process

  1. The processor traverses the query builder tree using depth-first search
  2. When it encounters a match query, it checks whether the queried field is a key in field_map
  3. If so, it replaces the match query with a neural sparse query on the mapped field, using the configured model_id
  4. For compound queries, it creates new instances while preserving the query structure
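The steps above can be sketched in Python over the JSON form of the query DSL. This is only an illustration under stated assumptions: the actual processor would walk query builder objects inside the plugin rather than dictionaries, and the `rewrite` function name is hypothetical. The output uses the `neural_sparse` query shape with `query_text` and `model_id`.

```python
def rewrite(node, field_map, model_id):
    """Depth-first rewrite of a query DSL tree.

    Returns a new object only where something changed; untouched
    subtrees are returned as the same object they came in as.
    """
    if isinstance(node, list):
        new_items = [rewrite(item, field_map, model_id) for item in node]
        changed = any(n is not o for n, o in zip(new_items, node))
        return new_items if changed else node
    if not isinstance(node, dict):
        return node  # leaf value (string, number, bool, None)

    match = node.get("match")
    if isinstance(match, dict) and len(node) == 1 and len(match) == 1:
        ((field, value),) = match.items()
        if field not in field_map:
            return node  # field not configured: leave the match query alone
        # Accept both {"match": {"f": "text"}} and {"match": {"f": {"query": "text"}}}.
        text = value.get("query") if isinstance(value, dict) else value
        if text is None:
            return node
        return {
            "neural_sparse": {
                field_map[field]: {"query_text": text, "model_id": model_id}
            }
        }

    # Compound or other query: rebuild only if a child was rewritten.
    new_node = {k: rewrite(v, field_map, model_id) for k, v in node.items()}
    changed = any(new_node[k] is not node[k] for k in node)
    return new_node if changed else node


query = {
    "bool": {
        "must": [{"match": {"text": "hello world"}}],
        "filter": [{"term": {"status": "published"}}],
    }
}
rewritten = rewrite(query, {"text": "text_embedding"}, "my-sparse-model")
```

Here the `must` clause would become a `neural_sparse` query on `text_embedding`, while the `filter` clause is reused as-is and the original `query` object is not mutated. Because the sketch works on raw dictionaries, it cannot distinguish a match query from a field that happens to be named `match`; a query-builder-based traversal would not have that ambiguity.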

Implementation Details

  • The processor will be implemented as part of the neural search plugin
  • It will only transform match queries that have fields configured in the field_map
  • Compound queries (bool, dis_max, etc.) will be preserved with only the relevant match queries transformed

Benefits

  1. Zero-code migration path to semantic search
  2. Preserves existing query structure and logic
  3. Configurable field mapping for controlled adoption
  4. Negligible overhead for queries with no configured match fields: they are traversed but left unchanged

Limitations

  1. Currently limited to match queries (not multi_match)
  2. Requires field mapping configuration
  3. Some match query attributes may not have equivalent features in neural sparse queries
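One possible way to handle the third limitation is to carry over the attributes that have a counterpart and report the rest, so they are not dropped silently. The sketch below is hypothetical: the `convert_match_body` name is made up for illustration, and the assumption that only `boost` carries over is a placeholder that would need to be checked against what neural sparse queries actually accept.

```python
# Assumption for illustration only: `boost` is the one match attribute
# with a direct counterpart on the neural sparse side.
CARRIED_OVER = {"boost"}


def convert_match_body(body, target_field, model_id):
    """Convert the body of a match query for one field.

    Returns (neural_sparse_query, dropped_attribute_names).
    """
    if not isinstance(body, dict):
        body = {"query": body}  # shorthand form: {"match": {"field": "text"}}
    attrs = dict(body)  # copy so the caller's query is not mutated
    sparse = {"query_text": attrs.pop("query"), "model_id": model_id}
    dropped = []
    for name, value in attrs.items():
        if name in CARRIED_OVER:
            sparse[name] = value
        else:
            dropped.append(name)  # e.g. operator, fuzziness
    return {"neural_sparse": {target_field: sparse}}, sorted(dropped)


query, dropped = convert_match_body(
    {"query": "wireless headphones", "operator": "and", "boost": 2.0},
    "text_embedding",
    "my-sparse-model",
)
```

In this example `dropped` would contain `operator`; the processor could log such names, or reject the rewrite, depending on how strict the behavior should be.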

Performance Considerations

  • The query transformation process is O(n) where n is the number of query nodes
  • No additional network calls or heavy computations during transformation
  • Minimal memory overhead as it only creates new objects for transformed queries

What alternatives have you considered?

Another option is to modify the existing ML Inference Search Request Processor (introduced in OpenSearch 2.16).

However, the ML inference processor has limitations that make it unsuitable as a ready-made solution:

  1. With query_template, the whole query body is replaced by the template. So even if the match query is only one part of a compound query, the entire compound query is replaced rather than just the match query. Without query_template, only the field configured in output_map is replaced with the model output; the query is still a match query, and no semantic search runs.
  2. input_map supports wildcard matching through JsonPath, so multiple field values can be found, but those values are used only for batch inference. The batch inference results are handled as a whole, and there is no mechanism to place them back at their original locations in the query.

Addressing these limitations would require a dedicated design and non-trivial work.

Projects
Status: Backlog