Fix script score queries not getting cached #1367

Merged
merged 1 commit into from
Jan 3, 2024

Conversation

junqiu-lei
Member

@junqiu-lei junqiu-lei commented Jan 2, 2024

Description

This PR fixes the issue where a k-NN Script Score query run with request_cache=true was not cached as expected.

Details on issue debug: #1098 (comment)

Issues Resolved

Closes #1098

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Collaborator

@junqiu-lei can you add the details on how you tested that script queries are getting cached?

codecov bot commented Jan 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (391a2ef) 85.04% compared to head (d439b7f) 85.05%.

❗ Current head d439b7f differs from pull request most recent head 659824b. Consider uploading reports for the commit 659824b to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1367   +/-   ##
=========================================
  Coverage     85.04%   85.05%           
- Complexity     1251     1254    +3     
=========================================
  Files           162      163    +1     
  Lines          5101     5104    +3     
  Branches        477      477           
=========================================
+ Hits           4338     4341    +3     
  Misses          557      557           
  Partials        206      206           

@junqiu-lei
Member Author

@junqiu-lei can you add the details on how you tested that script queries are getting cached?

@navneet1v We can check the index's request_cache stats: with the request cache enabled, repeating the same search should increase hit_count. The test steps are below:

// 1. Create knn index
PUT http://localhost:9200/my-knn-index-1
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
        "my_vector1": {
          "type": "knn_vector",
          "dimension": 2,
          "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "nmslib",
            "parameters": {
              "ef_construction": 128,
              "m": 24
            }
          }
        },
        "my_vector2": {
          "type": "knn_vector",
          "dimension": 4,
          "method": {
            "name": "hnsw",
            "space_type": "innerproduct",
            "engine": "faiss",
            "parameters": {
              "ef_construction": 256,
              "m": 48
            }
          }
        }
    }
  }
}

// 2. Ingest data
POST http://localhost:9200/_bulk
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-knn-index-1", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-knn-index-1", "_id": "6" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 }
{ "index": { "_index": "my-knn-index-1", "_id": "7" } }
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 }
{ "index": { "_index": "my-knn-index-1", "_id": "8" } }
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 }
{ "index": { "_index": "my-knn-index-1", "_id": "9" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }


// 3. First time index search with request_cache=true
POST http://localhost:9200/my-knn-index-1/_search?request_cache=true
{
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector2",
         "query_value": [2.0, 3.0, 5.0, 6.0],
         "space_type": "cosinesimil"
       }
     }
   }
 }
}

// 4. First time index stats check
GET http://localhost:9200/my-knn-index-1/_stats
...
"indices": {
...
  "my-knn-index-1": {
...
    "total": {
      "request_cache": {
        "memory_size_in_bytes": 788,
        "evictions": 0,
        "hit_count": 0,
        "miss_count": 1
      }
   }
  }
}
...

// 5. Second time index search with request_cache=true
POST http://localhost:9200/my-knn-index-1/_search?request_cache=true
{
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector2",
         "query_value": [2.0, 3.0, 5.0, 6.0],
         "space_type": "cosinesimil"
       }
     }
   }
 }
}

// 6. Second time index stats check
GET http://localhost:9200/my-knn-index-1/_stats
...
"indices": {
...
  "my-knn-index-1": {
...
    "total": {
      "request_cache": {
        "memory_size_in_bytes": 788,
        "evictions": 0,
        "hit_count": 1,
        "miss_count": 1
      }
   }
  }
}
...
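
The before/after comparison in steps 4 and 6 can also be checked programmatically. The following is only a sketch: the class and method names (`RequestCacheCheck`, `hitCount`, `secondSearchWasCached`) are hypothetical and are not part of the plugin or of this PR's integration test. It assumes the `_stats` response body is available as a string and reads the first `request_cache` object it finds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestCacheCheck {
    // Captures the number after "hit_count" inside a "request_cache" object.
    // Sketch only: a real test would use a JSON parser instead of a regex.
    private static final Pattern HIT_COUNT = Pattern.compile(
        "\"request_cache\"\\s*:\\s*\\{[^}]*?\"hit_count\"\\s*:\\s*(\\d+)");

    // Returns hit_count from the first request_cache object in the stats body.
    static long hitCount(String statsJson) {
        Matcher m = HIT_COUNT.matcher(statsJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no request_cache.hit_count in stats");
        }
        return Long.parseLong(m.group(1));
    }

    // True when exactly one additional cache hit was recorded between the two stats calls.
    static boolean secondSearchWasCached(String statsBefore, String statsAfter) {
        return hitCount(statsAfter) == hitCount(statsBefore) + 1;
    }

    public static void main(String[] args) {
        // Values copied from steps 4 and 6 above.
        String first = "{\"request_cache\":{\"memory_size_in_bytes\":788,"
            + "\"evictions\":0,\"hit_count\":0,\"miss_count\":1}}";
        String second = "{\"request_cache\":{\"memory_size_in_bytes\":788,"
            + "\"evictions\":0,\"hit_count\":1,\"miss_count\":1}}";
        System.out.println(secondSearchWasCached(first, second));
    }
}
```

Comparing the delta rather than an absolute hit_count keeps the check valid even if other cached searches ran against the index earlier.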

@navneet1v
Collaborator

navneet1v commented Jan 2, 2024

@junqiu-lei can you add the details on how you tested that script queries are getting cached?

@navneet1v We can check the index request_cache stat, finally it will be hit if we enable the request cache. The test steps can be found below:

Can we use the index stats in the IT to validate that the cache is getting hit? That would ensure we catch any future regression where the request suddenly stops being cached.

@junqiu-lei
Member Author

junqiu-lei commented Jan 2, 2024

@junqiu-lei can you add the details on how you tested that script queries are getting cached?

@navneet1v We can check the index request_cache stat, finally it will be hit if we enable the request cache.

can we use the index stats to validate in the IT that cache is getting hit.

Yes, the new IT that uses index stats to validate caching is already included in this PR.

@junqiu-lei junqiu-lei self-assigned this Jan 3, 2024
naveentatikonda
naveentatikonda previously approved these changes Jan 3, 2024
@@ -39,7 +39,7 @@ public <FactoryType> FactoryType compile(String name, String code, ScriptContext
KNNCounter.SCRIPT_COMPILATION_ERRORS.increment();
throw new IllegalArgumentException("Unknown script name " + code);
}
ScoreScript.Factory factory = KNNScoreScriptFactory::new;
Member

Do we need this change?

Member Author

Yes, we need it. Since KNNScoreScriptFactory does not have a no-argument constructor that matches what the interface expects, we can't use the method reference KNNScoreScriptFactory::new here; the IDE reports the error "Cannot resolve constructor 'KNNScoreScriptFactory'".
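
As a generic Java illustration of the point above (not the plugin's actual code), a constructor reference only satisfies a functional interface when some constructor matches the interface method's parameter list. All types below (`Factory`, `LeafFactory`, `TwoArgLeafFactory`, `StandaloneFactory`) are hypothetical stand-ins, and the `String lookup` parameter is a simplification chosen for the sketch.

```java
import java.util.Map;

public class FactoryReferenceDemo {
    // Hypothetical per-segment object produced by the factory.
    interface LeafFactory {
        String describe();
    }

    // Hypothetical single-method factory interface.
    interface Factory {
        LeafFactory newFactory(Map<String, Object> params, String lookup);
    }

    // Its constructor has the same (params, lookup) shape as Factory.newFactory,
    // so TwoArgLeafFactory::new can be used where a Factory is expected.
    static class TwoArgLeafFactory implements LeafFactory {
        private final Map<String, Object> params;
        private final String lookup;

        TwoArgLeafFactory(Map<String, Object> params, String lookup) {
            this.params = params;
            this.lookup = lookup;
        }

        @Override
        public String describe() {
            return "leaf(" + params.size() + " params, lookup=" + lookup + ")";
        }
    }

    // Implements Factory itself and has only the implicit no-argument constructor.
    static class StandaloneFactory implements Factory {
        @Override
        public LeafFactory newFactory(Map<String, Object> params, String lookup) {
            return new TwoArgLeafFactory(params, lookup);
        }
    }

    static Factory viaMethodReference() {
        return TwoArgLeafFactory::new; // compiles: constructor matches newFactory
    }

    static Factory viaInstance() {
        // Factory broken = StandaloneFactory::new; // would not compile: no (Map, String) constructor
        return new StandaloneFactory(); // the class is a Factory, so instantiate it directly
    }

    public static void main(String[] args) {
        Map<String, Object> params = Map.of("field", "my_vector2");
        System.out.println(viaMethodReference().newFactory(params, "lookup").describe());
        System.out.println(viaInstance().newFactory(params, "lookup").describe());
    }
}
```

Once a class implements the factory interface directly, instantiating it with `new` is the natural replacement for the constructor reference.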

@junqiu-lei junqiu-lei merged commit 2dc3f61 into opensearch-project:main Jan 3, 2024
48 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1367-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2dc3f613451e6c36fc41a2988081cf5c76490285
# Push it to GitHub
git push --set-upstream origin backport/backport-1367-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1367-to-2.x.

Labels
backport 2.x bug Something isn't working v2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Script Score Queries Not getting Cached for K-NN painless Script for Exact Search
5 participants