Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabled Filtering on Nested Vector fields with top level filters #1372

Merged
merged 1 commit into from
Jan 5, 2024

Conversation

navneet1v
Copy link
Collaborator

Description

Enabled Filtering on Nested Vector fields with top level filters.

After this change, below use case for filtering is supported with vector search(both Lucene and Faiss). The change is BWC.:

  1. When vector field is nested and filters are applied on non-nested fields.
  2. When vector field is nested and filters are applied nested fields.
  3. When vector field is nested and filters are applied on nested fields with nested path provided in filters.
  4. When vector field is non-nested and filters are applied on nested fields with nested path provided in filters.

To ensure that any further changes in the filter code we are able to catch the bugs, I added a new IT test class AdvancedFilteringUseCasesIT which has integration tests for all the above use-cases.

Context
There was feature gap in the efficient filter for both Faiss and Lucene engine, where the filtering was not working with vector search when vector field is a nested field and filter query has the top level fields.

Please refer #1356 for more details.

Issues Resolved

#1356

Testing

Apart from IT, I did the manual testing for all the below cases. Please check the #1356 for more details the query payloads.

Sr No. Case Type KNN engine KNN Field Meta data field KNN Query Filter Query Manual Testing Status
1 Base Case Faiss Nested Nested Nested Query contains the nested field, but filter query has no nested field context Working
2   Faiss Nested Nested Nested Nested Query clause wrapped in Bool query Working
3   Faiss Nested Nested Nested nested query clause Working
               
4 Reported which was not working Faiss Nested Non Nested Nested Non Nested Working
               
5   Faiss Non Nested Nested Non Nested nested query clause Working
6   Faiss Non Nested Nested Non Nested Nested Query clause wrapped in Bool query Working
               
               
7 Base Case Faiss Non Nested Non Nested Non Nested Non Nested Working
               
               
8 Base Case Lucene Nested Nested Nested Query contains the nested field, but filter query has no nested field context Working
9   Lucene Nested Nested Nested Nested Query clause wrapped in Bool query Working
10   Lucene Nested Nested Nested nested query clause Working
               
11 Reported which was not working Lucene Nested Non Nested Nested Non Nested Working
               
12   Lucene Non Nested Nested Non Nested nested query clause Working
13   Lucene Non Nested Nested Non Nested Nested Query clause wrapped in Bool query Working
               
14 Base Case Lucene Non Nested Non Nested Non Nested Non Nested Working

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5c24d99) 85.10% compared to head (9bf2e60) 85.16%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1372      +/-   ##
============================================
+ Coverage     85.10%   85.16%   +0.05%     
- Complexity     1254     1258       +4     
============================================
  Files           163      163              
  Lines          5104     5110       +6     
  Branches        477      479       +2     
============================================
+ Hits           4344     4352       +8     
+ Misses          554      553       -1     
+ Partials        206      205       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vibrantvarun
Copy link
Member

Build is failing.

@navneet1v
Copy link
Collaborator Author

Build is failing.

its from windows and it is an IT.

jmazanec15
jmazanec15 previously approved these changes Jan 5, 2024
@jmazanec15
Copy link
Member

@navneet1v need to run spotlessApply

@navneet1v
Copy link
Collaborator Author

@navneet1v need to run spotlessApply

some how got missed. will run it

@navneet1v navneet1v merged commit 271df52 into opensearch-project:main Jan 5, 2024
49 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 5, 2024
navneet1v added a commit that referenced this pull request Jan 5, 2024
navneet1v added a commit that referenced this pull request Jan 5, 2024
…) (#1377)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 271df52)

Co-authored-by: Navneet Verma <[email protected]>
@heemin32
Copy link
Collaborator

Seems following query does not work where both top field and nested field filter is applied when knn is defined inside nested field.

PUT /my-knn-index-1
{
	"settings": {
		"index": {
			"knn": true,
			"knn.algo_param.ef_search": 100
		}
	},
	"mappings": {
		"properties": {
			"nested_field": {
				"type": "nested",
				"properties": {
					"my_vector": {
						"type": "knn_vector",
						"dimension": 3,
						"method": {
							"name": "hnsw",
							"space_type": "l2",
							"engine": "faiss",
							"parameters": {
								"ef_construction": 128,
								"m": 24
							}
						}
					}
				}
			}
		}
	}
}

PUT /_bulk?refresh=true
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{"parking": false, "nested_field":[{"my_vector":[1,1,1], "component": "title"},{"my_vector":[2,2,2], "component": "body"},{"my_vector":[3,3,3], "component": "reference"}]}
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{"parking": true, "nested_field":[{"my_vector":[10,10,10], "component": "title"},{"my_vector":[20,20,20], "component": "body"},{"my_vector":[30,30,30], "component": "reference"}]}
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{"parking": true, "nested_field":[{"my_vector":[30,30,30], "component": "title"},{"my_vector":[50,50,50], "component": "body"},{"my_vector":[60,60,60], "component": "reference"}]}

GET /my-knn-index-1/_search
{
	"query": {
		"nested": {
			"path": "nested_field",
			"query": {
				"knn": {
					"nested_field.my_vector": {
						"vector": [
							30,
							30,
							30
						],
						"k": 3,
						"filter": {
							"bool": {
								"must": [
									{
										"term": {
											"nested_field.component": "title"
										}
									},
									{
										"term": {
											"parking": false
										}
									}
								]
							}
						}
					}
				}
			}
		}
	}
}
Response
{
	"took": 8,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 0,
			"relation": "eq"
		},
		"max_score": null,
		"hits": []
	}
}

@navneet1v
Copy link
Collaborator Author

navneet1v commented Jan 29, 2024

Seems following query does not work where both top field and nested field filter is applied when knn is defined inside nested field.

Block (33 lines)

PUT /my-knn-index-1
{
	"settings": {
		"index": {
			"knn": true,
			"knn.algo_param.ef_search": 100
		}
	},
	"mappings": {
		"properties": {
			"nested_field": {
				"type": "nested",
				"properties": {
					"my_vector": {
						"type": "knn_vector",
						"dimension": 3,
						"method": {
							"name": "hnsw",
							"space_type": "l2",
							"engine": "faiss",
							"parameters": {
								"ef_construction": 128,
								"m": 24
							}
						}
					}
				}
			}
		}
	}
}
PUT /_bulk?refresh=true
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{"parking": false, "nested_field":[{"my_vector":[1,1,1], "component": "title"},{"my_vector":[2,2,2], "component": "body"},{"my_vector":[3,3,3], "component": "reference"}]}
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{"parking": true, "nested_field":[{"my_vector":[10,10,10], "component": "title"},{"my_vector":[20,20,20], "component": "body"},{"my_vector":[30,30,30], "component": "reference"}]}
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{"parking": true, "nested_field":[{"my_vector":[30,30,30], "component": "title"},{"my_vector":[50,50,50], "component": "body"},{"my_vector":[60,60,60], "component": "reference"}]}

Block (37 lines)

GET /my-knn-index-1/_search
{
	"query": {
		"nested": {
			"path": "nested_field",
			"query": {
				"knn": {
					"nested_field.my_vector": {
						"vector": [
							30,
							30,
							30
						],
						"k": 3,
						"filter": {
							"bool": {
								"must": [
									{
										"term": {
											"nested_field.component": "title"
										}
									},
									{
										"term": {
											"parking": false
										}
									}
								]
							}
						}
					}
				}
			}
		}
	}
}

Block (20 lines)

Response
{
	"took": 8,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 0,
			"relation": "eq"
		},
		"max_score": null,
		"hits": []
	}
}

Your filter query clause is not correct. As it contains both nested and non nested fields, include the nested context in the filter query otherwise Opensearch has no way to identify that query has nested fields.

example

GET /my-knn-index-1/_search
{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "knn": {
          "nested_field.my_vector": {
            "vector": [
              30,
              30,
              30
            ],
            "k": 3,
            "filter": {
              "bool": {
                "must": [
                  {
                    "nested": {
                        "path": "nested_field",
                        "query": {
                            "term": {
                                "nested_field.component": "title"
                            }
                        }
                    }
                  },
                  {
                    "term": {
                      "parking": false
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Enhancements Increases software capabilities beyond original client specifications v2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants