Throw proper exception to invalid k-NN query #1380

Merged
merged 5 commits into from
Jan 9, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
* Increase Lucene max dimension limit to 16,000 [#1346](https://github.com/opensearch-project/k-NN/pull/1346)
* Tuned default values for ef_search and ef_construction for better indexing and search performance for vector search [#1353](https://github.com/opensearch-project/k-NN/pull/1353)
* Enabled Filtering on Nested Vector fields with top level filters [#1372](https://github.com/opensearch-project/k-NN/pull/1372)
* Throw proper exception to invalid k-NN query [#1380](https://github.com/opensearch-project/k-NN/pull/1380)
### Bug Fixes
* Fix use-after-free case on nmslib search path [#1305](https://github.com/opensearch-project/k-NN/pull/1305)
* Allow nested knn field mapping when train model [#1318](https://github.com/opensearch-project/k-NN/pull/1318)
@@ -100,8 +100,14 @@ public static void initialize(ModelDao modelDao) {
}

private static float[] ObjectsToFloats(List<Object> objs) {
    if (Objects.isNull(objs) || objs.isEmpty()) {
        throw new IllegalArgumentException(String.format("[%s] field 'vector' requires to be non-null and non-empty", NAME));
    }
    float[] vec = new float[objs.size()];
    for (int i = 0; i < objs.size(); i++) {
        if ((objs.get(i) instanceof Number) == false) {
            throw new IllegalArgumentException(String.format("[%s] field 'vector' requires to be an array of numbers", NAME));
        }
        vec[i] = ((Number) objs.get(i)).floatValue();
    }
    return vec;
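
The validation added in this hunk can be tried in isolation. The following is a minimal standalone sketch, not the plugin's code: the class name `QueryVectorValidation` is hypothetical, and the hard-coded `NAME` value `"knn"` is an assumption inferred from the `[knn]` prefix that the tests in this PR assert on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Standalone sketch; the class name and the NAME constant are illustrative assumptions.
class QueryVectorValidation {
    private static final String NAME = "knn"; // assumed from the "[knn]" prefix in the test assertions

    static float[] objectsToFloats(List<Object> objs) {
        // A missing or empty vector is rejected up front.
        if (Objects.isNull(objs) || objs.isEmpty()) {
            throw new IllegalArgumentException(String.format("[%s] field 'vector' requires to be non-null and non-empty", NAME));
        }
        float[] vec = new float[objs.size()];
        for (int i = 0; i < objs.size(); i++) {
            Object element = objs.get(i);
            // null fails the instanceof check, so it is rejected along with strings and other non-numbers.
            if (!(element instanceof Number)) {
                throw new IllegalArgumentException(String.format("[%s] field 'vector' requires to be an array of numbers", NAME));
            }
            vec[i] = ((Number) element).floatValue();
        }
        return vec;
    }

    public static void main(String[] args) {
        // Valid input: prints [1.5, 2.5]
        System.out.println(Arrays.toString(objectsToFloats(Arrays.<Object>asList(1.5, 2.5))));
        try {
            // Same invalid vector as in the tests: a string and a null among the numbers.
            objectsToFloats(Arrays.<Object>asList(1.5, 2.5, "a", null));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because both failure modes raise `IllegalArgumentException`, callers see a client-side error type rather than the `NullPointerException` or `ClassCastException` an unchecked cast would otherwise produce.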
51 changes: 51 additions & 0 deletions src/test/java/org/opensearch/knn/index/VectorDataTypeIT.java
@@ -24,6 +24,7 @@
import org.opensearch.core.rest.RestStatus;
import org.opensearch.script.Script;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
@@ -425,6 +426,56 @@ public void testKNNScriptScoreWithInvalidByteQueryVector() throws Exception {
);
}

@SneakyThrows
Collaborator

I would prefer to keep the integration tests (ITs) minimal by not including error cases.
If these cases can be covered by the unit tests in KNNQueryBuilderTests, can we remove the tests in VectorDataTypeIT?

Collaborator

I would say it's better to have both the IT and the unit tests. Both have their uses, so I would like to keep both of them.

Member Author

I'd like to keep the IT cases here. Customers can still send requests like these, so it's good to have the IT verify the behavior end to end.

Collaborator

What is the case where unit test fails to catch but IT can catch here?
If the issue can be caught with a simple unit test, why do we want to add an expensive, duplicated IT?

Member Author

@heemin32 The unit test cannot verify the response code, which is expected to be 400 (BAD_REQUEST). It is safer to use an IT at the k-NN plugin level to make sure the right response code is returned to the end user. Ref: https://github.com/opensearch-project/OpenSearch/blob/2de44a7b771c9b8b59f57069d0fdfdf9ee818ec2/libs/core/src/main/java/org/opensearch/ExceptionsHelper.java#L99-L100

Collaborator

@navneet1v Jan 9, 2024

@heemin32

> What is the case where unit test fails to catch but IT can catch here?

The case where someone wraps another exception after k-NN has thrown the exception.

> The responsibility of return code is not in your code but in the OpenSearch framework. All you need to do is throwing correct exception.

Yes, the responsibility for sending the correct status code lies with OpenSearch, but we are relying on OpenSearch behavior, and since this exception is thrown from the k-NN side we need to make sure that our customers get the right status code. Otherwise we could have created a new exception of our own.

Put simply, the reason for adding an IT here is that the REST layer will now return a different response, so we need to make sure the customer receives the response code we expect, which in this case is 400.

Also, having more tests is always better. I am not able to understand what harm there is in having an IT that tests negative scenarios for k-NN.
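
To make the wrapping concern concrete, here is a minimal sketch. It assumes, in line with the `ExceptionsHelper` reference cited earlier in this thread, that the framework translates an `IllegalArgumentException` into a 400 and an unrecognized exception into a 500. The `status` method below is a simplified, hypothetical stand-in for that mapping, not the OpenSearch implementation, and every name in the class is illustrative.

```java
// Simplified, hypothetical stand-in for a framework's exception-to-status mapping.
// None of these names come from OpenSearch or the k-NN plugin.
class WrappedExceptionStatusSketch {

    // Assumed mapping: IllegalArgumentException -> 400, anything else -> 500.
    static int status(Throwable t) {
        if (t instanceof IllegalArgumentException) {
            return 400;
        }
        return 500;
    }

    // Hypothetical validation that rejects a non-numeric vector element.
    static void validate(Object element) {
        if (!(element instanceof Number)) {
            throw new IllegalArgumentException("[knn] field 'vector' requires to be an array of numbers");
        }
    }

    // Hypothetical intermediate layer that wraps whatever the validation throws.
    static void validateWrapped(Object element) {
        try {
            validate(element);
        } catch (IllegalArgumentException e) {
            throw new RuntimeException("query parsing failed", e);
        }
    }

    // Runs the action and reports the status a client would see under the assumed mapping.
    static int statusOf(Runnable action) {
        try {
            action.run();
            return 200;
        } catch (RuntimeException e) {
            return status(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusOf(() -> validate("a")));        // prints 400
        System.out.println(statusOf(() -> validateWrapped("a"))); // prints 500
    }
}
```

A unit test that only asserts on the exception thrown by the inner method would pass in both cases; only a check at the outermost boundary, such as an IT asserting on the HTTP status, observes the difference.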

Collaborator

@heemin32 Jan 9, 2024

> The case where someone wraps another exception after k-NN has thrown the exception.

1. The case is rare.
2. If this is a high concern, we need a better mechanism to prevent it rather than relying on an integration test.

> Having more tests is always better.

Yes. However, it is not free; it comes with costs: longer test time and engineering effort for implementation and maintenance.

Collaborator

> If this is a high concern, we need a better mechanism to prevent it rather than relying on an integration test.

As of now, an integration test is the best way. Also, the change in this PR alters the RestStatus code, so having an IT is a must for cases like this.

Collaborator

I don't think OpenSearch core will change the behavior for an existing exception. If this is a concern, we need to have our own exception, and the code can be verified using a unit test.

If the concern is that someone wraps the exception inside the k-NN repo, we can also validate that by writing a unit test on the outermost method.
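
The unit-test alternative suggested here could look roughly like the sketch below. It is written in plain Java without a test framework; `parseQuery`, `validate`, and `expectExactly` are hypothetical names introduced only for illustration. The helper compares the exact exception class, so a wrapping exception fails the check instead of passing unnoticed.

```java
import java.util.Collections;
import java.util.List;

// Plain-Java sketch of an "outermost method" check; all names here are hypothetical.
class OutermostMethodCheck {

    // Inner validation step.
    static void validate(List<Object> vector) {
        if (vector == null || vector.isEmpty()) {
            throw new IllegalArgumentException("[knn] field 'vector' requires to be non-null and non-empty");
        }
    }

    // Hypothetical outermost entry point that the test targets.
    static List<Object> parseQuery(List<Object> vector) {
        validate(vector);
        return vector;
    }

    // Returns the thrown exception only if its class is exactly the expected one.
    static <T extends Throwable> T expectExactly(Class<T> type, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (t.getClass() == type) {
                return type.cast(t);
            }
            throw new AssertionError("expected " + type.getName() + " but got " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + type.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        IllegalArgumentException e = expectExactly(
            IllegalArgumentException.class,
            () -> parseQuery(Collections.emptyList())
        );
        System.out.println(e.getMessage());
    }
}
```

This guards against wrapping inside the repository, but it does not exercise the framework's translation of the exception into an HTTP status.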

Member Author

Thanks, all, for the feedback. I've kept the integration tests. They offer a broader check against system-wide issues that unit tests might miss. We'll strive for a balance between comprehensive testing and efficiency.

public void testSearchWithInvalidSearchVectorType() {
    createKnnIndexMappingWithLuceneEngine(2, SpaceType.L2, VectorDataType.FLOAT.getValue());
    ingestL2FloatTestData();
    Request request = new Request("POST", String.format("/%s/_search", INDEX_NAME));
    List<Object> invalidTypeQueryVector = new ArrayList<>();
    invalidTypeQueryVector.add(1.5);
    invalidTypeQueryVector.add(2.5);
    invalidTypeQueryVector.add("a");
    invalidTypeQueryVector.add(null);
    XContentBuilder builder = XContentFactory.jsonBuilder()
        .startObject()
        .startObject("query")
        .startObject("knn")
        .startObject(FIELD_NAME)
        .field("vector", invalidTypeQueryVector)
        .field("k", 4)
        .endObject()
        .endObject()
        .endObject()
        .endObject();
    request.setJsonEntity(builder.toString());

    ResponseException ex = expectThrows(ResponseException.class, () -> client().performRequest(request));
    assertEquals(400, ex.getResponse().getStatusLine().getStatusCode());
    assertTrue(ex.getMessage().contains("[knn] field 'vector' requires to be an array of numbers"));
}

@SneakyThrows
public void testSearchWithMissingQueryVector() {
    createKnnIndexMappingWithLuceneEngine(2, SpaceType.L2, VectorDataType.FLOAT.getValue());
    ingestL2FloatTestData();
    Request request = new Request("POST", String.format("/%s/_search", INDEX_NAME));
    XContentBuilder builder = XContentFactory.jsonBuilder()
        .startObject()
        .startObject("query")
        .startObject("knn")
        .startObject(FIELD_NAME)
        .field("k", 4)
        .endObject()
        .endObject()
        .endObject()
        .endObject();
    request.setJsonEntity(builder.toString());

    ResponseException ex = expectThrows(ResponseException.class, () -> client().performRequest(request));
    assertEquals(400, ex.getResponse().getStatusLine().getStatusCode());
    assertTrue(ex.getMessage().contains("[knn] field 'vector' requires to be non-null and non-empty"));
}

@SneakyThrows
private void ingestL2ByteTestData() {
Byte[] b1 = { 6, 6 };
@@ -39,6 +39,7 @@
import org.opensearch.plugins.SearchPlugin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

@@ -127,6 +128,70 @@ public void testFromXcontent_WithFilter() throws Exception {
actualBuilder.equals(knnQueryBuilder);
}

public void testFromXContent_invalidQueryVectorType() throws Exception {
    final ClusterService clusterService = mockClusterService(Version.CURRENT);

    final KNNClusterUtil knnClusterUtil = KNNClusterUtil.instance();
    knnClusterUtil.initialize(clusterService);

    List<Object> invalidTypeQueryVector = new ArrayList<>();
    invalidTypeQueryVector.add(1.5);
    invalidTypeQueryVector.add(2.5);
    invalidTypeQueryVector.add("a");
    invalidTypeQueryVector.add(null);

    XContentBuilder builder = XContentFactory.jsonBuilder();
    builder.startObject();
    builder.startObject(FIELD_NAME);
    builder.field(KNNQueryBuilder.VECTOR_FIELD.getPreferredName(), invalidTypeQueryVector);
    builder.field(KNNQueryBuilder.K_FIELD.getPreferredName(), K);
    builder.endObject();
    builder.endObject();
    XContentParser contentParser = createParser(builder);
    contentParser.nextToken();
    IllegalArgumentException exception = expectThrows(
        IllegalArgumentException.class,
        () -> KNNQueryBuilder.fromXContent(contentParser)
    );
    assertTrue(exception.getMessage().contains("[knn] field 'vector' requires to be an array of numbers"));
}

public void testFromXContent_missingQueryVector() throws Exception {
    final ClusterService clusterService = mockClusterService(Version.CURRENT);

    final KNNClusterUtil knnClusterUtil = KNNClusterUtil.instance();
    knnClusterUtil.initialize(clusterService);

    // Test without vector field
    XContentBuilder builderWithoutVectorField = XContentFactory.jsonBuilder();
    builderWithoutVectorField.startObject();
    builderWithoutVectorField.startObject(FIELD_NAME);
    builderWithoutVectorField.field(KNNQueryBuilder.K_FIELD.getPreferredName(), K);
    builderWithoutVectorField.endObject();
    builderWithoutVectorField.endObject();
    XContentParser contentParserWithoutVectorField = createParser(builderWithoutVectorField);
    contentParserWithoutVectorField.nextToken();
    IllegalArgumentException exception = expectThrows(
        IllegalArgumentException.class,
        () -> KNNQueryBuilder.fromXContent(contentParserWithoutVectorField)
    );
    assertTrue(exception.getMessage().contains("[knn] field 'vector' requires to be non-null and non-empty"));

    // Test empty vector field
    List<Object> emptyQueryVector = new ArrayList<>();
    XContentBuilder builderWithEmptyVector = XContentFactory.jsonBuilder();
    builderWithEmptyVector.startObject();
    builderWithEmptyVector.startObject(FIELD_NAME);
    builderWithEmptyVector.field(KNNQueryBuilder.VECTOR_FIELD.getPreferredName(), emptyQueryVector);
    builderWithEmptyVector.field(KNNQueryBuilder.K_FIELD.getPreferredName(), K);
    builderWithEmptyVector.endObject();
    builderWithEmptyVector.endObject();
    XContentParser contentParserWithEmptyVector = createParser(builderWithEmptyVector);
    contentParserWithEmptyVector.nextToken();
    exception = expectThrows(IllegalArgumentException.class, () -> KNNQueryBuilder.fromXContent(contentParserWithEmptyVector));
    assertTrue(exception.getMessage().contains("[knn] field 'vector' requires to be non-null and non-empty"));
}

@Override
protected NamedXContentRegistry xContentRegistry() {
List<NamedXContentRegistry.Entry> list = ClusterModule.getNamedXWriteables();