Fix derived source for binary and byte vectors #2533

jmazanec15 · 2025-02-18T16:51:55Z

Description

With derived source enabled, we were not formatting binary and byte vectors before adding them back to the source, so they appeared in the source as binary strings. This change formats them as ints before adding them back.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

With derived source enabled, we were not formatting binary and byte vectors before adding them back to the source, so they appeared in the source as binary strings. This change formats them as ints before adding them back.

Signed-off-by: John Mazanec <[email protected]>

Vikasht34

LGTM

navneet1v · 2025-02-18T19:05:36Z

...java/org/opensearch/knn/index/codec/derivedsource/AbstractPerFieldDerivedVectorInjector.java

+    protected Object formatVector(FieldInfo fieldInfo, KNNVectorValues<?> vectorValues) throws IOException {
+        Object vectorValue = vectorValues.getVector();
+        // If the vector value is a byte[], we must deserialize
+        if (vectorValue instanceof byte[]) {


Can we use the data type of the field here, rather than an instanceof check on byte[]?

We need a byte[] in order to deserialize, so this check is required. As for display, deserializeStoredVector takes the vectorDataType, so we can be sure the vector will be formatted properly.
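To illustrate the exchange above, here is a minimal sketch of how the instanceof check and the data type can work together. It is not the plugin's code: FormatVectorSketch, its DataType enum, and the widening loop are hypothetical stand-ins for VectorDataType and deserializeStoredVector, and the sketch assumes byte and binary vectors are stored as one signed byte per element. Decoding float vectors from bytes is left out.

```java
import java.util.Arrays;

class FormatVectorSketch {
    // Hypothetical stand-in for the plugin's VectorDataType enum.
    enum DataType { FLOAT, BYTE, BINARY }

    // Returns a value that is safe to place in the source.
    static Object formatVector(Object vectorValue, DataType dataType) {
        // Only raw bytes need conversion; anything else is passed through untouched.
        if (!(vectorValue instanceof byte[])) {
            return vectorValue;
        }
        byte[] raw = (byte[]) vectorValue;
        if (dataType == DataType.BYTE || dataType == DataType.BINARY) {
            // Assumption: one signed byte per element, so widening to int is enough.
            int[] formatted = new int[raw.length];
            for (int i = 0; i < raw.length; i++) {
                formatted[i] = raw[i]; // widening keeps the sign, so -43 stays -43
            }
            return formatted;
        }
        // Float vectors stored as bytes need a real decoder, which this sketch omits.
        throw new UnsupportedOperationException("No decoder in this sketch for " + dataType);
    }

    public static void main(String[] args) {
        Object floats = formatVector(new float[] {1.5f, 2.5f}, DataType.FLOAT);
        Object bytes = formatVector(new byte[] {115, -43, 26}, DataType.BYTE);
        System.out.println(Arrays.toString((float[]) floats)); // [1.5, 2.5]
        System.out.println(Arrays.toString((int[]) bytes));    // [115, -43, 26]
    }
}
```

The instanceof check decides whether any conversion is needed at all, while the data type decides how the bytes are interpreted, which is why the two are not interchangeable.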

navneet1v · 2025-02-18T19:10:17Z

...java/org/opensearch/knn/index/codec/derivedsource/AbstractPerFieldDerivedVectorInjector.java

+            VectorDataType vectorDataType = FieldInfoExtractor.extractVectorDataType(fieldInfo);
+            return KNNVectorFieldMapperUtil.deserializeStoredVector(vectorBytesRef, vectorDataType);


Can you elaborate on why we need to do this? I am trying to understand why it is needed, since we already have a byte[].

Sure, this is how the integration test (IT) I added fails before the fix:

2> java.lang.AssertionError: Docs do not match: 1 expected:<{test_vector=[115, -43, 26, -69, -40, -100, -72, 25, 111, 14, -5, 104, -110, -7, 77, 104]}> but was:<{test_vector=c9Uau9icuBlvDvtokvlNaA==}>

Basically, the source is expected to contain an int array, but because we are adding a byte array, it gets serialized as a binary (Base64-encoded) string.
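The mismatch in the assertion can be reproduced with the JDK alone. The sketch below takes the 16 bytes from the expected side of the error and shows both renderings: the Base64 string that appeared in the source and the int array the test expects. DerivedSourceEncodingDemo and its helper names are hypothetical, and the sketch assumes the source serializer writes a byte[] as standard Base64, which is what the output in the error message suggests.

```java
import java.util.Arrays;
import java.util.Base64;

class DerivedSourceEncodingDemo {
    // The vector from the expected side of the AssertionError above.
    static final byte[] VECTOR = {115, -43, 26, -69, -40, -100, -72, 25, 111, 14, -5, 104, -110, -7, 77, 104};

    // What a byte[] looks like when it is written to JSON as binary data.
    static String asBinaryString(byte[] vector) {
        return Base64.getEncoder().encodeToString(vector);
    }

    // What the test expects: one signed int per vector element.
    static int[] asIntArray(byte[] vector) {
        int[] ints = new int[vector.length];
        for (int i = 0; i < vector.length; i++) {
            ints[i] = vector[i];
        }
        return ints;
    }

    public static void main(String[] args) {
        System.out.println(asBinaryString(VECTOR)); // c9Uau9icuBlvDvtokvlNaA==
        System.out.println(Arrays.toString(asIntArray(VECTOR)));
    }
}
```

Both renderings carry the same 16 bytes, so the data was never lost; only its representation in the source was wrong.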

jmazanec15 requested review from heemin32, navneet1v, VijayanB, vamshin, naveentatikonda, junqiu-lei, martin-gaievski, ryanbogan, luyuncheng, shatejas, 0ctopus13prime and Vikasht34 as code owners February 18, 2025 16:51

jmazanec15 added the Bug Fixes and backport 2.x labels Feb 18, 2025

jmazanec15 force-pushed the derived-byte-fix branch from 63cf35c to ed8a06e Compare February 18, 2025 16:53

jmazanec15 added backport 2.x and removed backport 2.x labels Feb 18, 2025

Vikasht34 approved these changes Feb 18, 2025

navneet1v reviewed Feb 18, 2025

navneet1v approved these changes Feb 18, 2025

jmazanec15 merged commit 0df5f62 into opensearch-project:main Feb 18, 2025
39 checks passed

opensearch-trigger-bot bot mentioned this pull request Feb 18, 2025

[Backport 2.x] Fix derived source for binary and byte vectors #2535

Merged
