Fix derived source for binary and byte vectors #2533
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
/* | ||
* Copyright OpenSearch Contributors | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
|
||
package org.opensearch.knn.index.codec.derivedsource; | ||
|
||
import lombok.extern.log4j.Log4j2; | ||
import org.apache.lucene.index.FieldInfo; | ||
import org.apache.lucene.util.BytesRef; | ||
import org.opensearch.knn.common.FieldInfoExtractor; | ||
import org.opensearch.knn.index.VectorDataType; | ||
import org.opensearch.knn.index.mapper.KNNVectorFieldMapperUtil; | ||
import org.opensearch.knn.index.vectorvalues.KNNVectorValues; | ||
|
||
import java.io.IOException; | ||
|
||
@Log4j2 | ||
abstract class AbstractPerFieldDerivedVectorInjector implements PerFieldDerivedVectorInjector { | ||
/** | ||
* Utility method for formatting the vector values based on the vector data type. KNNVectorValues must be advanced | ||
* to the correct position. | ||
* | ||
* @param fieldInfo fieldinfo for the vector field | ||
* @param vectorValues vector values of the field. getVector or getConditionalVector should return expected vector. | ||
* @return vector formatted based on the vector data type | ||
* @throws IOException if unable to deserialize stored vector | ||
*/ | ||
protected Object formatVector(FieldInfo fieldInfo, KNNVectorValues<?> vectorValues) throws IOException { | ||
Object vectorValue = vectorValues.getVector(); | ||
// If the vector value is a byte[], we must deserialize | ||
if (vectorValue instanceof byte[]) { | ||
BytesRef vectorBytesRef = new BytesRef((byte[]) vectorValue); | ||
VectorDataType vectorDataType = FieldInfoExtractor.extractVectorDataType(fieldInfo); | ||
return KNNVectorFieldMapperUtil.deserializeStoredVector(vectorBytesRef, vectorDataType); | ||
Comment on lines +34 to +35
Can you elaborate on why we need to do this? I am trying to understand why it is needed, since we already have a byte[].
Sure, this is what the IT I added looks like before:
Basically, the source is expected to be an int array, but because we are adding a byte array, it gets serialized as a byte string.
||
} | ||
return vectorValues.conditionalCloneVector(); | ||
} | ||
} |
Can we use the data type of the field here, rather than an instanceof check on byte[]?
We need a byte[] in order to deserialize, so this check is required. As for display, deserializeStoredVector takes the vectorDataType, so we can be sure the vector will be formatted properly.
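To illustrate the serialization issue discussed in this thread, here is a minimal, standalone sketch. It is not the plugin's code: `ByteVectorSourceSketch`, `toIntArray`, and `asBase64` are hypothetical names, and the assumption that a raw byte[] ends up as a Base64-style byte string is a stand-in for whatever the source serializer actually does. It only contrasts the two shapes a reader would see in `_source`.

```java
import java.util.Arrays;
import java.util.Base64;

// Illustrative only: none of these names exist in the k-NN plugin.
public class ByteVectorSourceSketch {

    // Hypothetical helper: widen each signed byte to an int so the vector
    // is rendered as a numeric array rather than as binary data.
    static int[] toIntArray(byte[] stored) {
        int[] out = new int[stored.length];
        for (int i = 0; i < stored.length; i++) {
            out[i] = stored[i]; // implicit widening keeps the sign
        }
        return out;
    }

    // Hypothetical helper: roughly what a serializer that treats byte[]
    // as opaque binary would emit (assumed here to be Base64).
    static String asBase64(byte[] stored) {
        return Base64.getEncoder().encodeToString(stored);
    }

    public static void main(String[] args) {
        byte[] stored = {1, -2, 3};
        // Injecting the raw byte[] yields an opaque string.
        System.out.println(asBase64(stored));
        // Converting first yields the numeric array the test expects.
        System.out.println(Arrays.toString(toIntArray(stored)));
    }
}
```

In the PR itself, the conversion is delegated to deserializeStoredVector with the field's vectorDataType, so the sketch above should be read only as a picture of the before and after shapes, not as the actual conversion logic.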