Commit

Merge branch 'main' into main
Signed-off-by: Vikasht34 <[email protected]>
Vikasht34 authored Feb 12, 2025
2 parents 68ac180 + 349a715 commit 01219ae
Showing 5 changed files with 64 additions and 49 deletions.
37 changes: 2 additions & 35 deletions CHANGELOG.md
@@ -16,47 +16,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
* Update package name to fix compilation issue [#2513](https://github.com/opensearch-project/k-NN/pull/2513)
### Refactoring

## [Unreleased 2.x](https://github.com/opensearch-project/k-NN/compare/2.18...2.x)
## [Unreleased 2.x](https://github.com/opensearch-project/k-NN/compare/2.19...2.x)
### Features
- Add Support for Multi Values in innerHit for Nested k-NN Fields in Lucene and FAISS (#2283)[https://github.com/opensearch-project/k-NN/pull/2283]
- Add binary index support for Lucene engine. (#2292)[https://github.com/opensearch-project/k-NN/pull/2292]
- Add expand_nested_docs Parameter support to NMSLIB engine (#2331)[https://github.com/opensearch-project/k-NN/pull/2331]
- Add a new build mode, `FAISS_OPT_LEVEL=avx512_spr`, which enables the use of advanced AVX-512 instructions introduced with Intel(R) Sapphire Rapids (#2404)[https://github.com/opensearch-project/k-NN/pull/2404]
- Add cosine similarity support for faiss engine (#2376)[https://github.com/opensearch-project/k-NN/pull/2376]
- Add derived source feature for vector fields (#2449)[https://github.com/opensearch-project/k-NN/pull/2449]
### Enhancements
- Introduced a writing layer in native engines where relies on the writing interface to process IO. (#2241)[https://github.com/opensearch-project/k-NN/pull/2241]
- Allow method parameter override for training based indices (#2290) https://github.com/opensearch-project/k-NN/pull/2290]
- Optimizes lucene query execution to prevent unnecessary rewrites (#2305)[https://github.com/opensearch-project/k-NN/pull/2305]
- Add check to directly use ANN Search when filters match all docs. (#2320)[https://github.com/opensearch-project/k-NN/pull/2320]
- Use one formula to calculate cosine similarity (#2357)[https://github.com/opensearch-project/k-NN/pull/2357]
- Add WithFieldName implementation to KNNQueryBuilder (#2398)[https://github.com/opensearch-project/k-NN/pull/2398]
- Make the build work for M series MacOS without manual code changes and local JAVA_HOME config (#2397)[https://github.com/opensearch-project/k-NN/pull/2397]
- Enabled concurrent graph creation for Lucene engine with index thread qty settings(#2480)[https://github.com/opensearch-project/k-NN/pull/2480]
- Remove DocsWithFieldSet reference from NativeEngineFieldVectorsWriter (#2408)[https://github.com/opensearch-project/k-NN/pull/2408]
### Bug Fixes
* Fixing the bug when a segment has no vector field present for disk based vector search (#2282)[https://github.com/opensearch-project/k-NN/pull/2282]
* Fixing the bug where search fails with "fields" parameter for an index with a knn_vector field (#2314)[https://github.com/opensearch-project/k-NN/pull/2314]
* Fix for NPE while merging segments after all the vector fields docs are deleted (#2365)[https://github.com/opensearch-project/k-NN/pull/2365]
* Allow validation for non knn index only after 2.17.0 (#2315)[https://github.com/opensearch-project/k-NN/pull/2315]
* Fixing the bug to prevent updating the index.knn setting after index creation(#2348)[https://github.com/opensearch-project/k-NN/pull/2348]
* Release query vector memory after execution (#2346)[https://github.com/opensearch-project/k-NN/pull/2346]
* Fix shard level rescoring disabled setting flag (#2352)[https://github.com/opensearch-project/k-NN/pull/2352]
* Fix filter rewrite logic which was resulting in getting inconsistent / incorrect results for cases where filter was getting rewritten for shards (#2359)[https://github.com/opensearch-project/k-NN/pull/2359]
* Fixing it to retrieve space_type from index setting when both method and top level don't have the value. [#2374](https://github.com/opensearch-project/k-NN/pull/2374)
* Fixing the bug where setting rescore as false for on_disk knn_vector query is a no-op (#2399)[https://github.com/opensearch-project/k-NN/pull/2399]
* Fixing bug where mapping accepts both dimension and model-id (#2410)[https://github.com/opensearch-project/k-NN/pull/2410]
### Infrastructure
* Updated C++ version in JNI from c++11 to c++17 [#2259](https://github.com/opensearch-project/k-NN/pull/2259)
* Upgrade bytebuddy and objenesis version to match OpenSearch core and, update github ci runner for macos [#2279](https://github.com/opensearch-project/k-NN/pull/2279)
### Documentation
### Maintenance
* Select index settings based on cluster version[2236](https://github.com/opensearch-project/k-NN/pull/2236)
* Added periodic cache maintenance for QuantizationStateCache and NativeMemoryCache [#2308](https://github.com/opensearch-project/k-NN/pull/2308)
* Added null checks for fieldInfo in ExactSearcher to avoid NPE while running exact search for segments with no vector field (#2278)[https://github.com/opensearch-project/k-NN/pull/2278]
* Added Lucene BWC tests (#2313)[https://github.com/opensearch-project/k-NN/pull/2313]
* Upgrade jsonpath from 2.8.0 to 2.9.0[2325](https://github.com/opensearch-project/k-NN/pull/2325)
* Bump Faiss commit from 1f42e81 to 0cbc2a8 to accelerate hamming distance calculation using _mm512_popcnt_epi64 intrinsic and also add avx512-fp16 instructions to boost performance [#2381](https://github.com/opensearch-project/k-NN/pull/2381)
* Enabled indices.breaker.total.use_real_memory setting via build.gradle for integTest Cluster to catch heap CB in local ITs and github CI actions [#2395](https://github.com/opensearch-project/k-NN/pull/2395/)
* Fixing Lucene912Codec Issue with BWC for Lucene 10.0.1 upgrade[#2429](https://github.com/opensearch-project/k-NN/pull/2429)
* Enabled idempotency of local builds when using `./gradlew clean` and nest `jni/release` directory under `jni/build` for easier cleanup [#2516](https://github.com/opensearch-project/k-NN/pull/2516)
### Refactoring
53 changes: 53 additions & 0 deletions release-notes/opensearch-knn.release-notes-2.19.0.0.md
@@ -0,0 +1,53 @@
## Version 2.19.0.0 Release Notes

Compatible with OpenSearch 2.19.0

### Features
- Add Support for Multi Values in innerHit for Nested k-NN Fields in Lucene and FAISS [#2283](https://github.com/opensearch-project/k-NN/pull/2283)
- Add binary index support for Lucene engine. [#2292](https://github.com/opensearch-project/k-NN/pull/2292)
- Add expand_nested_docs Parameter support to NMSLIB engine [#2331](https://github.com/opensearch-project/k-NN/pull/2331)
- Add a new build mode, `FAISS_OPT_LEVEL=avx512_spr`, which enables the use of advanced AVX-512 instructions introduced with Intel(R) Sapphire Rapids [#2404](https://github.com/opensearch-project/k-NN/pull/2404)
- Add cosine similarity support for faiss engine [#2376](https://github.com/opensearch-project/k-NN/pull/2376)
- Add concurrency optimizations with native memory graph loading and force eviction [#2265](https://github.com/opensearch-project/k-NN/pull/2345)
- Add derived source feature for vector fields [#2449](https://github.com/opensearch-project/k-NN/pull/2449)
### Enhancements
- Introduced a writing layer in native engines that relies on the writing interface to process IO. [#2241](https://github.com/opensearch-project/k-NN/pull/2241)
- Allow method parameter override for training based indices [#2290](https://github.com/opensearch-project/k-NN/pull/2290)
- Optimizes lucene query execution to prevent unnecessary rewrites [#2305](https://github.com/opensearch-project/k-NN/pull/2305)
- Added more detailed error messages for KNN model training [#2378](https://github.com/opensearch-project/k-NN/pull/2378)
- Add check to directly use ANN Search when filters match all docs. [#2320](https://github.com/opensearch-project/k-NN/pull/2320)
- Use one formula to calculate cosine similarity [#2357](https://github.com/opensearch-project/k-NN/pull/2357)
- Make the build work for M series MacOS without manual code changes and local JAVA_HOME config [#2397](https://github.com/opensearch-project/k-NN/pull/2397)
- Remove DocsWithFieldSet reference from NativeEngineFieldVectorsWriter [#2408](https://github.com/opensearch-project/k-NN/pull/2408)
- Remove skip building graph check for quantization use case [#2430](https://github.com/opensearch-project/k-NN/pull/2430)
- Removing redundant type conversions for script scoring for hamming space with binary vectors [#2351](https://github.com/opensearch-project/k-NN/pull/2351)
- Update default to 0 to always build graph as default behavior [#2452](https://github.com/opensearch-project/k-NN/pull/2452)
- Enabled concurrent graph creation for Lucene engine with index thread qty settings [#2480](https://github.com/opensearch-project/k-NN/pull/2480)
### Bug Fixes
* Fixing the bug when a segment has no vector field present for disk based vector search [#2282](https://github.com/opensearch-project/k-NN/pull/2282)
* Fixing the bug where search fails with "fields" parameter for an index with a knn_vector field [#2314](https://github.com/opensearch-project/k-NN/pull/2314)
* Fix for NPE while merging segments after all the vector fields docs are deleted [#2365](https://github.com/opensearch-project/k-NN/pull/2365)
* Allow validation for non knn index only after 2.17.0 [#2315](https://github.com/opensearch-project/k-NN/pull/2315)
* Fixing the bug to prevent updating the index.knn setting after index creation [#2348](https://github.com/opensearch-project/k-NN/pull/2348)
* Release query vector memory after execution [#2346](https://github.com/opensearch-project/k-NN/pull/2346)
* Fix shard level rescoring disabled setting flag [#2352](https://github.com/opensearch-project/k-NN/pull/2352)
* Fix filter rewrite logic which was resulting in getting inconsistent / incorrect results for cases where filter was getting rewritten for shards [#2359](https://github.com/opensearch-project/k-NN/pull/2359)
* Fixing it to retrieve space_type from index setting when both method and top level don't have the value. [#2374](https://github.com/opensearch-project/k-NN/pull/2374)
* Fixing the bug where setting rescore as false for on_disk knn_vector query is a no-op [#2399](https://github.com/opensearch-project/k-NN/pull/2399)
* Fixing the bug to prevent index.knn setting from being modified or removed on restore snapshot [#2445](https://github.com/opensearch-project/k-NN/pull/2445)
* Fix Faiss byte vector efficient filter bug [#2448](https://github.com/opensearch-project/k-NN/pull/2448)
* Fixing bug where mapping accepts both dimension and model-id [#2410](https://github.com/opensearch-project/k-NN/pull/2410)
* Add version check for full field name validation [#2477](https://github.com/opensearch-project/k-NN/pull/2477)
### Infrastructure
* Updated C++ version in JNI from c++11 to c++17 [#2259](https://github.com/opensearch-project/k-NN/pull/2259)
* Upgrade bytebuddy and objenesis version to match OpenSearch core, and update GitHub CI runner for macOS [#2279](https://github.com/opensearch-project/k-NN/pull/2279)
### Documentation
### Maintenance
* Select index settings based on cluster version [#2236](https://github.com/opensearch-project/k-NN/pull/2236)
* Added periodic cache maintenance for QuantizationStateCache and NativeMemoryCache [#2308](https://github.com/opensearch-project/k-NN/pull/2308)
* Added null checks for fieldInfo in ExactSearcher to avoid NPE while running exact search for segments with no vector field [#2278](https://github.com/opensearch-project/k-NN/pull/2278)
* Added Lucene BWC tests [#2313](https://github.com/opensearch-project/k-NN/pull/2313)
* Upgrade jsonpath from 2.8.0 to 2.9.0 [#2325](https://github.com/opensearch-project/k-NN/pull/2325)
* Bump Faiss commit from 1f42e81 to 0cbc2a8 to accelerate hamming distance calculation using _mm512_popcnt_epi64 intrinsic and also add avx512-fp16 instructions to boost performance [#2381](https://github.com/opensearch-project/k-NN/pull/2381)
* Deprecate nmslib engine [#2427](https://github.com/opensearch-project/k-NN/pull/2427)
* Add spotless mirror repo for fixing builds [#2453](https://github.com/opensearch-project/k-NN/pull/2453)
2 changes: 1 addition & 1 deletion src/main/java/org/opensearch/knn/index/KNNSettings.java
@@ -103,7 +103,7 @@ public class KNNSettings {
public static final boolean KNN_DEFAULT_FAISS_AVX512_DISABLED_VALUE = false;
public static final boolean KNN_DEFAULT_FAISS_AVX512_SPR_DISABLED_VALUE = false;
public static final String INDEX_KNN_DEFAULT_SPACE_TYPE = "l2";
public static final Integer INDEX_KNN_ADVANCED_APPROXIMATE_THRESHOLD_DEFAULT_VALUE = 15_000;
public static final Integer INDEX_KNN_ADVANCED_APPROXIMATE_THRESHOLD_DEFAULT_VALUE = 0;
public static final Integer INDEX_KNN_BUILD_VECTOR_DATA_STRUCTURE_THRESHOLD_MIN = -1;
public static final Integer INDEX_KNN_BUILD_VECTOR_DATA_STRUCTURE_THRESHOLD_MAX = Integer.MAX_VALUE - 2;
public static final String INDEX_KNN_DEFAULT_SPACE_TYPE_FOR_BINARY = "hamming";
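
The default change in this hunk (15_000 to 0) only makes sense once the threshold's interpretation is spelled out. The following is a minimal sketch, assuming the semantics suggested by the log message and the test names elsewhere in this commit: a negative threshold always skips graph building, 0 never skips, and a positive value skips segments with fewer live docs than the threshold. The class and method names are hypothetical and are not the plugin's API.

```java
final class ApproximateThresholdSketch {
    // Default values copied from the KNNSettings hunk above.
    static final int OLD_DEFAULT = 15_000;
    static final int NEW_DEFAULT = 0;

    // Assumed semantics (not confirmed by this diff):
    //   threshold < 0  -> always skip building the graph
    //   threshold == 0 -> never skip, since no doc count is below 0
    //   threshold > 0  -> skip when the segment has fewer live docs than the threshold
    static boolean shouldSkipGraph(int threshold, long liveDocs) {
        if (threshold < 0) {
            return true;
        }
        return liveDocs < threshold;
    }

    public static void main(String[] args) {
        long liveDocs = 1_000;
        System.out.println("old default skips graph: " + shouldSkipGraph(OLD_DEFAULT, liveDocs));
        System.out.println("new default skips graph: " + shouldSkipGraph(NEW_DEFAULT, liveDocs));
    }
}
```

Under these assumptions, a 1,000-doc segment skipped graph building with the old default and builds a graph with the new one, which matches the release note "Update default to 0 to always build graph as default behavior".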
@@ -104,9 +104,8 @@ public void flush(int maxDoc, final Sorter.DocMap sortMap) throws IOException {
field.getVectors()
);
final QuantizationState quantizationState = train(field.getFieldInfo(), knnVectorValuesSupplier, totalLiveDocs);
// Check only after quantization state writer finish writing its state, since it is required
// even if there are no graph files in segment, which will be later used by exact search
if (shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
// should skip graph building only for non quantization use case and if threshold is met
if (quantizationState == null && shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
log.info(
"Skip building vector data structure for field: {}, as liveDoc: {} is less than the threshold {} during flush",
fieldInfo.name,
@@ -144,9 +143,8 @@ public void mergeOneField(final FieldInfo fieldInfo, final MergeState mergeState
}

final QuantizationState quantizationState = train(fieldInfo, knnVectorValuesSupplier, totalLiveDocs);
// Check only after quantization state writer finish writing its state, since it is required
// even if there are no graph files in segment, which will be later used by exact search
if (shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
// should skip graph building only for non quantization use case and if threshold is met
if (quantizationState == null && shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
log.info(
"Skip building vector data structure for field: {}, as liveDoc: {} is less than the threshold {} during merge",
fieldInfo.name,
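
The two hunks above apply the same guard in `flush` and `mergeOneField`. As a rough illustration of the behavioral difference, the sketch below compares the old and new conditions side by side. It assumes that `train(...)` returns `null` when the field is not quantized (implied by the new comment, not shown in the diff) and reuses an assumed threshold check; `QuantizationState` here is an empty hypothetical stand-in, and all method names are invented for illustration.

```java
final class SkipGraphDecisionSketch {
    // Hypothetical stand-in for the plugin's QuantizationState; only null-ness matters here.
    static final class QuantizationState {}

    // Assumed threshold check: negative always skips, otherwise skip below the threshold.
    static boolean belowThreshold(int threshold, long liveDocs) {
        return threshold < 0 || liveDocs < threshold;
    }

    // Condition before this commit: the threshold alone decides.
    static boolean skipBefore(QuantizationState state, int threshold, long liveDocs) {
        return belowThreshold(threshold, liveDocs);
    }

    // Condition after this commit: quantized fields always get a graph.
    static boolean skipAfter(QuantizationState state, int threshold, long liveDocs) {
        return state == null && belowThreshold(threshold, liveDocs);
    }

    public static void main(String[] args) {
        QuantizationState quantized = new QuantizationState();
        int threshold = 15_000;
        long liveDocs = 10;
        System.out.println("quantized, before: skip=" + skipBefore(quantized, threshold, liveDocs));
        System.out.println("quantized, after:  skip=" + skipAfter(quantized, threshold, liveDocs));
        System.out.println("plain, after:      skip=" + skipAfter(null, threshold, liveDocs));
    }
}
```

Only the quantized case changes: a small quantized segment was skipped before and is built now, while non-quantized fields keep the threshold behavior.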
@@ -633,8 +633,7 @@ public void testFlush_whenThresholdIsEqualToFixedValue_thenRelevantNativeIndexWr
}
}

public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNotMet_thenSkipBuildingGraph()
throws IOException {
public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNotMet_thenStillBuildGraph() throws IOException {
// Given
List<KNNVectorValues<float[]>> expectedVectorValues = new ArrayList<>();
final Map<Integer, Integer> sizeMap = new HashMap<>();
@@ -717,7 +716,6 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
} else {
assertEquals(0, knn990QuantWriterMockedConstruction.constructed().size());
}
verifyNoInteractions(nativeIndexWriter);
IntStream.range(0, vectorsPerField.size()).forEach(i -> {
try {
if (vectorsPerField.get(i).isEmpty()) {
@@ -732,12 +730,12 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
final Long expectedTimesGetVectorValuesIsCalled = vectorsPerField.stream().filter(Predicate.not(Map::isEmpty)).count();
knnVectorValuesFactoryMockedStatic.verify(
() -> KNNVectorValuesFactory.getVectorValues(any(VectorDataType.class), any(DocsWithFieldSet.class), any()),
times(0)
times(Math.toIntExact(expectedTimesGetVectorValuesIsCalled) * 2)
);
}
}

public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNegative_thenSkipBuildingGraph()
public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNegative_thenStillBuildGraph()
throws IOException {
// Given
List<KNNVectorValues<float[]>> expectedVectorValues = new ArrayList<>();
@@ -820,7 +818,6 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
} else {
assertEquals(0, knn990QuantWriterMockedConstruction.constructed().size());
}
verifyNoInteractions(nativeIndexWriter);
IntStream.range(0, vectorsPerField.size()).forEach(i -> {
try {
if (vectorsPerField.get(i).isEmpty()) {
@@ -835,7 +832,7 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
final Long expectedTimesGetVectorValuesIsCalled = vectorsPerField.stream().filter(Predicate.not(Map::isEmpty)).count();
knnVectorValuesFactoryMockedStatic.verify(
() -> KNNVectorValuesFactory.getVectorValues(any(VectorDataType.class), any(DocsWithFieldSet.class), any()),
times(0)
times(Math.toIntExact(expectedTimesGetVectorValuesIsCalled) * 2)
);
}
}
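
The updated assertions replace `times(0)` with a count derived from the number of non-empty fields. The sketch below isolates that arithmetic with plain collections and no Mockito. The factor of 2 is taken directly from the diff; the reading that each non-empty field triggers two `getVectorValues` calls (plausibly one for quantization training and one for graph building) is an assumption, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ExpectedCallCountSketch {
    // Mirrors the expression used in the updated test:
    // count the non-empty fields, then double the count.
    static int expectedCalls(List<Map<Integer, Integer>> vectorsPerField) {
        long nonEmptyFields = vectorsPerField.stream()
            .filter(Predicate.not(Map::isEmpty))
            .count();
        // Math.toIntExact throws ArithmeticException if the count overflows an int.
        return Math.toIntExact(nonEmptyFields) * 2;
    }

    public static void main(String[] args) {
        List<Map<Integer, Integer>> vectorsPerField = List.of(
            Map.of(0, 1),       // field with one vector
            Map.of(),           // field with no vectors, excluded from the count
            Map.of(0, 1, 1, 2)  // field with two vectors
        );
        System.out.println("expected getVectorValues calls: " + expectedCalls(vectorsPerField));
    }
}
```

With two non-empty fields out of three, the expected call count is 4, and an input with only empty fields yields 0.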