Merge branch 'main' into main

Signed-off-by: Vikasht34 <[email protected]>
Vikasht34 · Feb 12, 2025 · 01219ae · 01219ae
2 parents 68ac180 + 349a715
commit 01219ae
Show file tree

Hide file tree

Showing 5 changed files with 64 additions and 49 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -16,47 +16,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 * Update package name to fix compilation issue [#2513](https://github.com/opensearch-project/k-NN/pull/2513)
 ### Refactoring
 
-## [Unreleased 2.x](https://github.com/opensearch-project/k-NN/compare/2.18...2.x)
+## [Unreleased 2.x](https://github.com/opensearch-project/k-NN/compare/2.19...2.x)
 ### Features
-- Add Support for Multi Values in innerHit for Nested k-NN Fields in Lucene and FAISS (#2283)[https://github.com/opensearch-project/k-NN/pull/2283]
-- Add binary index support for Lucene engine. (#2292)[https://github.com/opensearch-project/k-NN/pull/2292]
-- Add expand_nested_docs Parameter support to NMSLIB engine (#2331)[https://github.com/opensearch-project/k-NN/pull/2331]
-- Add a new build mode, `FAISS_OPT_LEVEL=avx512_spr`, which enables the use of advanced AVX-512 instructions introduced with Intel(R) Sapphire Rapids (#2404)[https://github.com/opensearch-project/k-NN/pull/2404]
-- Add cosine similarity support for faiss engine (#2376)[https://github.com/opensearch-project/k-NN/pull/2376]
-- Add derived source feature for vector fields (#2449)[https://github.com/opensearch-project/k-NN/pull/2449]
 ### Enhancements
-- Introduced a writing layer in native engines where relies on the writing interface to process IO. (#2241)[https://github.com/opensearch-project/k-NN/pull/2241]
-- Allow method parameter override for training based indices (#2290) https://github.com/opensearch-project/k-NN/pull/2290]
-- Optimizes lucene query execution to prevent unnecessary rewrites (#2305)[https://github.com/opensearch-project/k-NN/pull/2305]
-- Add check to directly use ANN Search when filters match all docs. (#2320)[https://github.com/opensearch-project/k-NN/pull/2320]
-- Use one formula to calculate cosine similarity (#2357)[https://github.com/opensearch-project/k-NN/pull/2357]
-- Add WithFieldName implementation to KNNQueryBuilder (#2398)[https://github.com/opensearch-project/k-NN/pull/2398]
-- Make the build work for M series MacOS without manual code changes and local JAVA_HOME config (#2397)[https://github.com/opensearch-project/k-NN/pull/2397]
-- Enabled concurrent graph creation for Lucene engine with index thread qty settings(#2480)[https://github.com/opensearch-project/k-NN/pull/2480]
-- Remove DocsWithFieldSet reference from NativeEngineFieldVectorsWriter (#2408)[https://github.com/opensearch-project/k-NN/pull/2408]
 ### Bug Fixes
-* Fixing the bug when a segment has no vector field present for disk based vector search (#2282)[https://github.com/opensearch-project/k-NN/pull/2282]
-* Fixing the bug where search fails with "fields" parameter for an index with a knn_vector field (#2314)[https://github.com/opensearch-project/k-NN/pull/2314]
-* Fix for NPE while merging segments after all the vector fields docs are deleted (#2365)[https://github.com/opensearch-project/k-NN/pull/2365]
-* Allow validation for non knn index only after 2.17.0 (#2315)[https://github.com/opensearch-project/k-NN/pull/2315]
-* Fixing the bug to prevent updating the index.knn setting after index creation(#2348)[https://github.com/opensearch-project/k-NN/pull/2348]
-* Release query vector memory after execution (#2346)[https://github.com/opensearch-project/k-NN/pull/2346]
-* Fix shard level rescoring disabled setting flag (#2352)[https://github.com/opensearch-project/k-NN/pull/2352]
-* Fix filter rewrite logic which was resulting in getting inconsistent / incorrect results for cases where filter was getting rewritten for shards (#2359)[https://github.com/opensearch-project/k-NN/pull/2359]
-* Fixing it to retrieve space_type from index setting when both method and top level don't have the value. [#2374](https://github.com/opensearch-project/k-NN/pull/2374)
-* Fixing the bug where setting rescore as false for on_disk knn_vector query is a no-op (#2399)[https://github.com/opensearch-project/k-NN/pull/2399]
-* Fixing bug where mapping accepts both dimension and model-id (#2410)[https://github.com/opensearch-project/k-NN/pull/2410]
 ### Infrastructure
-* Updated C++ version in JNI from c++11 to c++17 [#2259](https://github.com/opensearch-project/k-NN/pull/2259)
-* Upgrade bytebuddy and objenesis version to match OpenSearch core and, update github ci runner for macos [#2279](https://github.com/opensearch-project/k-NN/pull/2279)
 ### Documentation
 ### Maintenance
-* Select index settings based on cluster version[2236](https://github.com/opensearch-project/k-NN/pull/2236)
-* Added periodic cache maintenance for QuantizationStateCache and NativeMemoryCache [#2308](https://github.com/opensearch-project/k-NN/pull/2308)
-* Added null checks for fieldInfo in ExactSearcher to avoid NPE while running exact search for segments with no vector field (#2278)[https://github.com/opensearch-project/k-NN/pull/2278]
-* Added Lucene BWC tests (#2313)[https://github.com/opensearch-project/k-NN/pull/2313]
-* Upgrade jsonpath from 2.8.0 to 2.9.0[2325](https://github.com/opensearch-project/k-NN/pull/2325)
-* Bump Faiss commit from 1f42e81 to 0cbc2a8 to accelerate hamming distance calculation using _mm512_popcnt_epi64  intrinsic and also add avx512-fp16 instructions to boost performance [#2381](https://github.com/opensearch-project/k-NN/pull/2381)
 * Enabled indices.breaker.total.use_real_memory setting via build.gradle for integTest Cluster to catch heap CB in local ITs and github CI actions [#2395](https://github.com/opensearch-project/k-NN/pull/2395/) 
+* Fixing Lucene912Codec Issue with BWC for Lucene 10.0.1 upgrade[#2429](https://github.com/opensearch-project/k-NN/pull/2429)
 * Enabled idempotency of local builds when using `./gradlew clean` and nest `jni/release` directory under `jni/build` for easier cleanup [#2516](https://github.com/opensearch-project/k-NN/pull/2516)
 ### Refactoring
diff --git a/release-notes/opensearch-knn.release-notes-2.19.0.0.md b/release-notes/opensearch-knn.release-notes-2.19.0.0.md
@@ -0,0 +1,53 @@
+## Version 2.19.0.0 Release Notes
+
+Compatible with OpenSearch 2.19.0
+
+### Features
+- Add Support for Multi Values in innerHit for Nested k-NN Fields in Lucene and FAISS [#2283](https://github.com/opensearch-project/k-NN/pull/2283)
+- Add binary index support for Lucene engine. [#2292](https://github.com/opensearch-project/k-NN/pull/2292)
+- Add expand_nested_docs Parameter support to NMSLIB engine [#2331](https://github.com/opensearch-project/k-NN/pull/2331)
+- Add a new build mode, `FAISS_OPT_LEVEL=avx512_spr`, which enables the use of advanced AVX-512 instructions introduced with Intel[R] Sapphire Rapids [#2404](https://github.com/opensearch-project/k-NN/pull/2404)
+- Add cosine similarity support for faiss engine [#2376](https://github.com/opensearch-project/k-NN/pull/2376)
+- Add concurrency optimizations with native memory graph loading and force eviction [#2265](https://github.com/opensearch-project/k-NN/pull/2345)
+- Add derived source feature for vector fields [#2449](https://github.com/opensearch-project/k-NN/pull/2449)
+### Enhancements
+- Introduced a writing layer in native engines where relies on the writing interface to process IO. [#2241](https://github.com/opensearch-project/k-NN/pull/2241)
+- Allow method parameter override for training based indices [#2290](https://github.com/opensearch-project/k-NN/pull/2290)
+- Optimizes lucene query execution to prevent unnecessary rewrites [#2305](https://github.com/opensearch-project/k-NN/pull/2305)
+- Added more detailed error messages for KNN model training [#2378](https://github.com/opensearch-project/k-NN/pull/2378)
+- Add check to directly use ANN Search when filters match all docs. [#2320](https://github.com/opensearch-project/k-NN/pull/2320)
+- Use one formula to calculate cosine similarity [#2357](https://github.com/opensearch-project/k-NN/pull/2357)
+- Make the build work for M series MacOS without manual code changes and local JAVA_HOME config [#2397](https://github.com/opensearch-project/k-NN/pull/2397)
+- Remove DocsWithFieldSet reference from NativeEngineFieldVectorsWriter [#2408](https://github.com/opensearch-project/k-NN/pull/2408)
+- Remove skip building graph check for quantization use case [#2430](https://github.com/opensearch-project/k-NN/pull/2430)
+- Removing redundant type conversions for script scoring for hamming space with binary vectors [#2351](https://github.com/opensearch-project/k-NN/pull/2351)
+- Update default to 0 to always build graph as default behavior [#2452](https://github.com/opensearch-project/k-NN/pull/2452)
+- Enabled concurrent graph creation for Lucene engine with index thread qty settings[#2480](https://github.com/opensearch-project/k-NN/pull/2480)
+### Bug Fixes
+* Fixing the bug when a segment has no vector field present for disk based vector search [#2282](https://github.com/opensearch-project/k-NN/pull/2282)
+* Fixing the bug where search fails with "fields" parameter for an index with a knn_vector field [#2314](https://github.com/opensearch-project/k-NN/pull/2314)
+* Fix for NPE while merging segments after all the vector fields docs are deleted [#2365](https://github.com/opensearch-project/k-NN/pull/2365)
+* Allow validation for non knn index only after 2.17.0 [#2315](https://github.com/opensearch-project/k-NN/pull/2315)
+* Fixing the bug to prevent updating the index.knn setting after index creation[#2348](https://github.com/opensearch-project/k-NN/pull/2348)
+* Release query vector memory after execution [#2346](https://github.com/opensearch-project/k-NN/pull/2346)
+* Fix shard level rescoring disabled setting flag [#2352](https://github.com/opensearch-project/k-NN/pull/2352)
+* Fix filter rewrite logic which was resulting in getting inconsistent / incorrect results for cases where filter was getting rewritten for shards [#2359](https://github.com/opensearch-project/k-NN/pull/2359)
+* Fixing it to retrieve space_type from index setting when both method and top level don't have the value. [#2374](https://github.com/opensearch-project/k-NN/pull/2374)
+* Fixing the bug where setting rescore as false for on_disk knn_vector query is a no-op [#2399](https://github.com/opensearch-project/k-NN/pull/2399)
+* Fixing the bug to prevent index.knn setting from being modified or removed on restore snapshot [#2445](https://github.com/opensearch-project/k-NN/pull/2445)
+* Fix Faiss byte vector efficient filter bug [#2448](https://github.com/opensearch-project/k-NN/pull/2448)
+* Fixing bug where mapping accepts both dimension and model-id [#2410](https://github.com/opensearch-project/k-NN/pull/2410)
+* Add version check for full field name validation [#2477](https://github.com/opensearch-project/k-NN/pull/2477)
+### Infrastructure
+* Updated C++ version in JNI from c++11 to c++17 [#2259](https://github.com/opensearch-project/k-NN/pull/2259)
+* Upgrade bytebuddy and objenesis version to match OpenSearch core and, update github ci runner for macos [#2279](https://github.com/opensearch-project/k-NN/pull/2279)
+### Documentation
+### Maintenance
+* Select index settings based on cluster version[2236](https://github.com/opensearch-project/k-NN/pull/2236)
+* Added periodic cache maintenance for QuantizationStateCache and NativeMemoryCache [#2308](https://github.com/opensearch-project/k-NN/pull/2308)
+* Added null checks for fieldInfo in ExactSearcher to avoid NPE while running exact search for segments with no vector field [#2278](https://github.com/opensearch-project/k-NN/pull/2278)
+* Added Lucene BWC tests [#2313](https://github.com/opensearch-project/k-NN/pull/2313)
+* Upgrade jsonpath from 2.8.0 to 2.9.0[2325](https://github.com/opensearch-project/k-NN/pull/2325)
+* Bump Faiss commit from 1f42e81 to 0cbc2a8 to accelerate hamming distance calculation using _mm512_popcnt_epi64  intrinsic and also add avx512-fp16 instructions to boost performance [#2381](https://github.com/opensearch-project/k-NN/pull/2381)
+* Deprecate nmslib engine [#2427](https://github.com/opensearch-project/k-NN/pull/2427)
+* Add spotless mirror repo for fixing builds [#2453](https://github.com/opensearch-project/k-NN/pull/2453)
diff --git a/src/main/java/org/opensearch/knn/index/KNNSettings.java b/src/main/java/org/opensearch/knn/index/KNNSettings.java
@@ -103,7 +103,7 @@ public class KNNSettings {
     public static final boolean KNN_DEFAULT_FAISS_AVX512_DISABLED_VALUE = false;
     public static final boolean KNN_DEFAULT_FAISS_AVX512_SPR_DISABLED_VALUE = false;
     public static final String INDEX_KNN_DEFAULT_SPACE_TYPE = "l2";
-    public static final Integer INDEX_KNN_ADVANCED_APPROXIMATE_THRESHOLD_DEFAULT_VALUE = 15_000;
+    public static final Integer INDEX_KNN_ADVANCED_APPROXIMATE_THRESHOLD_DEFAULT_VALUE = 0;
     public static final Integer INDEX_KNN_BUILD_VECTOR_DATA_STRUCTURE_THRESHOLD_MIN = -1;
     public static final Integer INDEX_KNN_BUILD_VECTOR_DATA_STRUCTURE_THRESHOLD_MAX = Integer.MAX_VALUE - 2;
     public static final String INDEX_KNN_DEFAULT_SPACE_TYPE_FOR_BINARY = "hamming";

diff --git a/...ain/java/org/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriter.java b/...ain/java/org/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriter.java
@@ -104,9 +104,8 @@ public void flush(int maxDoc, final Sorter.DocMap sortMap) throws IOException {
                 field.getVectors()
             );
             final QuantizationState quantizationState = train(field.getFieldInfo(), knnVectorValuesSupplier, totalLiveDocs);
-            // Check only after quantization state writer finish writing its state, since it is required
-            // even if there are no graph files in segment, which will be later used by exact search
-            if (shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
+            // should skip graph building only for non quantization use case and if threshold is met
+            if (quantizationState == null && shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
                 log.info(
                     "Skip building vector data structure for field: {}, as liveDoc: {} is less than the threshold {} during flush",
                     fieldInfo.name,
@@ -144,9 +143,8 @@ public void mergeOneField(final FieldInfo fieldInfo, final MergeState mergeState
         }
 
         final QuantizationState quantizationState = train(fieldInfo, knnVectorValuesSupplier, totalLiveDocs);
-        // Check only after quantization state writer finish writing its state, since it is required
-        // even if there are no graph files in segment, which will be later used by exact search
-        if (shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
+        // should skip graph building only for non quantization use case and if threshold is met
+        if (quantizationState == null && shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
             log.info(
                 "Skip building vector data structure for field: {}, as liveDoc: {} is less than the threshold {} during merge",
                 fieldInfo.name,

diff --git a/...rg/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriterFlushTests.java b/...rg/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriterFlushTests.java
@@ -633,8 +633,7 @@ public void testFlush_whenThresholdIsEqualToFixedValue_thenRelevantNativeIndexWr
         }
     }
 
-    public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNotMet_thenSkipBuildingGraph()
-        throws IOException {
+    public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNotMet_thenStillBuildGraph() throws IOException {
         // Given
         List<KNNVectorValues<float[]>> expectedVectorValues = new ArrayList<>();
         final Map<Integer, Integer> sizeMap = new HashMap<>();
@@ -717,7 +716,6 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
             } else {
                 assertEquals(0, knn990QuantWriterMockedConstruction.constructed().size());
             }
-            verifyNoInteractions(nativeIndexWriter);
             IntStream.range(0, vectorsPerField.size()).forEach(i -> {
                 try {
                     if (vectorsPerField.get(i).isEmpty()) {
@@ -732,12 +730,12 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
             final Long expectedTimesGetVectorValuesIsCalled = vectorsPerField.stream().filter(Predicate.not(Map::isEmpty)).count();
             knnVectorValuesFactoryMockedStatic.verify(
                 () -> KNNVectorValuesFactory.getVectorValues(any(VectorDataType.class), any(DocsWithFieldSet.class), any()),
-                times(0)
+                times(Math.toIntExact(expectedTimesGetVectorValuesIsCalled) * 2)
             );
         }
     }
 
-    public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNegative_thenSkipBuildingGraph()
+    public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThresholdIsNegative_thenStillBuildGraph()
         throws IOException {
         // Given
         List<KNNVectorValues<float[]>> expectedVectorValues = new ArrayList<>();
@@ -820,7 +818,6 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
             } else {
                 assertEquals(0, knn990QuantWriterMockedConstruction.constructed().size());
             }
-            verifyNoInteractions(nativeIndexWriter);
             IntStream.range(0, vectorsPerField.size()).forEach(i -> {
                 try {
                     if (vectorsPerField.get(i).isEmpty()) {
@@ -835,7 +832,7 @@ public void testFlush_whenQuantizationIsProvided_whenBuildGraphDatStructureThres
             final Long expectedTimesGetVectorValuesIsCalled = vectorsPerField.stream().filter(Predicate.not(Map::isEmpty)).count();
             knnVectorValuesFactoryMockedStatic.verify(
                 () -> KNNVectorValuesFactory.getVectorValues(any(VectorDataType.class), any(DocsWithFieldSet.class), any()),
-                times(0)
+                times(Math.toIntExact(expectedTimesGetVectorValuesIsCalled) * 2)
             );
         }
     }