Commit 110e309

HSEARCH-4950 Add new knn option to the documentation
1 parent f452776 commit 110e309

5 files changed: +183 -14 lines changed

documentation/src/main/asciidoc/public/reference/_mapping-directfieldmapping.adoc

Lines changed: 0 additions & 2 deletions
@@ -136,8 +136,6 @@ which generally gives much more flexibility.
 +
 include::../components/_incubating-warning.adoc[]
 +
-WARNING: Vector fields are only supported by the <<backend-lucene, Lucene backend>> for now.
-+
 Specific field type for vector fields to be used in a <<search-dsl-predicate-knn,vector search>>.
 +
 Vector fields accept values of type `float[]` or `byte[]` and *require* that

documentation/src/main/asciidoc/public/reference/_search-dsl-predicate.adoc

Lines changed: 72 additions & 0 deletions
@@ -856,6 +856,13 @@ but can be <<search-dsl-predicate-common-constantScore,made constant with `.cons
 * The score of an `and` predicate can be <<search-dsl-predicate-common-boost,boosted>>
 with a call to `.boost(...)`.
 
+[[search-dsl-predicate-and-limitations]]
+=== Limitations
+
+Keep in mind that adding a <<search-dsl-predicate-knn,knn>> as an `and` clause will lead to an exception when using
+<<backend-elasticsearch-compatibility-elasticsearch,an Elastic distribution of Elasticsearch>>.
+See <<search-dsl-predicate-knn-limitations,`knn` limitations>> for more details.
+
 [[search-dsl-predicate-or]]
 == `or`: match any clause

@@ -1147,6 +1154,13 @@ but can be <<search-dsl-predicate-common-constantScore,made constant with `.cons
 * The score of a `bool` predicate can be <<search-dsl-predicate-common-boost,boosted>>
 with a call to `.boost(...)`.
 
+[[search-dsl-predicate-boolean-limitations]]
+=== Limitations
+
+Keep in mind that adding a <<search-dsl-predicate-knn,knn>> as a clause of a boolean predicate
+has some <<search-dsl-predicate-knn-limitations,limitations>> when using
+<<backend-elasticsearch-compatibility-elasticsearch,an Elastic distribution of Elasticsearch>>.
+
 [[search-dsl-predicate-simple-query-string]]
 == [[_simple_query_string_queries]] `simpleQueryString`: match a user-provided query string

@@ -1504,6 +1518,64 @@ include::{sourcedir}/org/hibernate/search/documentation/search/predicate/Predica
 ----
 ====
 
+[[search-dsl-predicate-knn-with-text-search]]
+=== Combining `knn` with other predicates
+
+A `knn` predicate can be combined with the regular text-search predicates. It can improve the quality of search results
+by increasing the score of documents that are more relevant based on vector embeddings characteristics:
+
+.Enriching regular text search with K-Nearest Neighbors search
+====
+[source, JAVA, indent=0, subs="+callouts"]
+----
+include::{sourcedir}/org/hibernate/search/documentation/search/predicate/PredicateDslIT.java[tags=knn-and-match]
+----
+<1> Find science fiction books.
+<2> Improve the score of science fiction books that have a cover similar to the one we are searching for.
+====
+
+[[search-dsl-predicate-knn-limitations]]
+=== Backend specifics and limitations
+
+With the Elasticsearch backend and when using <<backend-elasticsearch-compatibility-elasticsearch,Elasticsearch>>,
+a `knn` predicate can only be added as a top-level predicate,
+i.e. a predicate directly passed to the `where` clause of a search query,
+or as part of a top-level disjunction,
+i.e. as `should` clauses of a top-level <<search-dsl-predicate-boolean,`boolean` predicate>>
+(the same can be achieved by using an <<search-dsl-predicate-or,`or` predicate>>).
+A `knn` predicate can be combined with other predicate types by adding them through `should` clauses.
+Any other usage of a `knn` predicate with this backend leads to an exception being thrown.
+See this section of the Elasticsearch link:{elasticsearchDocUrl}/knn-search.html#_combine_approximate_knn_with_other_features[documentation]
+in particular to learn more about how Elasticsearch combines regular queries with knn.
+<<backend-elasticsearch-compatibility-opensearch,OpenSearch>> does not have these limitations.
+
+.Multiple `knn` predicates added via `should` clauses
+====
+[source, JAVA, indent=0, subs="+callouts"]
+----
+include::{sourcedir}/org/hibernate/search/documentation/search/predicate/PredicateDslIT.java[tags=knn-should]
+----
+====
+
+The Elasticsearch backend, when using <<backend-elasticsearch-compatibility-elasticsearch,Elasticsearch>>,
+also allows configuring a backend-specific `knn` predicate option: the number of candidates.
+This option specifies the number of approximate nearest neighbor candidates to find on each shard;
+the results from all shards are then merged and the top `k` are selected.
+When not specified explicitly, Hibernate Search defaults the number of candidates to `k`.
+See the Elasticsearch link:{elasticsearchDocUrl}/knn-search.html#tune-approximate-knn-for-speed-accuracy[documentation] for more details.
+<<backend-elasticsearch-compatibility-opensearch,OpenSearch>> does not expose this option.
+
+.Setting the number of candidates, an Elasticsearch-specific `knn` option
+====
+[source, JAVA, indent=0, subs="+callouts"]
+----
+include::{sourcedir}/org/hibernate/search/documentation/search/predicate/PredicateDslIT.java[tags=knn-candidates]
+----
+<1> Get an extended, Elasticsearch-specific, predicate factory.
+<2> Build a `knn` predicate as usual.
+<3> Provide an Elasticsearch-specific predicate option.
+====
+
 [[search-dsl-predicate-knn-other]]
 === Other options
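Reviewer note: the placement rule added under "Backend specifics and limitations" can be pictured with a small standalone model. This is only a sketch and not Hibernate Search code: `KnnPlacementSketch`, `SketchPredicate`, `Knn`, `Match`, `Bool` and `isAllowed` are hypothetical names invented here. It also assumes that a top-level `bool` with any `must` clause no longer counts as a disjunction, which is consistent with the `knn-and-match` test in this commit being skipped on Elasticsearch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the documented rule; none of these types exist in Hibernate Search.
final class KnnPlacementSketch {

    interface SketchPredicate {}

    static final class Knn implements SketchPredicate {}

    static final class Match implements SketchPredicate {}

    static final class Bool implements SketchPredicate {
        final List<SketchPredicate> must;
        final List<SketchPredicate> should;

        Bool(List<SketchPredicate> must, List<SketchPredicate> should) {
            this.must = must;
            this.should = should;
        }
    }

    // True if a knn predicate appears anywhere in the tree.
    static boolean containsKnn(SketchPredicate p) {
        if (p instanceof Knn) {
            return true;
        }
        if (p instanceof Bool) {
            Bool b = (Bool) p;
            return b.must.stream().anyMatch(KnnPlacementSketch::containsKnn)
                    || b.should.stream().anyMatch(KnnPlacementSketch::containsKnn);
        }
        return false;
    }

    // knn is accepted only at the top level or as a direct should clause of a top-level disjunction.
    static boolean isAllowed(SketchPredicate topLevel) {
        if (!containsKnn(topLevel) || topLevel instanceof Knn) {
            return true;
        }
        Bool bool = (Bool) topLevel; // in this model only a Bool can contain a nested knn
        if (!bool.must.isEmpty()) {
            return false; // assumption: must clauses mean this is not a pure disjunction
        }
        for (SketchPredicate clause : bool.should) {
            if (!(clause instanceof Knn) && containsKnn(clause)) {
                return false; // knn buried deeper than a direct should clause
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SketchPredicate disjunction = new Bool(
                Collections.<SketchPredicate>emptyList(),
                Arrays.<SketchPredicate>asList(new Knn(), new Match()));
        SketchPredicate mustPlusKnn = new Bool(
                Arrays.<SketchPredicate>asList(new Match()),
                Arrays.<SketchPredicate>asList(new Knn()));
        System.out.println(isAllowed(disjunction));
        System.out.println(isAllowed(mustPlusKnn));
    }
}
```

In this model the `knn-should` shape passes, while the `knn-and-match` shape (a `must` clause next to a `should` knn clause) is rejected.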

documentation/src/test/java/org/hibernate/search/documentation/search/predicate/Book.java

Lines changed: 9 additions & 0 deletions
@@ -44,6 +44,7 @@ public class Book {
     private String comment;
 
     private float[] coverImageEmbeddings;
+    private float[] alternativeCoverImageEmbeddings;
 
     @ManyToMany
     @IndexedEmbedded(structure = ObjectStructure.NESTED)
@@ -108,6 +109,14 @@ public void setCoverImageEmbeddings(float[] coverImageEmbeddings) {
         this.coverImageEmbeddings = coverImageEmbeddings;
     }
 
+    public float[] getAlternativeCoverImageEmbeddings() {
+        return alternativeCoverImageEmbeddings;
+    }
+
+    public void setAlternativeCoverImageEmbeddings(float[] alternativeCoverImageEmbeddings) {
+        this.alternativeCoverImageEmbeddings = alternativeCoverImageEmbeddings;
+    }
+
     public List<Author> getAuthors() {
         return authors;
     }

documentation/src/test/java/org/hibernate/search/documentation/search/predicate/PredicateDslIT.java

Lines changed: 98 additions & 10 deletions
@@ -18,6 +18,7 @@
 
 import jakarta.persistence.EntityManagerFactory;
 
+import org.hibernate.search.backend.elasticsearch.ElasticsearchExtension;
 import org.hibernate.search.documentation.testsupport.BackendConfigurations;
 import org.hibernate.search.documentation.testsupport.DocumentationSetupHelper;
 import org.hibernate.search.engine.search.common.BooleanOperator;
@@ -32,7 +33,9 @@
 import org.hibernate.search.mapper.orm.mapping.HibernateOrmSearchMappingConfigurer;
 import org.hibernate.search.mapper.orm.scope.SearchScope;
 import org.hibernate.search.mapper.orm.session.SearchSession;
+import org.hibernate.search.mapper.pojo.mapping.definition.programmatic.TypeMappingStep;
 import org.hibernate.search.util.common.data.RangeBoundInclusion;
+import org.hibernate.search.util.impl.integrationtest.backend.elasticsearch.dialect.ElasticsearchTestDialect;
 import org.hibernate.search.util.impl.integrationtest.common.extension.BackendConfiguration;
 
 import org.junit.jupiter.api.BeforeEach;
@@ -54,20 +57,33 @@ class PredicateDslIT {
 
     private EntityManagerFactory entityManagerFactory;
 
+    private static boolean isVectorSearchSupported() {
+        return BackendConfiguration.isLucene()
+                || ElasticsearchTestDialect.isActualVersion(
+                        es -> !es.isLessThan( "8.0" ),
+                        os -> !os.isLessThan( "2.0" ),
+                        aoss -> true
+                );
+    }
+
     @BeforeEach
     void setup() {
         entityManagerFactory = setupHelper.start().setup( Book.class, Author.class, EmbeddableGeoPoint.class );
 
         DocumentationSetupHelper.SetupContext setupContext = setupHelper.start();
-        // NOTE: To keep this documentation example simple there is no testing with Elasticsearch/OpenSearch
-        // as not all versions have integration implemented e.g. Elasticsearch 7 or OpenSearch 1.3 will throw exceptions
-        if ( BackendConfiguration.isLucene() ) {
+        // NOTE: If backend does not support vector search it will lead to runtime exceptions, so we cannot simply annotate
+        // the corresponding properties with @VectorField; instead we add it programmatically when it's possible
+        if ( isVectorSearchSupported() ) {
             setupContext.withProperty(
                     HibernateOrmMapperSettings.MAPPING_CONFIGURER,
-                    (HibernateOrmSearchMappingConfigurer) context -> context.programmaticMapping()
-                            .type( Book.class )
-                            .property( "coverImageEmbeddings" )
-                            .vectorField( 128 )
+                    (HibernateOrmSearchMappingConfigurer) context -> {
+                        TypeMappingStep book = context.programmaticMapping()
+                                .type( Book.class );
+                        book.property( "coverImageEmbeddings" )
+                                .vectorField( 128 );
+                        book.property( "alternativeCoverImageEmbeddings" )
+                                .vectorField( 128 );
+                    }
             );
         }
         entityManagerFactory = setupContext.setup( Book.class, Author.class, EmbeddableGeoPoint.class );
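Reviewer note: the double negation in `isVectorSearchSupported()` reads as "version 8.0 or newer" for Elasticsearch and "version 2.0 or newer" for OpenSearch. The sketch below spells that out with a hypothetical comparator. `VersionCheckSketch`, `isLessThan` and `supportsVectors` are invented for illustration and are not the `ElasticsearchTestDialect` API; the sketch assumes purely numeric, dot-separated version strings.

```java
// Hypothetical stand-in for the version gate; not the ElasticsearchTestDialect API.
final class VersionCheckSketch {

    // Compares dot-separated numeric versions; missing components count as 0.
    static boolean isLessThan(String actual, String bound) {
        String[] a = actual.split("\\.");
        String[] b = bound.split("\\.");
        int length = Math.max(a.length, b.length);
        for (int i = 0; i < length; i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return x < y;
            }
        }
        return false; // equal versions are not "less than"
    }

    // Mirrors the thresholds used in the diff: Elasticsearch >= 8.0, OpenSearch >= 2.0.
    static boolean supportsVectors(boolean elasticDistribution, String version) {
        return !isLessThan(version, elasticDistribution ? "8.0" : "2.0");
    }

    public static void main(String[] args) {
        System.out.println(supportsVectors(true, "7.17.3"));
        System.out.println(supportsVectors(true, "8.12.1"));
        System.out.println(supportsVectors(false, "1.3"));
        System.out.println(supportsVectors(false, "2.11.0"));
    }
}
```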
@@ -1110,14 +1126,14 @@ void knn() {
         // NOTE: To keep this documentation example simple there is no testing with Elasticsearch/OpenSearch
         // as not all versions have integration implemented e.g. Elasticsearch 7 or OpenSearch 1.3 will throw exceptions
         assumeTrue(
-                BackendConfiguration.isLucene(),
+                isVectorSearchSupported(),
                 "This test only makes sense if the backend supports vectors"
         );
         withinSearchSession( searchSession -> {
             // tag::knn[]
             float[] coverImageEmbeddingsVector = /*...*/
             // end::knn[]
-                    new float[128];
+                    floats( 128, 1.0f );
             // tag::knn[]
             List<Book> hits = searchSession.search( Book.class )
                     .where( f -> f.knn( 5 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector ) )
@@ -1132,7 +1148,7 @@ void knn() {
             // tag::knn-filter[]
             float[] coverImageEmbeddingsVector = /*...*/
             // end::knn-filter[]
-                    new float[128];
+                    floats( 128, 1.0f );
             // tag::knn-filter[]
             List<Book> hits = searchSession.search( Book.class )
                     .where( f -> f.knn( 5 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector )
@@ -1143,6 +1159,74 @@ void knn() {
                     .extracting( Book::getId )
                     .containsExactlyInAnyOrder( BOOK1_ID, BOOK2_ID, BOOK3_ID );
         } );
+
+        withinSearchSession( searchSession -> {
+            // tag::knn-should[]
+            float[] coverImageEmbeddingsVector = /*...*/
+            // end::knn-should[]
+                    floats( 128, 1.0f );
+            // tag::knn-should[]
+            float[] alternativeCoverImageEmbeddingsVector = /*...*/
+            // end::knn-should[]
+                    floats( 128, 1.0f );
+            // tag::knn-should[]
+            List<Book> hits = searchSession.search( Book.class )
+                    .where( f -> f.bool()
+                            .should( f.knn( 10 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector ) )
+                            .should( f.knn( 5 ).field( "alternativeCoverImageEmbeddings" )
+                                    .matching( alternativeCoverImageEmbeddingsVector ) )
+                    )
+                    .fetchHits( 20 );
+            // end::knn-should[]
+            assertThat( hits )
+                    .extracting( Book::getId )
+                    .containsExactlyInAnyOrder( BOOK1_ID, BOOK2_ID, BOOK3_ID, BOOK4_ID );
+        } );
+
+        if ( !BackendConfiguration.isElasticsearch()
+                || ElasticsearchTestDialect.isActualVersion(
+                        es -> false,
+                        os -> !os.isLessThan( "2.0" ),
+                        aoss -> true
+                ) ) {
+            withinSearchSession( searchSession -> {
+                // tag::knn-and-match[]
+                float[] coverImageEmbeddingsVector = /*...*/
+                // end::knn-and-match[]
+                        floats( 128, 1.0f );
+                // tag::knn-and-match[]
+                List<Book> hits = searchSession.search( Book.class )
+                        .where( f -> f.bool()
+                                .must( f.match().field( "genre" ).matching( Genre.SCIENCE_FICTION ) ) // <1>
+                                .should( f.knn( 10 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector ) ) // <2>
+                        )
+                        .fetchHits( 20 );
+                // end::knn-and-match[]
+                assertThat( hits )
+                        .extracting( Book::getId )
+                        .containsExactlyInAnyOrder( BOOK1_ID, BOOK2_ID, BOOK3_ID );
+            } );
+        }
+
+
+        if ( BackendConfiguration.isElasticsearch() ) {
+            withinSearchSession( searchSession -> {
+                // tag::knn-candidates[]
+                float[] coverImageEmbeddingsVector = /*...*/
+                // end::knn-candidates[]
+                        floats( 128, 1.0f );
+                // tag::knn-candidates[]
+                List<Book> hits = searchSession.search( Book.class )
+                        .where( f -> f.extension( ElasticsearchExtension.get() ) // <1>
+                                .knn( 5 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector ) // <2>
+                                .numberOfCandidates( 15 ) )// <3>
+                        .fetchHits( 20 );
+                // end::knn-candidates[]
+                assertThat( hits )
+                        .extracting( Book::getId )
+                        .containsExactlyInAnyOrder( BOOK1_ID, BOOK2_ID, BOOK3_ID, BOOK4_ID );
+            } );
+        }
     }
 
     private MySearchParameters getSearchParameters() {
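Reviewer note: the number-of-candidates option exercised by the `knn-candidates` block follows the flow described in the documentation change: collect candidates per shard, merge them, keep the top `k`. The standalone sketch below models that flow under loud assumptions: `KnnCandidatesSketch` and its methods are hypothetical, an exact nearest-neighbor scan stands in for the approximate per-shard search, and squared Euclidean distance is an arbitrary choice for the example. It is not how Elasticsearch or Hibernate Search implement it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of "candidates per shard, then merge, then top k".
final class KnnCandidatesSketch {

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Exact scan; a real backend would use an approximate index here.
    static List<float[]> nearest(List<float[]> vectors, float[] query, int limit) {
        return vectors.stream()
                .sorted(Comparator.comparingDouble((float[] v) -> squaredDistance(v, query)))
                .limit(limit)
                .collect(Collectors.toList());
    }

    static List<float[]> search(List<List<float[]>> shards, float[] query, int k, int numberOfCandidates) {
        List<float[]> merged = new ArrayList<>();
        for (List<float[]> shard : shards) {
            // Each shard contributes at most numberOfCandidates vectors.
            merged.addAll(nearest(shard, query, numberOfCandidates));
        }
        // The merged candidates are reduced to the overall top k.
        return nearest(merged, query, k);
    }

    public static void main(String[] args) {
        List<List<float[]>> shards = Arrays.asList(
                Arrays.asList(new float[] { 1f }, new float[] { 2f }, new float[] { 10f }),
                Arrays.asList(new float[] { 3f }, new float[] { 4f })
        );
        for (float[] hit : search(shards, new float[] { 0f }, 2, 2)) {
            System.out.println(hit[0]);
        }
    }
}
```

In this model, a candidate count smaller than what the shards would need to supply can leave fewer than `k` merged results, which is one way to see why the option trades speed against accuracy.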
@@ -1208,6 +1292,7 @@ private void initData() {
             book1.setGenre( Genre.SCIENCE_FICTION );
             book1.getAuthors().add( isaacAsimov );
             book1.setCoverImageEmbeddings( floats( 128, 1.0f ) );
+            book1.setAlternativeCoverImageEmbeddings( floats( 128, 10.0f ) );
             isaacAsimov.getBooks().add( book1 );
 
             Book book2 = new Book();
@@ -1219,6 +1304,7 @@ private void initData() {
             book2.setComment( "Really liked this one!" );
             book2.getAuthors().add( isaacAsimov );
             book2.setCoverImageEmbeddings( floats( 128, 2.0f ) );
+            book2.setAlternativeCoverImageEmbeddings( floats( 128, 20.0f ) );
             isaacAsimov.getBooks().add( book2 );
 
             Book book3 = new Book();
@@ -1229,6 +1315,7 @@ private void initData() {
             book3.setGenre( Genre.SCIENCE_FICTION );
             book3.getAuthors().add( isaacAsimov );
             book3.setCoverImageEmbeddings( floats( 128, 3.0f ) );
+            book3.setAlternativeCoverImageEmbeddings( floats( 128, 30.0f ) );
             isaacAsimov.getBooks().add( book3 );
 
             Book book4 = new Book();
@@ -1239,6 +1326,7 @@ private void initData() {
             book4.setGenre( Genre.CRIME_FICTION );
             book4.getAuthors().add( aLeeMartinez );
             book4.setCoverImageEmbeddings( floats( 128, 4.0f ) );
+            book4.setAlternativeCoverImageEmbeddings( floats( 128, 40.0f ) );
             aLeeMartinez.getBooks().add( book3 );
 
             entityManager.persist( isaacAsimov );
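Reviewer note: the `floats( 128, 1.0f )` helper used throughout this file is not part of the diff, so its body is not visible here. A plausible shape, offered purely as an assumption, is a method returning an array of the given length filled with one value. `FloatsSketch` is a hypothetical wrapper class for this illustration.

```java
import java.util.Arrays;

// Assumed shape of the test helper; the real implementation is not shown in this commit.
final class FloatsSketch {

    // Returns an array of the given size with every element set to value.
    static float[] floats(int size, float value) {
        float[] result = new float[size];
        Arrays.fill(result, value);
        return result;
    }

    public static void main(String[] args) {
        float[] vector = floats(128, 1.0f);
        System.out.println(vector.length + " elements, first = " + vector[0]);
    }
}
```

Under that assumption, the query vector `floats( 128, 1.0f )` used in the tests coincides with the cover embeddings assigned to `book1` in `initData()`.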

integrationtest/backend/elasticsearch/src/test/java/org/hibernate/search/integrationtest/backend/elasticsearch/search/ElasticsearchKnnPredicateSpecificsIT.java

Lines changed: 4 additions & 2 deletions
@@ -183,7 +183,7 @@ void knnPredicateInWrongPlace_aggregation() {
             index.query().select()
                     .where( f -> f.matchAll() )
                     .aggregation( countsByParking, agg -> agg.terms().field( "object.nestedParking", Boolean.class )
-                            .filter( f -> f.knn( 10 ).field( "location" ).matching(50.0f, 50.0f) ) )
+                            .filter( f -> f.knn( 10 ).field( "location" ).matching( 50.0f, 50.0f ) ) )
                     .toQuery();
         } );
     }
@@ -230,7 +230,9 @@ private static class PredicateIndexBinding {
             IndexSchemaObjectField nested = root.objectField( "object", ObjectStructure.NESTED );
             object = nested.toReference();
             nestedParking = nested.field( "nestedParking", f -> f.asBoolean().aggregable( Aggregable.YES ) ).toReference();
-            nestedRating = nested.field( "nestedRating", f -> f.asInteger().projectable( Projectable.YES ).sortable( Sortable.YES ) ).toReference();
+            nestedRating =
+                    nested.field( "nestedRating", f -> f.asInteger().projectable( Projectable.YES ).sortable( Sortable.YES ) )
+                            .toReference();
         }
 
     }
