I am working with a schema where a field mcat_tree is defined as an array (e.g. "mcat_tree": ["189194", "192170", "PID 19217", "189194R", "189194P"]). It is defined as an attribute field for memory/scale reasons.

My goal is simple: in the ranking phase, I want to check whether a specific string ("191984"), or any one of the strings from an input array (["12345", "191984"]) passed as a query parameter, exists in that array. If it does, I want to apply a significant boost (e.g., +100 to the score).

The Problem: While searching/filtering in YQL using contains works perfectly, I've struggled to find a clean, performant way to do this inside a rank-profile using only the array attribute. Nothing I have tried works, for one reason or another. The only solution that has worked reliably is duplicating the data into a mapped tensor:

field mcat_tree_tensor type tensor(mcat_tree{})

and then ranking using:

sum(query(my_param) * attribute(mcat_tree_tensor))

Questions for the Community:

Is this the intended pattern? Is creating a mirrored tensor the recommended "Vespa way" for ranking against array elements, or is there a way to use the array attribute directly in an expression that I'm missing?

Memory Bloat: Tensors are powerful but memory-heavy. If I have millions of documents with 10-20 strings per array and 3 array fields per document, is the memory overhead of a mapped tensor the "price of entry" for this logic?

Future Roadmap: Is there a plan to allow simpler element-based checks in ranking expressions for array attributes (e.g., an in or contains operator) to avoid the tensor conversion?

I'd love to hear how others handle "parameter-based boosting against multivalued array type attributes" without ballooning their RAM usage.
Replies: 2 comments
Hi!
Instead of duplicating the field you could use tensorFromLabels(attribute,dimension): see https://docs.vespa.ai/en/reference/ranking/rank-features.html#document-features.
But what you probably want is to use one of the multi-value attribute rank features: https://docs.vespa.ai/en/reference/ranking/rank-features.html#features-for-indexed-multivalue-string-fields
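For illustration, the tensorFromLabels suggestion could look roughly like the sketch below. It assumes the field and parameter names from the question (mcat_tree, my_param). The profile name mcat_boost, the nativeRank base score, and the min(1, ...) cap are placeholders of my own rather than anything confirmed in this thread, so please check the linked rank-features reference before relying on it.

```
# Hypothetical rank profile: only mcat_tree and my_param come from the question.
rank-profile mcat_boost inherits default {
    inputs {
        # Sparse query tensor with one cell per category to boost
        query(my_param) tensor(mcat_tree{})
    }
    first-phase {
        # tensorFromLabels turns the array attribute into tensor(mcat_tree{})
        # with value 1.0 per element, so the product keeps only shared labels.
        # nativeRank is a stand-in for whatever base score is already in use.
        expression: nativeRank + 100 * min(1, sum(query(my_param) * tensorFromLabels(attribute(mcat_tree), mcat_tree)))
    }
}
```

The query would then pass the candidate values as a tensor, for example input.query(my_param)={{mcat_tree:12345}:1.0,{mcat_tree:191984}:1.0}. With weights of 1.0 the sum counts how many values matched, so the min(1, ...) wrapper is there only to keep the boost at a flat +100 when several match. This should avoid storing a second tensor field, with the tensor presumably being built from the array at ranking time instead.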
Beta Was this translation helpful? Give feedback.
-
|
Not sure about this one tbh. Maybe check the Vespa docs or ask in their forums if no one else chimes in here.