Description
For bulk index requests with inference, each inference request is sent to a randomly chosen inference allocation, and the process then waits until all inference requests have finished.
In practice this means that at the end of the bulk request, many nodes sit idle waiting for the last node(s) to finish, which is an inefficient use of resources. Because requests are randomly assigned, fluctuations in request count and document size mean there will always be a node that takes longer than the others. Note that this problem grows with cluster size: more nodes can be the slowest one, and more nodes are left waiting. A truly slow node (e.g. due to hardware issues) makes the problem even bigger.
It would be better to do something about this, e.g. let the idle nodes pick up work from the slowest node. Exact details to be discussed.
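To illustrate the straggler effect described above, here is a minimal simulation sketch (not Elasticsearch code; node counts, cost distribution, and function names are all hypothetical). It compares the makespan under random assignment, where the bulk request finishes only when the most-loaded node finishes, against the ideal balanced makespan that work stealing would approach:

```python
import random

def simulate(num_nodes: int, num_requests: int, seed: int = 42):
    """Randomly assign inference requests of varying cost to nodes and
    compare the resulting makespan with the ideal balanced makespan."""
    rng = random.Random(seed)
    loads = [0.0] * num_nodes
    for _ in range(num_requests):
        cost = rng.uniform(0.5, 1.5)          # fluctuating document sizes
        loads[rng.randrange(num_nodes)] += cost  # random allocation choice
    total = sum(loads)
    makespan_random = max(loads)        # wait for the slowest node
    makespan_balanced = total / num_nodes  # ideal: idle nodes steal work
    return makespan_random, makespan_balanced

rand_t, bal_t = simulate(num_nodes=8, num_requests=200)
```

Running this repeatedly with larger `num_nodes` shows the gap between `rand_t` and `bal_t` widening, matching the observation that the problem grows with cluster size.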