Scoring sheet models are not reproducible

Scoring sheets return different results for the same data and parameters (I verified this manually by comparing the models). This makes workflows that use them non-reproducible.

I first noticed this because tests occasionally fail, for example here: https://github.com/biolab/orange3/actions/runs/11673277446/job/32503650572?pr=6914
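
For illustration, here is a minimal sketch of the kind of check that exposes this sort of problem: fit repeatedly on the same data with the same parameters and compare the resulting models. Note that `fit_toy_model` and `is_reproducible` are hypothetical names made up for this example; the toy learner is only a stand-in with a random component, not the actual scoring sheet code, and I am only assuming that an unseeded random source is the kind of thing that could cause this.

```python
import random


def fit_toy_model(data, seed=None):
    """Hypothetical stand-in for a learner with a random component.

    This is NOT the scoring sheet learner; it only mimics a model whose
    coefficients depend on a random number generator.
    """
    rng = random.Random(seed)  # seed=None seeds from OS entropy or the clock
    # Pretend the "model" is one randomly perturbed coefficient per feature.
    return [round(x + rng.random(), 6) for x in data]


def is_reproducible(fit, data, n_runs=5):
    """Fit n_runs times on the same data and check that all models match."""
    first = fit(data)
    return all(fit(data) == first for _ in range(n_runs - 1))


data = [1.0, 2.0, 3.0]
seeded = is_reproducible(lambda d: fit_toy_model(d, seed=0), data)
unseeded = is_reproducible(lambda d: fit_toy_model(d), data)
print("seeded reproducible:", seeded)
print("unseeded reproducible:", unseeded)
```

A test built this way fails consistently when the model is not deterministic, instead of only occasionally as in the CI run above.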