The Random Forest Classification requires at least 5% of the data to be kept apart for 'validation'. This is in addition to the minimum 5% of test data.
However, it looks to me that these validation samples are never used.
Would it be possible to disable this additional split?
Just like the Decision Tree Classification: with a 'Holdout Test Data' subset of 5-95%.
It keeps more data available for either training or testing.
Thanks!
In all analyses, the validation data set is used to assess model performance at each iteration of the optimization loop, which in the case of random forest runs over the maximum number of trees. That is what the validation samples are used for.
A train/test-only split can be achieved by manually setting the maximum number of trees in the forest, which removes the need for the validation samples.
Given the above, I'm not entirely sure about the specifics of your request. Is this what you are looking for, or did I misunderstand?
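To make the role of the validation set concrete, here is a minimal sketch of the kind of optimization loop described above, written with scikit-learn rather than JASP's actual internals (the 90/5/5 split, the range of tree counts, and the dataset are illustrative assumptions, not JASP's implementation):

```python
# Illustrative sketch, NOT JASP's implementation: a validation split drives
# the choice of the number of trees; with a fixed number of trees, a plain
# train/test split would suffice.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# 90/5/5 split: train / validation / test (the split under discussion).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.05, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.05 / 0.95, random_state=0)

# Optimization loop: grow the forest tree by tree (warm_start keeps the
# already-fitted trees) and score each forest size on the validation set.
best_n, best_acc = 1, 0.0
model = RandomForestClassifier(n_estimators=1, warm_start=True, random_state=0)
for n in range(1, 101):
    model.n_estimators = n
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_n, best_acc = n, acc

# Final performance is assessed on the held-out test set only.
final = RandomForestClassifier(n_estimators=best_n, random_state=0)
final.fit(X_train, y_train)
print(best_n, round(final.score(X_test, y_test), 3))
```

If the number of trees is fixed in advance, the loop disappears and the validation rows could instead be folded back into training or testing, which is the essence of the request.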
Thanks. I understand the optimisation loop, but in the random forest modelling there is this additional 5% kept apart, so you are required to 'spend' at least 10% on test/validation, whereas in the Decision Tree Classification 5% is sufficient. There is no way to disable this in JASP's random forest settings, yet only one assessment of model performance is required during the training phase. In other words, how to explain the 95/5% split in Decision Tree Classification versus the 90/5/5% split in the random forest, if in both cases 95/5% is sufficient (and presented in the ROC curves)?