
Disable validation Data split Random Forest Classification #175

Open
wmotte opened this issue Jul 25, 2022 · 2 comments

@wmotte

wmotte commented Jul 25, 2022

The Random Forest Classification requires at least 5% of the data to be kept apart for 'validation'. This is in addition to the minimum of 5% test data. However, it looks to me as if these validation samples are never used.

Would it be possible to disable this additional split, just like in the Decision Tree Classification, which only uses a 'Holdout Test Data' subset of 5-95%? That would keep more data available for either training or testing.

Thanks!

@koenderks
Collaborator

koenderks commented Oct 26, 2022

In all analyses, the validation data set is used to assess model performance at each iteration of the optimization loop, which in the case of the random forest runs over the number of trees up to the specified maximum. That is what the validation samples are used for.
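To make the idea concrete, here is a rough sketch of that loop in Python/scikit-learn. This is not JASP's actual code; the dataset, the split proportions, and the grid of forest sizes are all just illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 90 / 5 / 5 split: training, validation, and test data (the proportions discussed above).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.10, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=1)

# Optimization loop: grow the forest in steps and score each size on the
# validation set; these validation samples are what the loop "spends".
best_n, best_acc = None, -1.0
forest = RandomForestClassifier(warm_start=True, random_state=1)
for n_trees in range(10, 210, 10):
    forest.n_estimators = n_trees
    forest.fit(X_train, y_train)
    acc = accuracy_score(y_val, forest.predict(X_val))
    if acc > best_acc:
        best_n, best_acc = n_trees, acc

# Refit at the selected size and evaluate once on the held-out test set.
final = RandomForestClassifier(n_estimators=best_n, random_state=1).fit(X_train, y_train)
print(f"selected trees: {best_n}, test accuracy: {accuracy_score(y_test, final.predict(X_test)):.3f}")
```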

A train-test-only split can be achieved by manually setting the maximum number of trees in the forest, which removes the need for the validation samples.
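For comparison, a minimal sketch of that train-test-only case (again illustrative Python/scikit-learn rather than JASP's code; the 200 trees and the 5% test fraction are arbitrary choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 95 / 5 split: training data and a holdout test set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=1)

# The forest size is fixed up front, so no validation set is needed to pick it.
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, forest.predict(X_test)):.3f}")
```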

Given the above, I'm not entirely sure about the specifics of your request. Is this what you are looking for, or did I misunderstand?

@wmotte
Author

wmotte commented Oct 26, 2022

Thanks. I understand the optimisation loop, but in the random forest modelling there is this additional 5% kept apart, so you are required to 'spend' at least 10% of the data on test/validation, whereas in the Decision Tree Classification 5% is sufficient. There is no way to disable this in JASP's random forest settings, and only one assessment of model performance seems to be required during the training phase. In other words, how can the 95/5% split in the Decision Tree Classification and the 90/5/5% split in the random forest both be explained if in both cases 95/5% is sufficient (and is what is presented in the ROC curves)?
