
Bias in the evaluation of SSP-MMC-FSRS #3

Open
1DWalker opened this issue Jan 25, 2025 · 0 comments
SSP-MMC-FSRS might simply be finding errors in FSRS's predictions and exploiting them. Even small errors would be exploited, so SSP-MMC-FSRS's predicted review cost is almost always likely to underestimate the true cost.

To properly evaluate SSP-MMC-FSRS, we could run an experiment with real users, but we probably have neither the resources nor the time, and such an experiment would take years just to measure the half-life.

So here's an alternative: use a different memory model as a stand-in for ground truth. For instance, we can use the GRU model, which predicts a forgetting curve.
The general problem setup would be:

  1. Sample a user.
  2. Sample a review-history prefix on which to pretrain both FSRS and GRU.
  3. Run SSP-MMC-FSRS to obtain a scheduler from the FSRS parameters.
  4. Run a simulation for a card: the review intervals are given by SSP-MMC-FSRS, the transition probabilities by GRU, and the objective is reached only when GRU's forgetting curve says it is.
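The simulation loop in steps 1-4 could be sketched roughly like this. Everything here is a placeholder assumption, not the real FSRS or GRU APIs: the scheduler rule, the forgetting-curve shape, the growth/shrink factors, and the review costs are all made up for illustration.

```python
import random

def fsrs_next_interval(stability):
    # Placeholder scheduler: interval derived from FSRS stability.
    # The real SSP-MMC-FSRS policy would go here.
    return max(1, round(stability))

def gru_recall_prob(true_halflife, interval):
    # Placeholder "GRU" forgetting curve: exponential decay
    # parameterized by the ground-truth half-life.
    return 0.5 ** (interval / true_halflife)

def simulate_card(fsrs_stability, true_halflife, target_halflife_days,
                  rng, max_reviews=1000):
    """Simulate one card: intervals come from the FSRS-based scheduler,
    review outcomes from the GRU-like model, and the run stops only when
    the GRU-side half-life reaches the target (step 4 above)."""
    reviews = 0
    cost = 0.0
    while true_halflife < target_halflife_days and reviews < max_reviews:
        interval = fsrs_next_interval(fsrs_stability)
        p_recall = gru_recall_prob(true_halflife, interval)
        if rng.random() < p_recall:
            # Successful recall: both models grow their memory estimate,
            # but by different (assumed) factors, so they can drift apart.
            fsrs_stability *= 1.5
            true_halflife *= 1.8
            cost += 1.0   # assumed cost of a successful review
        else:
            # Lapse: both estimates shrink; relearning is costlier.
            fsrs_stability = max(1.0, fsrs_stability * 0.5)
            true_halflife = max(1.0, true_halflife * 0.5)
            cost += 3.0   # assumed cost of a lapse plus relearning
        reviews += 1
    return reviews, cost
```

The key point of the sketch is the stop condition: it checks the GRU-side half-life, not FSRS's belief, so any optimism in FSRS shows up as extra simulated reviews and cost rather than as a premature success.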

Regarding the objective, there is a problem when SSP-MMC-FSRS believes the half-life has been reached but GRU doesn't. In this case I have some ideas:

  1. SSP-MMC-FSRS would just keep scheduling a review at the half-life interval.
  2. SSP-MMC-FSRS should be trained with a long horizon (e.g., 100 years) but evaluated on a much shorter one (e.g., 3 years). This should greatly reduce how often the two models disagree.
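Idea 1 could be wired into the simulation's interval selection roughly as follows. The function and argument names are hypothetical, invented for this sketch:

```python
def next_interval(fsrs_thinks_done, fsrs_interval, target_halflife_days):
    """Idea 1: once FSRS believes the objective half-life is reached,
    keep scheduling reviews at that half-life interval until the
    GRU-side stop condition is actually satisfied."""
    if fsrs_thinks_done:
        return target_halflife_days
    return fsrs_interval
```

This keeps the simulation well-defined even when the two models disagree: the card keeps being reviewed (and keeps accruing cost) until the ground-truth model agrees the objective is met.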

In addition, I believe that GRU was trained on data where the average retention is higher than 50%. If we want better results from evaluating with GRU, the objective should therefore be defined at something like 80% retention rather than 50% (a true half-life), so that the simulation stays closer to the retention regime GRU was trained on.
