This document describes the spaced repetition algorithm used to learn and remember new words.
Spaced repetition is a technique for remembering new words by reviewing them again at later dates.[1] It uses an algorithm to determine which cards are most likely to have been forgotten by the user, based on past accuracy, number of times studied, and other metrics. The user is then prompted to study these cards again.
The central idea behind spaced repetition is that cards that are easily remembered should be studied less often, whereas cards that are less likely to be remembered should be studied more frequently. This has been shown to strengthen connections in long-term memory, because the user must recall cards from past sessions, which increases the likelihood of recall at a later point in time.[2]
To track which cards are most likely to be forgotten, we need a system that can predict, from the available metrics, the retention rate of a particular card, that is, the percentage of information retained. This is achieved using a forgetting curve, which models the retention rate as a function of time.[3]
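The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Card` fields, the `cards_to_review` helper, and especially `toy_retention` are hypothetical names invented for this example. `toy_retention` is a stand-in predictor, not Proto's forgetting curve.

```python
from typing import Callable, List, NamedTuple


class Card(NamedTuple):
    """Hypothetical card record; the field names are assumptions for this sketch."""
    word: str
    days_since_review: float
    accuracy: float  # fraction of past answers that were correct, in [0, 1]


def toy_retention(card: Card) -> float:
    """Placeholder predictor (NOT Proto's forgetting curve).

    It only encodes the qualitative idea that predicted retention rises with
    past accuracy and falls as time since the last review grows.
    """
    return card.accuracy / (1.0 + card.days_since_review)


def cards_to_review(
    cards: List[Card],
    predict_retention: Callable[[Card], float],
    limit: int,
) -> List[Card]:
    """Return up to `limit` cards with the lowest predicted retention."""
    return sorted(cards, key=predict_retention)[:limit]


deck = [
    Card("chat", days_since_review=1.0, accuracy=0.9),
    Card("chien", days_since_review=10.0, accuracy=0.9),
    Card("oiseau", days_since_review=1.0, accuracy=0.3),
]
for card in cards_to_review(deck, toy_retention, limit=2):
    print(card.word)
```

Passing the predictor in as a function keeps the scheduling logic independent of whichever forgetting curve is eventually used.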
To generate this forgetting curve, Proto makes certain assumptions about how the function should behave, expressed as a differential equation. Write $R(t)$ for the retention rate at time $t$. First of all, we know that this curve is a function of time and that it must be monotonically decreasing. In other words:

$$\frac{dR}{dt} < 0$$

Secondly, we make certain assumptions about how this rate decreases. The more information is retained at any given point in time, the more potential exists for loss (i.e. the same percentage loss of information results in a greater absolute loss), which means that:

$$\frac{dR}{dt} \propto -R$$

But this is not a simple linear relationship. There is also a "scarcity" element: the less information is retained, the less work is needed to learn and remember it, and thus the more likely it is to be remembered. Conversely, if a great deal of information is retained, it is more likely to be forgotten, given the amount of work required to learn and remember it. Thus, the derivative must be twice proportional to the amount of information retained, that is, proportional to its square:

$$\frac{dR}{dt} \propto -R^2$$

A final proportionality that can be noted is that the derivative must be inversely proportional to time:

$$\frac{dR}{dt} \propto \frac{1}{t}$$
The reasoning behind this is that if information has already been remembered for a long time, it is unlikely to be forgotten a short time after.
In other words, the longer information has been remembered, the more likely it is to continue to be retained.
Putting all these proportional relationships together and adding a proportionality constant
This ODE can be solved using separation of variables quite easily, which yields the function in question:
where
One of the problems with the resulting function
We also need to relate
A value of
This means that
Given the results in the previous section, we still need to calculate empirical values for
However, this now raises the question of how to find
We are now almost done!
The one problem that remains is that
where
Turning now to
Next, we need to turn this into a probability distribution. We can view each test as a "trial" in a binomial distribution.[5] Then, we can use the well-known Rule of Succession to calculate the probability of a future correct response given the previous correct responses (i.e. accuracy).[6] With $s$ correct responses out of $n$ tests, this looks like:

$$P(\text{next response correct}) = \frac{s + 1}{n + 2}$$
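As a quick check of the Rule of Succession, here is a minimal Python sketch. The function name and argument names are chosen for this example and are not taken from Proto's code. It uses exact fractions so the results are easy to verify by hand.

```python
from fractions import Fraction


def rule_of_succession(correct: int, total: int) -> Fraction:
    """Laplace's Rule of Succession: estimated probability that the next
    trial succeeds, given `correct` successes in `total` trials.

    Computes (correct + 1) / (total + 2).
    """
    if total < 0 or not 0 <= correct <= total:
        raise ValueError("require 0 <= correct <= total")
    return Fraction(correct + 1, total + 2)


print(rule_of_succession(0, 0))    # no history yet
print(rule_of_succession(3, 4))    # 3 correct answers out of 4 tests
print(rule_of_succession(10, 10))  # perfect record so far
```

With no history the estimate is 1/2, and even a perfect record never reaches exactly 1, which avoids treating any card as impossible to forget.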
Footnotes
1. "Spaced Repetition", Wikipedia. https://en.wikipedia.org/wiki/Spaced_repetition
2. Smolen et al., "The right time to learn: mechanisms and optimization of spaced learning", Nature Reviews Neuroscience. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5126970/
3. "Forgetting curve", Wikipedia. https://en.wikipedia.org/wiki/Forgetting_curve
4. Proto used to use $m=3$ instead.
5. "Binomial distribution", Wikipedia. https://en.wikipedia.org/wiki/Binomial_distribution
6. "Rule of succession", Wikipedia. https://en.wikipedia.org/wiki/Rule_of_succession