
This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

History of item difficulty in SuperMemo

Item difficulty is a term used in SuperMemo to refer to the degree of effort needed to retain knowledge in memory. Difficult items are simply hard to remember.

Historically, item difficulty was expressed using concepts that kept evolving with the SuperMemo algorithm. In SuperMemo 1.0 through 7.0, difficulty was measured by E-Factors, whose values would depend on the contents of the collection. In SuperMemo 8.0, difficulty was measured by A-Factors, which provided the first absolute measure of difficulty. In Algorithm SM-17, absolute difficulty would be obtained by the best fit of repetition history to the recall matrix.

After 25 years of pursuing the ideal model of absolute difficulty, we have documented dozens of cases where difficulty changes abruptly. For example, a very hard item may "click" (e.g. with the use of a mnemonic trick) and literally become easy overnight. In the past, reformulated items would need to be re-memorized; however, such a change may also occur outside SuperMemo, without any reformulation. Similarly, a very easy item may suddenly become hard to remember due to interference.

Item difficulty in Algorithm SM-18

The new approach in Algorithm SM-18 means that the concept of absolute difficulty currently refers only to a single repetition. This is a very coarse measure because we know only the binary value of the repetition outcome and the expectation of success determined by memory retrievability. The only way to use such a coarse measure of difficulty is to treat it as a trailing resultant of past performance. This means that difficulty is modified on the basis of expectations, while the parameters of that change are optimized globally for large bodies of data.

As Algorithm SM-17 would match repetition history to recall predictions, only the timing of failed repetitions would decide item difficulty. This resulted in items clustering in the very difficult or very easy quantiles, with fewer values in between. Some users expressed dissatisfaction with this solution. However, it is easy to forget that absolute difficulty in Algorithm SM-15 was an effect of a stepwise approximation (i.e. it was getting closer to the ideal value with each repetition). This gradual progression resulted in "nicer" distributions of difficulty. Algorithm SM-17 would provide a perfect fit of repetition history to the expected recall. This way, a "true" absolute measure of difficulty was achieved.

Algorithm SM-18 abandons the idea of absolute difficulty and brings back incremental changes to difficulty in the direction determined by the expected performance. This may be very pleasing to our intuitions about item difficulty; however, as of Dec 2019, we have too little data to prove that the new approach is superior (or inferior). The true picture will emerge over time. We only see that users who tested the new algorithm tend to enjoy the granular nature of difficulty. If all we achieve with the change is happier students, the change will be worth the invested time.

Computing item difficulty

Algorithm SM-17 would look for an absolute item difficulty that would provide the best fit to the memory model (minimum deviation of grades from the expected recall). Algorithm SM-18 takes each repetition separately, and estimates expectation-based difficulty for that repetition only. It maps the bet-win metric (BW) linearly to difficulty in the range from 0.0 to 1.0 (see the picture). As the default forgetting index is 10%, all items with BW above 0.1 are considered maximally easy (a success against the expected recall of 90%: 1.0-0.9=0.1), and all items with BW below -0.9 are considered maximally hard (a lapse against the same expectation: 0.0-0.9=-0.9).
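
As a minimal sketch, assuming BW is the difference between the binary repetition outcome and the expected recall (as the arithmetic above suggests), the mapping can be expressed in Python. The function name and the expected_recall parameter are illustrative, not SuperMemo's internals:

    def bw_to_difficulty(bw, expected_recall=0.9):
        # Per-repetition difficulty estimate from the bet-win metric (BW).
        # BW = 1 - expected_recall (a success against expectations) -> difficulty 0.0
        # BW = -expected_recall (a lapse) -> difficulty 1.0
        difficulty = (1.0 - expected_recall) - bw  # 0.1 - BW for the 10% default
        return min(1.0, max(0.0, difficulty))      # clamp to the [0.0, 1.0] range

With the default expectation of 90% recall, this reproduces the anchors above: bw_to_difficulty(0.1) yields 0.0, and bw_to_difficulty(-0.9) yields 1.0.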

Before any data is available, items are marked with default difficulty. Currently, default difficulty is taken as 0.5. Global parameter optimizations indicate that this value has negligible impact on the performance of the algorithm. This is why it is taken as 50%, i.e. the equivalent of "no information".

At each repetition, a new BW metric is computed, and the new difficulty (ItemDiff) is taken as a trailing average of the current BW-derived difficulty estimate (RepDiff) and the prior difficulty values (ItemDiff[i-1], ItemDiff[i-2], etc.):

ItemDiff[i] := f*RepDiff[i] + (1-f)*ItemDiff[i-1]

The impact of earlier repetitions is greater than the impact of later repetitions. Trailing average factors (f) have been set to maximize the performance of the algorithm via parameter optimization (those values range from ~0.8 early to ~0.1 late in the process).
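
A sketch of this update in Python; the f schedule below is a placeholder for the globally optimized factors (the text specifies only the approximate ~0.8 to ~0.1 range), and the RepDiff values are hypothetical:

    def update_item_difficulty(item_diff, rep_diff, f):
        # One step of the trailing average:
        # ItemDiff[i] := f*RepDiff[i] + (1-f)*ItemDiff[i-1]
        return f * rep_diff + (1.0 - f) * item_diff

    # Illustrative run: start from the default difficulty of 0.5 and fold in
    # four hypothetical per-repetition estimates (RepDiff values).
    item_diff = 0.5
    for rep_no, rep_diff in enumerate([0.8, 0.7, 0.2, 0.1], start=1):
        f = max(0.1, 0.8 / rep_no)  # hypothetical decay from ~0.8 toward ~0.1
        item_diff = update_item_difficulty(item_diff, rep_diff, f)
        print(rep_no, round(item_diff, 3))

Because f starts high and decays, the earliest repetitions set the estimate strongly, while later ones only nudge it, which matches the statement above about the greater impact of earlier repetitions.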

Figure: Item difficulty in Algorithm SM-18 is computed on the basis of the bet-win metric (BW). For each repetition, the BW metric is mapped linearly into item difficulty over its (0.1..-0.9) range. BW=0.1 indicates the easiest items (difficulty=0.0), while BW=-0.9 corresponds with the hardest items (difficulty=1.0). In the presented example, a repetition resulted in a departure from expectation: BW=-0.3, which maps to a suggested difficulty of 0.44. The blue line represents the linear mapping. Thin red lines represent maximum and minimum difficulty. The thick red line denotes the difficulty range. The vertical axis shows the span of the BW metric from -1.0 to +1.0.

Impact of non-absolute difficulty

The performance of the algorithm with the new approach to estimating item difficulty did not change much. All metrics look good, and may surpass or underperform those of Algorithm SM-17 depending on the particular collection. There is a good match of retrievability to the recall matrix. Forgetting and stabilization curves are quite regular, which indicates a reasonable separation by difficulty. Difficulty distributions are more informative due to the incremental nature of the estimate. Difficulties shown in the history of repetitions are more "pleasing", which addresses the original user criticism of the "uninformative" nature of the perfect-fit solution.

Figure: Typical distribution of item difficulty in Algorithm SM-18 shows less clustering than in Algorithm SM-17. This is reminiscent of earlier algorithms based on incrementally determined absolute difficulty.

Implementation costs

The new approach to difficulty is much easier to implement (a fraction of the original code) and computationally less intensive (no need to employ costly hill-climbing procedures). This may later prove important for those SuperMemo applications where computational resources are limited.