An aim of this monograph is to encourage students to use the appropriate learning technique (or techniques) to accomplish a given instructional objective. Some learning techniques are largely focused on bolstering students’ memory for facts (e.g., the keyword mnemonic), others are focused more on improving comprehension (e.g., self-explanation), and yet others may enhance both memory and comprehension (e.g., practice testing). Thus, our review of each learning technique describes how it can be used, its effectiveness for producing long-term retention and comprehension, and its breadth of efficacy across the categories of variables listed in Table 2.

In discussing how the techniques influence criterion performance, we emphasize investigations that have gone beyond demonstrating improved memory for target material by measuring students’ comprehension, application, and transfer of knowledge. Note, however, that although gaining factual knowledge is not considered the only or ultimate objective of schooling, we unabashedly consider efforts to improve student retention of knowledge as essential for reaching other instructional objectives; if one does not remember core ideas, facts, or concepts, applying them may prove difficult, if not impossible. Students who have forgotten principles of algebra will be unable to apply them to solve problems or use them as a foundation for learning calculus (or physics, economics, or other related domains), and students who do not remember what operant conditioning is will likely have difficulties applying it to solve behavioral problems. We are not advocating that students spend their time robotically memorizing facts; instead, we are acknowledging the important interplay between memory for a concept on one hand and the ability to comprehend and apply it on the other.

The degree to which the efficacy of each learning technique obtains across long retention intervals and generalizes across different criterion tasks is of critical importance. Our reviews and recommendations are based on evidence, which typically pertains to students’ objective performance on any number of criterion tasks. Criterion tasks ( Table 2 , rightmost column) vary with respect to the specific kinds of knowledge that they tap. Some tasks are meant to tap students’ memory for information (e.g., “What is operant conditioning?”), others are largely meant to tap students’ comprehension (e.g., “Explain the difference between classical conditioning and operant conditioning”), and still others are meant to tap students’ application of knowledge (e.g., “How would you apply operant conditioning to train a dog to sit down?”). Indeed, Bloom and colleagues divided learning objectives into six categories, from memory (or knowledge) and comprehension of facts to their application, analysis, synthesis, and evaluation ( B. S. Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956 ; for an updated taxonomy, see L. W. Anderson & Krathwohl, 2001 ).

Any number of student characteristics could also influence the effectiveness of a given learning technique. For example, in comparison to more advanced students, younger students in early grades may not benefit from a technique. Students’ basic cognitive abilities, such as working memory capacity or general fluid intelligence, may also influence the efficacy of a given technique. In an educational context, domain knowledge refers to the valid, relevant knowledge a student brings to a lesson. Domain knowledge may be required for students to use some of the learning techniques listed in Table 1 . For instance, the use of imagery while reading texts requires that students know the objects and ideas that the words refer to so that they can produce internal images of them. Students with some domain knowledge about a topic may also find it easier to use self-explanation and elaborative interrogation, which are two techniques that involve answering “why” questions about a particular concept (e.g., “Why would particles of ice rise up within a cloud?”). Domain knowledge may enhance the benefits of summarization and highlighting as well. Nevertheless, although some domain knowledge will benefit students as they begin learning new content within a given domain, it is not a prerequisite for using most of the learning techniques.

Table 2. Examples of the Four Categories of Variables for Generalizability

Because teachers are most likely to learn about these techniques in educational psychology classes, we examined how some educational-psychology textbooks covered them ( Ormrod, 2008 ; Santrock, 2008 ; Slavin, 2009 ; Snowman, McCown, & Biehler, 2009 ; Sternberg & Williams, 2010 ; Woolfolk, 2007 ). Despite the promise of some of the techniques, many of these textbooks did not provide sufficient coverage, which would include up-to-date reviews of their efficacy and analyses of their generalizability and potential limitations. Accordingly, for all of the learning techniques listed in Table 1 , we reviewed the literature to identify the generalizability of their benefits across four categories of variables—materials, learning conditions, student characteristics, and criterion tasks. The choice of these categories was inspired by Jenkins’ (1979) model (for an example of its use in educational contexts, see Marsh & Butler, in press ), and examples of each category are presented in Table 2 . Materials pertain to the specific content that students are expected to learn, remember, or comprehend. Learning conditions pertain to aspects of the context in which students are interacting with the to-be-learned materials. These conditions include aspects of the learning environment itself (e.g., noisiness vs. quietness in a classroom), but they largely pertain to the way in which a learning technique is implemented. For instance, a technique could be used only once or many times (a variable referred to as dosage ) when students are studying, or a technique could be used when students are either reading or listening to the to-be-learned materials.

Thus, we limited our choices to techniques that could be implemented by students without assistance (e.g., without requiring advanced technologies or extensive materials that would have to be prepared by a teacher). Some training may be required for students to learn how to use a technique with fidelity, but in principle, students should be able to use the techniques without supervision. We also chose techniques for which a sufficient amount of empirical evidence was available to support at least a preliminary assessment of potential efficacy. Of course, we could not review all the techniques that meet these criteria, given the in-depth nature of our reviews, and these criteria excluded some techniques that show much promise, such as techniques that are driven by advanced technologies.

Toward meeting this challenge, we explored the efficacy of 10 learning techniques (listed in Table 1) that students could use to improve their success across a wide variety of content domains. The learning techniques we consider here were chosen on the basis of the following criteria. We chose some techniques (e.g., self-testing, distributed practice) because an initial survey of the literature indicated that they could improve student success across a wide range of conditions. Other techniques (e.g., rereading and highlighting) were included because students report using them frequently. Moreover, students are responsible for regulating an increasing amount of their learning as they progress from elementary grades through middle school and high school to college. Lifelong learners also need to continue regulating their own learning, whether it takes place in the context of postgraduate education, the workplace, the development of new hobbies, or recreational activities.

If simple techniques were available that teachers and students could use to improve student learning and achievement, would you be surprised if teachers were not being told about these techniques and if many students were not using them? What if students were instead adopting ineffective learning techniques that undermined their achievement, or at least did not improve it? Shouldn’t they stop using these techniques and begin using ones that are effective? Psychologists have been developing and evaluating the efficacy of techniques for study and instruction for more than 100 years. Nevertheless, some effective techniques are underutilized—many teachers do not learn about them, and hence many students do not use them, despite evidence suggesting that the techniques could benefit student achievement with little added effort. Also, some learning techniques that are popular and often used by students are relatively ineffective. One potential reason for the disconnect between research on the efficacy of learning techniques and their use in educational practice is that because so many techniques are available, it would be challenging for educators to sift through the relevant research to decide which ones show promise of efficacy and could feasibly be implemented by students ( Pressley, Goodchild, Fleet, Zajchowski, & Evans, 1989 ).

On the basis of the available evidence, we rate interleaved practice as having moderate utility. On the positive side, interleaved practice has been shown to have relatively dramatic effects on students’ learning and retention of mathematical skills, and teachers and students should consider adopting it in the appropriate contexts. Also, interleaving does help (and rarely hinders) other kinds of cognitive skills. On the negative side, the literature on interleaved practice is currently small, but it contains enough null effects to raise concern. Although the null effects may indicate that the technique does not consistently work well, they may instead reflect that we do not fully understand the mechanisms underlying the effects of interleaving and therefore do not always use it appropriately. For instance, in some cases, students may not have had enough instruction or practice with individual tasks to reap the benefits of interleaved practice. Given the promise of interleaved practice for improving student achievement, there is a great need for research that systematically evaluates how its benefits are moderated by dosage during training, student abilities, and the difficulty of materials.

Not only is the result from Mayfield and Chase (2002) promising, their procedure offers a tactic for the implementation of interleaved practice, both by teachers in the classroom and by students regulating their study (for a detailed discussion of implementation, see Rohrer, 2009 ). In particular, after a given kind of problem (or topic) has been introduced, practice should first focus on that particular problem. After the next kind of problem is introduced (e.g., during another lecture or study session), that problem should first be practiced, but it should be followed by extra practice that involves interleaving the current type of problem with others introduced during previous sessions. As each new type of problem is introduced, practice should be interleaved with practice for problems from other sessions that students will be expected to discriminate between (e.g., if the criterion test will involve a mixture of several types of problems, then these should be practiced in an interleaved manner during class or study sessions). Interleaved practice may take a bit more time to use than blocked practice, because solution times often slow during interleaved practice; even so, such slowing likely indicates the recruitment of other processes—such as discriminative contrast—that boost performance. Thus, teachers and students could integrate interleaved practice into their schedules without too much modification.
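The session-by-session tactic described above can be sketched as a simple planning routine. This is only an illustrative sketch under our own assumptions: the function name `practice_plan`, the dictionary layout, and the placeholder problem types are hypothetical and do not come from Mayfield and Chase (2002) or Rohrer (2009).

```python
def practice_plan(problem_types):
    """Build one practice session per newly introduced problem type.

    Each session starts with focused practice on the new type. From the
    second session onward, it is followed by extra practice that mixes the
    new type with every type introduced in earlier sessions.
    """
    plan = []
    for index, current in enumerate(problem_types):
        # The first session has nothing earlier to interleave with.
        mixed = list(problem_types[: index + 1]) if index > 0 else []
        plan.append({"focused": current, "interleaved": mixed})
    return plan


# Hypothetical problem types, used only to show the shape of the plan.
plan = practice_plan(["type_A", "type_B", "type_C"])
for number, session in enumerate(plan, start=1):
    print(number, session["focused"], session["interleaved"])
```

How many problems to assign to the focused and interleaved portions of each session is left open here, because the text does not specify a dosage.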

It seems plausible that motivated students could easily use interleaving without help. Moreover, several studies have used procedures for instruction that could be used in the classroom (e.g., Hatala et al., 2003; Mayfield & Chase, 2002; Olina et al., 2006; Rau et al., 2010). We highlight one exemplary study here. Mayfield and Chase (2002) taught algebra rules to college students with poor math skills across 25 sessions. In different sessions, either a single algebra rule was introduced or previously introduced rules were reviewed. For review sessions, either the rule learned in the immediately previous session was reviewed (which was analogous to blocking) or the rule learned in the previous session was reviewed along with the rules from earlier sessions (which was analogous to interleaved practice). Tests were administered prior to training, during the session after each review, and then 4 to 9 weeks after practice ended. On the tests, students had to apply the rules they had learned as well as solve problems by using novel combinations of the trained rules. The groups performed similarly at the beginning of training, but by the final tests, performance on both application and problem-solving items was substantially better for the interleaved group, and these benefits were still evident (albeit no longer statistically significant) on the delayed retention test.

In the literature on interleaving, the materials that are the focus of instruction and practice are used as the criterion task. Thus, if students practice solving problems of a certain kind, the criterion task will involve solving different versions of that kind of problem. For this reason, the current section largely reflects the analysis of the preceding section on materials (10.2c). One remaining issue, however, concerns the degree to which the benefits of interleaved practice are maintained across time. Although the delay between practice and criterion tests for many of the studies described above was minimal, several studies have used retention intervals as long as 1 to 2 weeks. In some of these cases, interleaved practice benefited performance (e.g., Mayfield & Chase, 2002 ; Rohrer & Taylor, 2007 ), but in others, the potential benefits of interleaving did not manifest after the longer retention interval (e.g., de Croock & van Merriënboer, 2007 ; Rau et al., 2010 ). In the latter cases, interleaved practice may not have been potent at any retention interval. For instance, interleaved practice may not be potent for learning foreign-language vocabulary ( Schneider et al., 1998 ) or for students who have not received enough practice with a complex task ( de Croock & van Merriënboer, 2007 ).

Finally, interleaved practice has been shown to improve the formation of concepts about artists’ painting styles ( Kang & Pashler, 2012 ; Kornell & Bjork, 2008 ) and about bird classifications ( Wahlheim et al., 2011 ). The degree to which the benefits of interleaving improve concept formation across different kinds of concepts (and for students of different abilities) is currently unknown, but research and theory by Goldstone (1996) suggest that interleaving will not always be better. In particular, when exemplars within a category are dissimilar, blocking may be superior, because it will help learners identify what the members of a category have in common. By contrast, when exemplars from different categories are similar (as with the styles of artists and the classifications of birds used in the prior interleaving studies on concept formation), interleaving may work best because of discriminative contrast (e.g., Carvalho & Goldstone, 2011 ). These possibilities should be thoroughly explored with naturalistic materials before any general recommendations can be offered concerning the use of interleaved practice for concept formation.

The benefits of interleaved practice have been explored using a variety of cognitive tasks and materials, from the simple (e.g., paired associate learning) to the relatively complex (e.g., diagnosing failures of a complicated piece of machinery). Outcomes have been mixed. Schneider, Healy, and Bourne (1998 , 2002 ) had college students learn French vocabulary words from different categories, such as body parts, dinnerware, and foods. Across multiple studies, translation equivalents from the same category were blocked during practice or were interleaved. Immediately after practice, students who had received blocked practice recalled more translations than did students who had received interleaved practice ( Schneider et al., 2002 ). One week after practice, correct recall was essentially the same in the blocked-practice group as in the interleaved-practice group. In another study ( Schneider et al., 1998 , Experiment 2), interleaved practice led to somewhat better performance than blocked practice on a delayed test, but this benefit was largely due to a slightly lower error rate. Based on these two studies, it does not appear that interleaved practice of vocabulary boosts retention.

Concerning younger students, as described above, Taylor and Rohrer (2010) found that fourth graders benefited from interleaved practice when they were learning how to solve mathematical problems. In contrast, Rau et al. (2010) used various practice schedules to help teach fifth and sixth graders about fractions and found that interleaved practice did not boost performance. Finally, Olina, Reiser, Huang, Lim, and Park (2006) had high school students learn various rules for comma usage with interleaved or blocked practice; higher-skill students appeared to be hurt by interleaving (although pretest scores favored those in the blocked group, and that advantage may have carried through to the criterion test), and interleaving did not help lower-skill students.

The majority of studies on interleaved practice have included college-aged students, and across these studies, sometimes interleaved practice has boosted performance, and sometimes it has not. Even so, differences in the effectiveness of interleaved practice for this age group are likely more relevant to the kind of task employed or, perhaps, to the dosage of practice, factors that we discuss in other sections. Some studies have included college students who were learning tasks relevant to their career goals, for instance, engineering students who were learning to diagnose system failures (e.g., de Croock, van Merriënboer, & Paas, 1998) and medical students who were learning to interpret electrocardiograms (Hatala, Brooks, & Norman, 2003). We highlight outcomes from these studies in the Materials subsection (10.2c) below. Finally, Mayfield and Chase (2002) conducted an extensive intervention to teach algebra to college students with poor math skills; interleaving was largely successful, and we describe this experiment in detail in the Effects in Representative Educational Contexts subsection (10.3) below.

Consistent with this possibility are findings from Rau, Aleven, and Rummel (2010), who used various practice schedules to help teach fifth and sixth graders about fractions. During practice, students were presented with different ways to represent fractions, such as with pie charts, line segments, and set representations. Practice was either blocked (e.g., students worked with pie charts first, then line segments, and so on), interleaved, or first blocked and then interleaved. The prepractice and postpractice criterion tests involved fractions. Increases in accuracy from the prepractice test to the postpractice test occurred only after blocked and blocked-plus-interleaved practice (students in these two groups tended to perform similarly), and even then, these benefits were largely shown only for students with low prior knowledge. This outcome provides partial support for the hypothesis that interleaved practice may be most beneficial only after a certain level of competency has been achieved using blocked practice with an individual concept or problem type.

Finally, the amount of instruction and practice that students initially receive with each task may influence the degree to which interleaving all tasks enhances performance. In fact, in educational contexts, introducing a new concept or problem type (e.g., how to find the volume of a spheroid) would naturally begin with initial instruction and blocked practice with that concept or problem type, and most of the studies reported in this section involved an introduction to all tasks before interleaving began. The question is how much initial practice is enough, and whether students with low skill levels (or students learning to solve more difficult tasks) will require more practice before interleaving begins. Given that skill level and task difficulty have been shown to moderate the benefits of interleaving in the literature on motor learning (e.g., Brady, 1998 ; Wulf & Shea, 2002 ), it seems likely that they do the same for cognitive tasks. If so, the dosage of initial instruction and blocked practice should interact with the benefits of interleaving, such that more pretraining should be required for younger and less skilled students, as well as for more complex tasks.

This outcome is more consistent with the discriminative-contrast hypothesis than the retrieval-practice hypothesis. In particular, on each trial, the group receiving temporally spaced blocked practice presumably needed to retrieve (from long-term memory) what they had already learned about a painter’s style, yet doing so did not boost their performance. That is, interleaved practice encouraged students to identify the critical differences among the various artists’ styles, which in turn helped students discriminate among the artists’ paintings on the criterion test. According to this hypothesis, interleaved practice may further enhance students’ ability to develop accurate concepts (e.g., a concept of an artist’s style) when exemplars of different concepts are presented simultaneously. For instance, instead of paintings being presented separately but in an interleaved fashion, a set of paintings could be presented at the same time. In this case, a student could more readily scan the paintings of the various artists to identify differences among them. Kang and Pashler (2012) found that simultaneous presentation of paintings from different artists yielded about the same level of criterion performance (68%) as standard interleaving did (65%), and that both types of interleaved practice were superior to blocked practice (58%; for a similar finding involving students learning to classify birds, see Wahlheim, Dunlosky, & Jacoby, 2011).

Interleaved practice itself represents a learning condition, and it naturally covaries with distributed practice. For instance, if the practice trials for tasks of a given kind are blocked, the practice for the task is massed. By contrast, by interleaving practice across tasks of different kinds, any two instances of a task from a given set (e.g., solving for the volume of a given type of geometrical solid) would be separated by practice of instances from other tasks. Thus, at least some of the benefits of interleaved practice may reflect the benefits of distributed practice. However, some researchers have investigated the benefits of interleaved practice with spacing held constant (e.g., Kang & Pashler, 2012; Mitchell, Nash, & Hall, 2008), and the results suggested that spacing is not responsible for interleaving effects. For instance, Kang and Pashler (2012) had college students study paintings by various artists with the goal of developing a concept of each artist’s style, so that the students could later correctly identify the artists who had produced paintings that had not been presented during practice. During practice, the presentation of paintings was either blocked by artist (e.g., all paintings by Jan Blencowe were presented first, followed by all paintings by Richard Lindenberg, and so on) or interleaved. Most important, a third group received blocked practice, but instead of viewing the paintings one right after another in a massed fashion, a cartoon drawing was presented in between the presentation of each painting (the cartoons were presented so that the temporal spacing in this spaced-block-practice group was the same as that for the interleaved group). Criterion performance was best after interleaved practice and was significantly better than after either standard or temporally spaced blocked practice. No differences occurred in performance between the two blocked-practice groups, which indicates that spacing alone will not consistently benefit concept formation.

10.2 How general are the effects of interleaved practice?

How does interleaving produce these benefits? One explanation is that interleaved practice promotes organizational processing and item-specific processing because it allows students to more readily compare different kinds of problems. For instance, in Rohrer and Taylor (2007) , it is possible that when students were solving for the volume of one kind of solid (e.g., a wedge) during interleaved practice, the solution method used for the immediately prior problem involving a different kind of solid (e.g., a spheroid) was still in working memory and hence encouraged a comparison of the two problems and their different formulas. Another possible explanation is based on the distributed retrieval from long-term memory that is afforded by interleaved practice. In particular, for blocked practice, the information relevant to completing a task (whether it be a solution to a problem or memory for a set of related items) should reside in working memory; hence, participants should not have to retrieve the solution. So, if a student completes a block of problems solving for volumes of wedges, the solution to each new problem will be readily available from working memory. By contrast, for interleaved practice, when the next type of problem is presented, the solution method for it must be retrieved from long-term memory. So, if a student has just solved for the volume of a wedge and then must solve for the volume of a spheroid, he or she must retrieve the formula for spheroids from memory. Such delayed practice testing would boost memory for the retrieved information (for details, see the Practice Testing section above). This retrieval-practice hypothesis and the discriminative-contrast hypothesis are not mutually exclusive, and other mechanisms may also contribute to the benefits of interleaved practice.

Accuracy during practice was greater for students who had received blocked practice than for students who had received interleaved practice, both for partial problems (99% vs. 68%, respectively) and for full problems (98% vs. 79%). By contrast, accuracy 1 day later was substantially higher for students who had received interleaved practice (77%) than for students who had received blocked practice (38%). As with Rohrer and Taylor (2007), a plausible explanation for this pattern is that interleaved practice helped students to discriminate between various kinds of problems and to learn the appropriate formula to apply for each one. This explanation was supported by a detailed analysis of errors the fourth graders made when solving the full problems during the criterion task. Fabrication errors involved cases in which students used a formula that was not originally trained (e.g., b × 8), whereas discrimination errors involved cases in which students used one of the four formulas that had been practiced but was not appropriate for a given problem. As shown in Figure 14, the two groups did not differ in fabrication errors, but discrimination errors were more common after blocked practice than after interleaved practice. Students who received interleaved practice apparently were better at discriminating among the kinds of problems and consistently applied the correct formula to each one.
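The distinction between the two error types can be made concrete with a short sketch. For a prism with b base sides, the four trained formulas were b + 2 (faces), b × 3 (edges), b × 2 (corners), and b × 6 (angles). The function below is our own illustration of the coding rule, not the scoring procedure the authors used; the name `classify_response` and the representation of formulas as text strings are assumptions.

```python
# The four trained formulas, written as plain strings (b = number of base sides).
TRAINED_FORMULAS = {
    "faces": "b + 2",
    "edges": "b * 3",
    "corners": "b * 2",
    "angles": "b * 6",
}


def classify_response(feature, response):
    """Label a student's formula as correct, a discrimination error,
    or a fabrication error.

    Discrimination error: a trained formula applied to the wrong feature.
    Fabrication error: a formula that was never trained.
    """
    normalized = " ".join(response.split())  # tolerate uneven spacing
    if normalized == TRAINED_FORMULAS[feature]:
        return "correct"
    if normalized in TRAINED_FORMULAS.values():
        return "discrimination error"
    return "fabrication error"


print(classify_response("edges", "b * 3"))  # trained formula, right feature
print(classify_response("edges", "b * 2"))  # trained formula, wrong feature
print(classify_response("edges", "b * 8"))  # formula that was never trained
```

Under this rule, a student who answers an edges problem with the corners formula has retained a trained formula but has failed to match it to the problem, which is the kind of error that was more common after blocked practice.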

One explanation for this impressive effect is that interleaving gave students practice at identifying which solution method (i.e., which of several different formulas) should be used for a given solid (see also Mayfield & Chase, 2002). Put differently, interleaved practice helps students to discriminate between the different kinds of problems so that they will be more likely to use the correct solution method for each one. Compelling evidence for this possibility was provided by Taylor and Rohrer (2010). Fourth graders learned to solve mathematical problems involving prisms. For a prism with a given number of base sides (b), students learned to solve for the number of faces (b + 2), edges (b × 3), corners (b × 2), or angles (b × 6). Students first practiced partial problems: A term for a single component of a prism was presented (e.g., corners), the student had to produce the correct formula (i.e., for corners, the correct response would be “b × 2”), and then feedback (the correct answer) was provided. After practicing partial problems, students practiced full problems, in which they were shown a prism with a number of base sides (e.g., 14 sides) and a term for a single component (e.g., edges). Students had to produce the correct formula (b × 3) and solve the problem by substituting the appropriate value of b (14 × 3). Most important, students in a blocked-practice group completed all partial- and full-practice problems for one prism feature (e.g., angles) before moving on to the next. For students in an interleaved-practice group, each block of four practice problems included one problem for each of the four prism features. One day after practice, a criterion test was administered in which students were asked to solve full problems that had not appeared during practice.

Interleaved practice, as opposed to blocked practice, is easily understood by considering a method used by Rohrer and Taylor (2007) , which involved teaching college students to compute the volumes of different geometric solids. Students had two practice sessions, which were separated by 1 week. During each practice session, students were given tutorials on how to find the volume for four different kinds of geometric solids and completed 16 practice problems (4 for each solid). After the completion of each practice problem, the correct solution was shown for 10 seconds. Students in a blocked-practice condition first read a tutorial on finding the volume of a given solid, which was immediately followed by the four practice problems for that kind of solid. Practice solving volumes for a given solid was then followed by the tutorial and practice problems for the next kind of solid, and so on. Students in an interleaved-practice group first read all four tutorials and then completed all the practice problems, with the constraint that every set of four consecutive problems included one problem for each of the four kinds of solids. One week after the second practice session, all students took a criterion test in which they solved two novel problems for each of the four kinds of solids. Students’ percentages of correct responses during the practice sessions and during the criterion test are presented in Figure 13 , which illustrates a typical interleaving effect: During practice, performance was better with blocked practice than interleaved practice, but this advantage dramatically reversed on the criterion test, such that interleaved practice boosted accuracy by 43%.
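To make the two practice orders concrete, the following sketch generates a blocked and an interleaved sequence for 16 problems (4 for each of four kinds of solids). It is an illustration under stated assumptions rather than the procedure used by Rohrer and Taylor (2007): the function names are hypothetical, `solid_3` and `solid_4` are placeholder labels for kinds of solids not named in the text, and we read the constraint as applying to each successive set of four problems, with the order inside each set shuffled at random.

```python
import random


def blocked_schedule(kinds, per_kind):
    """All problems for one kind are completed before the next kind begins."""
    return [kind for kind in kinds for _ in range(per_kind)]


def interleaved_schedule(kinds, per_kind, seed=0):
    """Each successive set of len(kinds) problems contains one problem of
    every kind; the order within each set is shuffled."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    schedule = []
    for _ in range(per_kind):
        one_of_each = list(kinds)
        rng.shuffle(one_of_each)
        schedule.extend(one_of_each)
    return schedule


# "wedge" and "spheroid" appear in the text; the other labels are placeholders.
KINDS = ["wedge", "spheroid", "solid_3", "solid_4"]
blocked = blocked_schedule(KINDS, per_kind=4)
interleaved = interleaved_schedule(KINDS, per_kind=4)
print(blocked)
print(interleaved)
```

Both schedules contain the same 16 problems; only their order differs, which is what distinguishes the two conditions during practice.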

10.1 General description of interleaved practice and why it should work

Before we present evidence of the efficacy of this technique, we should point out that, in contrast to the other techniques we have reviewed in this monograph, many fewer studies have investigated the benefits of interleaved practice on measures relevant to student achievement. Nonetheless, we elected to include this technique in our review because (a) plenty of evidence indicates that interleaving can improve motor learning under some conditions (for reviews, see Brady, 1998; R. A. Schmidt & Bjork, 1992; Wulf & Shea, 2002) and (b) the growing literature on interleaving and performance on cognitive tasks is demonstrating the same kind of promise.

In virtually every kind of class at every grade level, students are expected to learn content from many different subtopics or problems of many different kinds. For example, students in a neuroanatomy course would learn about several different divisions of the nervous system, and students in a geometry course would learn various formulas for computing properties of objects such as surface area and volume. Given that the goal is to learn all of the material, how should a student schedule his or her studying of the different materials? An intuitive approach, and one we suspect is adopted by most students, involves blocking study or practice, such that all content from one subtopic is studied or all problems of one type are practiced before the student moves on to the next set of material. In contrast, recent research has begun to explore interleaved practice, in which students alternate their practice of different kinds of items or problems. Our focus here is on whether interleaved practice benefits students’ learning of educationally relevant material.

On the basis of the available evidence, we rate distributed practice as having high utility: It works across students of different ages, with a wide variety of materials, on the majority of standard laboratory measures, and over long delays. It is easy to implement (although it may require some training) and has been used successfully in a number of classroom studies. Although less research has examined distributed-practice effects using complex materials, the existing classroom studies have suggested that distributed practice should work for complex materials as well. Future research should examine this issue, as well as possible individual differences beyond age and criterion tasks that require higher-level cognition. Finally, future work should isolate the contributions of distributed study from those of distributed retrieval in educational contexts.

In sum, because of practical constraints and students’ potential lack of awareness of the benefits of this technique, students may need some training and some convincing that distributed practice is a good way to learn and retain information. Simply experiencing the distributed-practice effect may not always be sufficient, but a demonstration paired with instruction about the effect may be more convincing to students (e.g., Balch, 2006 ).

With regard to the issue of whether students understand the benefits of distributed practice, the data are not entirely definitive. Several laboratory studies have investigated students’ choices about whether to mass or space repeated studying of paired associates (e.g., GRE vocabulary words paired with their definitions). In such studies, students typically choose between restudying an item almost immediately after learning (massing) or restudying the item later in the same session (spacing). Although students do choose to mass their study under some conditions (e.g., Benjamin & Bird, 2006 ; Son, 2004 ), they typically choose to space their study of items ( Pyc & Dunlosky, 2010 ; Toppino, Cohen, Davis, & Moors, 2009 ). This bias toward spacing does not necessarily mean that students understand the benefits of distributed practice per se (e.g., they may put off restudying a pair because they do not want to see it again immediately), and one study has shown that students rate their overall level of learning as higher after massed study than after spaced study, even when the students had experienced the benefits of spacing (e.g., Kornell & Bjork, 2008 ). Other recent studies have provided evidence that students are unaware of the benefits of practicing with longer, as opposed to shorter, lags ( Pyc & Rawson, 2012b ; Wissman et al., 2012 ).

A second issue involves how students naturally study. Michael (1991) used the term procrastination scallop to describe the typical study pattern—namely, that time spent studying increases as an exam approaches. Mawhinney, Bostow, Laws, Blumenfield, and Hopkins (1971) documented this pattern using volunteers who agreed to study in an observation room that allowed their time spent studying to be recorded. With daily testing, students studied for a consistent amount of time across sessions. But when testing occurred only once every 3 weeks, time spent studying increased across the interval, peaking right before the exam ( Mawhinney et al., 1971 ). In other words, less frequent testing led to massed study immediately before the test, whereas daily testing effectively led to study that was distributed over time. The implication is that students will not necessarily engage in distributed study unless the situation forces them to do so; it is unclear whether this is because of practical constraints or because students do not understand the memorial benefits of distributed practice.

Several obstacles may arise when implementing distributed practice in the classroom. Dempster and Farris (1990) made the interesting point that many textbooks do not encourage distributed learning, in that they lump related material together and do not review previously covered material in subsequent units. At least one formal content analysis of actual textbooks (specifically, elementary-school mathematics textbooks; Stigler, Fuson, Ham, & Kim, 1986 ) supported this claim, showing that American textbooks grouped to-be-worked problems together (presumably at the end of chapters) as opposed to distributing them throughout the pages. These textbooks also contained less variability in sets of problems than did comparable textbooks from the former Soviet Union. Thus, one issue students face is that their study materials may not be set up in a way that encourages distributed practice.

Another study examined learning of statistics across two sections of the same course, one of which was taught over a 6-month period and the other of which covered the same material in an 8-week period ( Budé, Imbos, van de Wiel, & Berger, 2011 ). The authors took advantage of a curriculum change at their university that allowed them to compare learning in a class taught before the university reduced the length of the course with learning in a class taught after the change. The curriculum change meant that lectures, problem-based group meetings, and lab sessions (as well as student-driven study, assignments, etc.) were implemented within a much shorter time period; in other words, a variety of study and retrieval activities were more spaced out in time in one class than in the other. Students whose course lasted 6 months outperformed students in the 8-week course both on an open-ended test tapping conceptual understanding (see Fig. 11 ) and on the final exam ( Fig. 12 ). Critically, the two groups performed similarly on a control exam from another course ( Fig. 12 ), suggesting that the effects of distributed practice were not due to ability differences across classes.

Most of the classroom studies that have demonstrated distributed-practice effects have involved spacing of more than just study opportunities. It is not surprising that real classroom exercises would use a variety of techniques, given that the goal of educators is to maximize learning rather than to isolate the contributions of individual techniques. Consider a study by Sobel, Cepeda, and Kapler (2011) in which fifth graders learned vocabulary words. Each learning session had multiple steps: A teacher read and defined words; the students wrote down the definitions; the teacher repeated the definitions and used them in sentences, and students reread the definitions; finally, the students wrote down the definitions again and created sentences using the words. Several different kinds of study (including reading from booklets and overheads, as well as teacher instruction) and practice tests (e.g., generating definitions and sentences) were spaced in this research. The criterion test was administered 5 weeks after the second learning session, and students successfully defined a greater proportion of GRE vocabulary words (e.g., accolade ) learned in sessions spaced a week apart than vocabulary words learned in sessions spaced a minute apart ( Sobel et al., 2011 ). A mix of teacher instruction and student practice was also involved in a demonstration of the benefits of distributed practice for learning phonics in first graders ( Seabrook, Brown, & Solity, 2005 ).

Much research has established the durability of distributed-practice effects over time, but much less attention has been devoted to other kinds of criterion tasks used in educational contexts. The Cepeda et al. (2006) meta-analysis, for example, focused on studies in which the dependent measure was verbal free recall. The distributed-practice effect has been generalized to dependent measures beyond free recall, including multiple-choice questions, cued-recall and short-answer questions (e.g., Reynolds & Glaser, 1964), frequency judgments (e.g., Hintzman & Rogers, 1973), and, sometimes, implicit memory (e.g., R. L. Greene, 1990; Jacoby & Dallas, 1981). More generally, although studies using these basic measures of memory can inform the field by advancing theory, the effects of distributed practice on these measures will not necessarily generalize to all other educationally relevant measures. Given that students are often expected to go beyond the basic retention of materials, this gap is perhaps the largest and most important to fill for the literature on distributed practice. With that said, some relevant data from classroom studies are available; we turn to these in the next section.

We alluded earlier to the fact that distributed-practice effects are robust over long retention intervals, with Cepeda and colleagues (2008) arguing that the ideal lag between practice sessions would be approximately 10–20% of the desired retention interval. They examined learning up to 350 days after study; other studies have shown benefits of distributed testing after intervals lasting for months (e.g., Cepeda et al., 2009 ) and even years (e.g., Bahrick et al., 1993 ; Bahrick & Phelps, 1987 ). In fact, the distributed-practice effect is often stronger on delayed tests than immediate ones, with massed practice (cramming) actually benefitting performance on immediate tests (e.g., Rawson & Kintsch, 2005 ).

Not all tasks yield comparably large distributed-practice effects. For instance, distributed-practice effects are large for free recall but are smaller (or even nonexistent) for tasks that are very complex, such as airplane control ( Donovan & Radosevich, 1999 ). It is not clear how to map these kinds of complex tasks, which tend to have a large motor component, onto the types of complex tasks seen in education. The U.S. Institute of Education Sciences guide on organizing study to improve learning explicitly notes that “one limitation of the literature is that few studies have examined acquisition of complex bodies of structured information” ( Pashler et al., 2007 , p. 6). The data that exist (which are reviewed below) have come from classroom studies and are promising.

At the other end of the life span, older adults learning paired associates benefit from distributed practice as much as young adults do (e.g., Balota, Duchek, & Paullin, 1989 ). Similar conclusions are reached when spacing involves practice tests rather than study opportunities (e.g., Balota et al., 2006 ; Logan & Balota, 2008 ) and when older adults are learning to classify exemplars of a category (as opposed to paired associates; Kornell, Castel, Eich, & Bjork, 2010 ). In summary, learners of different ages benefit from distributed practice, but an open issue is the degree to which the distributed-practice effect may be moderated by other individual characteristics, such as prior knowledge and motivation.

Finally, the distributed-practice effect may depend on the type of processing evoked across learning episodes. In the meta-analysis by Janiszewski et al. (2003) , intentional processing was associated with a larger effect size ( M = .35) than was incidental processing ( M = .24). Several things should be noted. First, the distributed-practice effect is sometimes observed with incidental processing (e.g., R. L. Greene, 1989 ; Toppino, Fearnow-Kenney, Kiepert, & Teremula, 2009 ); it is not eliminated across the board, but the average effect size is slightly (albeit significantly) smaller. Second, the type of processing learners engage in may covary with the intentionality of their learning, with students being more likely to extract meaning from materials when they are deliberately trying to learn them. In at least two studies, deeper processing yielded a distributed-practice effect whereas more shallow processing did not (e.g., Challis, 1993 ; Delaney & Knowles, 2005 ). Whereas understanding how distributed-practice effects change with strategy has important theoretical implications, this issue is less important when considering applications to education, because when students are studying, they presumably are intentionally trying to learn.

However, the answer is not as simple as “longer lags are better”—the answer depends on how long the learner wants to retain information. Impressive data come from Cepeda, Vul, Rohrer, Wixted, and Pashler (2008) , who examined people’s learning of trivia facts in an internet study that had 26 different conditions, which combined different between-session intervals (from no lag to a lag of 105 days) with different retention intervals (up to 350 days). In brief, criterion performance was best when the lag between sessions was approximately 10–20% of the desired retention interval. For example, to remember something for 1 week, learning episodes should be spaced 12 to 24 hours apart; to remember something for 5 years, the learning episodes should be spaced 6 to 12 months apart. Of course, when students are preparing for examinations, the degree to which they can space their study sessions may be limited, but the longest intervals (e.g., intervals of 1 month or more) may be ideal for studying core content that needs to be retained for cumulative examinations or achievement tests that assess the knowledge students have gained across several years of education.
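The 10–20% rule of thumb can be written as a small calculation. The sketch below is only a literal application of that range; the function name and its parameters are hypothetical, and because the worked examples in the text are approximate, the computed values need not match them exactly (e.g., for a 1-week retention interval).

```python
def recommended_lag_days(retention_days, low=0.10, high=0.20):
    """Return (shortest, longest) suggested lag between sessions, in days.

    Assumption: the lag is simply 10-20% of the desired retention
    interval, with no adjustment for very short or very long intervals.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return (low * retention_days, high * retention_days)


if __name__ == "__main__":
    # Retention intervals in days: 1 week, the 350-day maximum studied, 5 years.
    for days in (7, 350, 5 * 365):
        shortest, longest = recommended_lag_days(days)
        print(f"retain {days} days: space sessions "
              f"{shortest:.1f} to {longest:.1f} days apart")
```

For a 5-year retention interval, this range corresponds to roughly 6 to 12 months between sessions, in line with the example above.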

One of the most important questions about distributed practice involves how to space the learning episodes—that is, how should the multiple encoding opportunities be arranged? Cepeda et al. (2006) noted that most studies have used relatively short intervals (less than 1 day), whereas we would expect the typical interval between educational learning opportunities (e.g., lecture and studying) to be longer. Recall that the classic investigation by Bahrick (1979) showed a larger distributed-practice effect with 30-day lags between sessions than with 1-day lags ( Fig. 10 ); Cepeda et al. (2006) noted that “every study examined here with a retention interval longer than 1 month demonstrated a benefit from distribution of learning across weeks or months” (p. 370; “retention interval” here refers to the time between the last study opportunity and the final test).

Distributed practice refers to a particular schedule of learning episodes, as opposed to a particular kind of learning episode. That is, the distributed-practice effect refers to better learning when learning episodes are spread out in time than when they occur in close succession, but those learning episodes could involve restudying material, retrieving information from memory, or practicing skills. Because our emphasis is on educational applications, we will not draw heavily on the skill literature, given that tasks such as ball tossing, gymnastics, and music memorization are less relevant to our purposes. Because much theory on the distributed-practice effect is derived from research on the spacing of study episodes, we focus on that research, but we also discuss relevant studies on distributed retrieval practice. In general, distributed practice testing is better than distributed study (e.g., Carpenter et al., 2009 ), as would be expected from the large literature on the benefits of practice testing.

The distributed-practice effect is robust. Cepeda et al. (2006) reviewed 254 studies involving more than 14,000 participants altogether; overall, students recalled more after spaced study (47%) than after massed study (37%). In both Donovan and Radosevich’s (1999) and Janiszewski et al.’s (2003) meta-analyses, distributed practice was associated with moderate effect sizes for recall of verbal stimuli. As we describe below, the distributed-practice effect generalizes across many of the categories of variables listed in Table 2 .

9.2 How general are the effects of distributed practice?

Many theories of distributed-practice effects have been proposed and tested. Consider some of the accounts currently under debate (for in-depth reviews, see Benjamin & Tullis, 2010 ; Cepeda et al., 2006 ). One theory invokes the idea of deficient processing, arguing that the processing of material during a second learning opportunity suffers when it is close in time to the original learning episode. Basically, students do not have to work very hard to reread notes or retrieve something from memory when they have just completed this same activity, and furthermore, they may be misled by the ease of this second task and think they know the material better than they really do (e.g., Bahrick & Hall, 2005 ). Another theory involves reminding; namely, the second presentation of to-be-learned material serves to remind the learner of the first learning opportunity, leading it to be retrieved, a process well known to enhance memory (see the Practice Testing section above). Some researchers also draw on consolidation in their explanations, positing that the second learning episode benefits from any consolidation of the first trace that has already happened. Given the relatively large magnitude of distributed-practice effects, it is plausible that multiple mechanisms may contribute to them; hence, particular theories often invoke different combinations of mechanisms to explain the effects.

To illustrate the issues involved, we begin with a description of a classic experiment on distributed practice, in which students learned translations of Spanish words to criterion in an original session ( Bahrick, 1979 ). Students then participated in six additional sessions in which they had the chance to retrieve and relearn the translations (feedback was provided). Figure 10 presents results from this study. In the zero-spacing condition (represented by the circles in Fig. 10 ), the learning sessions were back-to-back, and learning was rapid across the six massed sessions. In the 1-day condition (represented by the squares in Fig. 10 ), learning sessions were spaced 1 day apart, resulting in slightly more forgetting across sessions (i.e., lower performance on the initial test in each session) than in the zero-spacing condition, but students in the 1-day condition still obtained almost perfect accuracy by the sixth session. In contrast, when learning sessions were separated by 30 days, forgetting was much greater across sessions, and initial test performance did not reach the level observed in the other two conditions, even after six sessions (see triangles in Fig. 10 ). The key point for our present purposes is that the pattern reversed on the final test 30 days later, such that the best retention of the translations was observed in the condition in which relearning sessions had been separated by 30 days. That is, the condition with the most intersession forgetting yielded the greatest long-term retention. Spaced practice (1 day or 30 days) was superior to massed practice (0 days), and the benefit was greater following a longer lag (30 days) than a shorter lag (1 day).

9.1 General description of distributed practice and why it should work

To-be-learned material is often encountered on more than one occasion, such as when students review their notes and then later use flashcards to restudy the materials, or when a topic is covered in class and then later studied in a textbook. Even so, students mass much of their study prior to tests and believe that this popular cramming strategy is effective. Although cramming is better than not studying at all in the short term, given the same amount of time for study, would students be better off spreading out their study of content? The answer to this question is a resounding “yes.” The term distributed-practice effect refers to the finding that distributing learning over time (either within a single study session or across sessions) typically benefits long-term retention more than does massing learning opportunities back-to-back or in relatively close succession.

On the basis of the evidence described above, we rate practice testing as having high utility. Testing effects have been demonstrated across an impressive range of practice-test formats, kinds of material, learner ages, outcome measures, and retention intervals. Thus, practice testing has broad applicability. Practice testing is not particularly time intensive relative to other techniques, and it can be implemented with minimal training. Finally, several studies have provided evidence for the efficacy of practice testing in representative educational contexts. Regarding recommendations for future research, one gap identified in the literature concerns the extent to which the benefits of practice testing depend on learners’ characteristics, such as prior knowledge or ability. Exploring individual differences in testing effects would align well with the aim to identify the broader generalizability of the benefits of practice testing. Moreover, research aimed at more thoroughly identifying the causes of practice-test effects may provide further insights into maximizing these effects.

Finally, although we have focused on students’ use of practice testing, in keeping with the purpose of this monograph, we briefly note that instructors can also support student learning by increasing the use of low-stakes or no-stakes practice testing in the classroom. Several studies have also reported positive outcomes from administering summative assessments that are shorter and more frequent rather than longer and less frequent (e.g., one exam per week rather than only two or three exams per semester), not only for learning outcomes but also for students’ ratings of factors such as course satisfaction and preference for more frequent testing (e.g., Keys, 1934; Kika, McLaughlin, & Dixon, 1992; Leeming, 2002; for a review, see Bangert-Drowns, Kulik, & Kulik, 1991).

Another reason to recommend the implementation of feedback with practice testing is that it protects against perseveration errors when students respond incorrectly on a practice test. For example, Butler and Roediger (2008) found that a multiple-choice practice test increased intrusions of false alternatives on a final cued-recall test when no feedback was provided, whereas no such increase was observed when feedback was given. Fortunately, the corrective effect of feedback does not require that it be presented immediately after the practice test. Metcalfe et al. (2009) found that final-test performance for initially incorrect responses was actually better when feedback had been delayed than when it had been immediate. Also encouraging is evidence suggesting that feedback is particularly effective for correcting high-confidence errors (e.g., Butterfield & Metcalfe, 2001 ). Finally, we note that the effects of practice-test errors on subsequent performance tend to be relatively small, often do not obtain, and are heavily outweighed by the positive benefits of testing (e.g., Fazio et al., 2010 ; Kang, Pashler, et al., 2011 ; Roediger & Marsh, 2005 ). Thus, potential concerns about errors do not constitute a serious issue for implementation, particularly when feedback is provided.

Concerning the effectiveness of practice testing relative to other learning techniques, a few studies have shown benefits of practice testing over concept mapping, note-taking, and imagery use ( Fritz et al., 2007 ; Karpicke & Blunt, 2011 ; McDaniel et al., 2009 ; Neuschatz, Preston, Toglia, & Neuschatz, 2005 ), but the most frequent comparisons have involved pitting practice testing against unguided restudy. The modal outcome is that practice testing outperforms restudying, although this effect depends somewhat on the extent to which practice tests are accompanied by feedback involving presentation of the correct answer. Although many studies have shown that testing alone outperforms restudy, some studies have failed to find this advantage (in most of these cases, accuracy on the practice test has been relatively low). In contrast, the advantage of practice testing with feedback over restudy is extremely robust. Practice testing with feedback also consistently outperforms practice testing alone.

Practice testing appears to be relatively reasonable with respect to time demands. Most research has shown effects of practice testing when the amount of time allotted for practice testing is modest and is equated with the time allotted for restudying. Another merit of practice testing is that it can be implemented with minimal training. Students can engage in recall-based self-testing in a relatively straightforward fashion. For example, students can self-test via cued recall by creating flashcards (free and low-cost flashcard software is also readily available) or by using the Cornell note-taking system (which involves leaving a blank column when taking notes in class and entering key terms or questions in it shortly after taking notes to use for self-testing when reviewing notes at a later time; for more details, see Pauk & Ross, 2010 ). More structured forms of practice testing (e.g., multiple-choice, short-answer, and fill-in-the-blank tests) are often readily available to students via practice problems or questions included at the end of textbook chapters or in the electronic supplemental materials that accompany many textbooks. With that said, students would likely benefit from some basic instruction on how to most effectively use practice tests, given that the benefits of testing depend on the kind of test, dosage, and timing. As described above, practice testing is particularly advantageous when it involves retrieval and is continued until items are answered correctly more than once within and across practice sessions, and with longer as opposed to shorter intervals between trials or sessions.
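The idea of continuing self-testing until each item has been answered correctly more than once can be sketched as a simple flashcard loop. This is an illustrative sketch only, not a procedure taken from any of the studies reviewed here; the function name, the `is_correct` callback, and the default criterion of two correct recalls are all assumptions.

```python
from collections import deque


def practice_to_criterion(items, is_correct, criterion=2, max_trials=1000):
    """Cycle through items, dropping each one after `criterion` correct recalls.

    `is_correct(item, attempt)` is a hypothetical callback standing in for
    the learner's response; it returns True when the recall attempt succeeds.
    Items must be unique and hashable. Returns the number of trials per item.
    """
    correct = {item: 0 for item in items}
    trials = {item: 0 for item in items}
    queue = deque(items)
    total = 0
    while queue and total < max_trials:
        item = queue.popleft()
        trials[item] += 1
        total += 1
        if is_correct(item, trials[item]):
            correct[item] += 1
        if correct[item] < criterion:
            # Not yet at criterion: move the item to the back of the queue,
            # so other items intervene before its next test.
            queue.append(item)
    return trials


if __name__ == "__main__":
    # Simulated learner: "hard" is missed on the first attempt only.
    def simulated(item, attempt):
        return item == "easy" or attempt >= 2

    print(practice_to_criterion(["easy", "hard"], simulated))
```

Sending an unfinished item to the back of the queue also lengthens the interval between its successive tests, which is consistent with the recommendation for longer rather than shorter intervals between trials.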

For example, a study by McDaniel et al. (2012) involved undergraduates enrolled in an online psychology course on the brain and behavior. Each week, students could earn course points by completing an online practice activity up to four times. In the online activity, some information was presented for practice testing with feedback, some information was presented for restudy, and some information was not presented. Subsequent unit exams included questions that had been presented during the practice tests and also new, related questions focusing on different aspects of the practiced concepts. As shown in Figure 9 , grades on unit exams were higher for information that had been practice tested than for restudied information or unpracticed information, for both repeated questions and for new related questions.

Finally, recent studies have also shown testing effects involving other forms of transfer. Jacoby et al. (2010) presented learners with pictures of birds and their family names for initial study, which was followed either by additional study of the picture-name pairs or by practice tests in which learners were shown each picture and attempted to retrieve the appropriate family name prior to being shown the correct answer. The subsequent criterion test involved the same families of birds but included new pictures of birds from those families. Learners were more accurate in classifying new birds following practice testing than following restudy only. Similarly, Kang, McDaniel, and Pashler (2011) examined inductive function learning under conditions in which learners either studied pairs of input-output values or predicted output for a given input value prior to being shown the correct output. The prediction group outperformed the study-only group on a criterion test for both trained pairs and untrained extrapolation pairs.

Although cued recall is the most commonly used criterion measure, testing effects have also been shown with other forms of memory tests, including free-recall, recognition, and fill-in-the-blank tests, as well as short-answer and multiple-choice questions that tap memory for information explicitly stated in text material.

An increasing number of studies have shown benefits for learning from text materials of various lengths (from 160 words to 2,000 words or more), of various text genres (e.g., encyclopedia entries, scientific journal articles, textbook passages), and on a wide range of topics (e.g., Civil War economics, bat echolocation, sea otters, the big bang theory, fossils, Arctic exploration, toucans). Practice tests have improved learning from video lectures and from narrated animations on topics such as adult development, lightning, neuroanatomy, and art history ( Butler & Roediger, 2007 ; Cranney et al., 2009 ; Vojdanoska, Cranney, & Newell, 2010 ).

Many of the studies that have demonstrated testing effects have involved relatively simple verbal materials, including word lists and paired associates. However, most of the sets of materials used have had some educational relevance. A sizable majority of studies using paired-associate materials have included foreign-language translations (including Chinese, Iñupiaq, Japanese, Lithuanian, Spanish, and Swahili) or vocabulary words paired with synonyms. Other studies have extended effects to paired book titles and author names, names and faces, objects and names, and pictures and foreign-language translations (e.g., Barcroft, 2007 ; Carpenter & Vul, 2011 ; Morris & Fritz, 2002 ; Rohrer, 2009 ).

Finally, evidence from studies involving patient populations is at least suggestive with respect to the generality of testing effects across different levels of learning capacity. For example, Balota et al. (2006) found that spaced practice tests improved retention over short time intervals not only for younger adults and healthy older adults but also for older adults with Alzheimer’s disease. Similarly, Sumowski et al. (2010) found that a practice test produced larger testing effects for memory-impaired, versus memory-intact, subsets of middle-aged individuals with multiple sclerosis ( d = 0.95 vs. d = 0.54, respectively, with grouping based on performance on a baseline measure of memory). In sum, several studies have suggested that practice testing may benefit individuals with varying levels of knowledge or ability, but the extent to which the magnitude of the benefit depends on these factors remains an open question.

Likewise, minimal research has examined testing effects as a function of academically relevant ability levels. In a study by Spitzer (1939) , 3,605 sixth graders from 91 different elementary schools read a short text and took an immediate test, to provide a baseline measure of reading comprehension ability. In the groups of interest here, all students read an experimental text, half completed a practice multiple-choice test, and then all completed a multiple-choice test either 1 or 7 days later. Spitzer reported final-test performance for the experimental text separately for the top and bottom thirds of performers on the baseline measure. As shown in Figure 7 , taking the practice test benefited both groups of students. With that said, the testing effect appeared to be somewhat larger for higher-ability readers than for lower-ability readers (with approximately 20%, vs. 12%, improvements in accuracy), although Spitzer did not report the relevant inferential statistics.

In contrast to the relatively broad range of ages covered in the testing-effect literature, surprisingly minimal research has examined testing effects as a function of individual differences in knowledge or ability. In the only study including groups of learners with different knowledge levels, Carroll, Campbell-Ratcliffe, Murnane, and Perfect (2007) presented first-year undergraduates and advanced psychology majors with two passages from an abnormal-psychology textbook. Students completed a short-answer practice test on one of the passages and then took a final test over both passages either 15 minutes or 1 day later. Both groups showed similar testing effects at both time points (with 33% and 38% better accuracy, respectively, on the material that had been practice tested relative to the material that had not). Although these initial results provide encouraging evidence that testing effects may be robust across knowledge levels, further work is needed before strong conclusions can be drawn about the extent to which knowledge level moderates testing effects.

Although various practice-test formats work, some work better than others. Glover (1989) presented students with a short expository text for initial study and then manipulated the format of the practice test (free recall, fill in the blank, or recognition) and the format of the final test (free recall, fill in the blank, or recognition). On all three final-test formats, performance was greater following free-recall practice than following fill-in-the-blank practice, which in turn was greater than performance following recognition practice. Similarly, Carpenter and DeLosh (2006) found that free-recall practice outperformed cued-recall and recognition practice regardless of whether the final test was in a free-recall, cued-recall, or recognition format, and Hinze and Wiley (2011) found that performance on a multiple-choice final test was better following cued recall of paragraphs than following fill-in-the-blank practice. Further work is needed to support strong prescriptive conclusions, but the available evidence suggests that practice tests that require more generative responses (e.g., recall or short answer) are more effective than practice tests that require less generative responses (e.g., fill in the blank or recognition).

Given the volume of research on testing effects, an exhaustive review of the literature is beyond the scope of this article. Accordingly, our synthesis below is primarily based on studies from the past 10 years (which include more than 120 articles), which we believe represent the current state of the field. Most of these studies compared conditions involving practice tests with conditions not involving practice tests or involving only restudy; however, we also considered more recent work pitting different practice-testing conditions against one another to explore when practice testing works best.

8.2 How general are the effects of practice testing?

Recent evidence also suggests that practice testing may enhance how well students mentally organize information and how well they process idiosyncratic aspects of individual items, which together can support better retention and test performance ( Hunt, 1995 , 2006 ). Zaromb and Roediger (2010) presented learners with lists consisting of words from different taxonomic categories (e.g., vegetables, clothing) either for eight blocks of study trials or for four blocks of study trials with each trial followed by a practice free-recall test. Replicating basic testing effects, final free recall 2 days later was greater when items had received practice tests (39%) than when they had only been studied (17%). Importantly, the practice test condition also outperformed the study condition on secondary measures primarily tapping organizational processing and idiosyncratic processing.

Concerning mediated effects of practice testing, Pyc and Rawson (2010 , 2012b ) proposed a similar account, according to which practice testing facilitates the encoding of more effective mediators (i.e., elaborative information connecting cues and targets) during subsequent restudy opportunities. Pyc and Rawson (2010) presented learners with Swahili-English translations in an initial study block, which was followed by three blocks of restudy trials; for half of the participants, each restudy trial was preceded by practice cued recall. All learners were prompted to generate and report a keyword mediator during each restudy trial. When tested 1 week later, compared with students who had only restudied, students who had engaged in practice cued recall were more likely to recall their mediators when prompted with the cue word and were more likely to recall the target when prompted with their mediator.

Concerning direct effects of practice testing, Carpenter (2009) recently proposed that testing can enhance retention by triggering elaborative retrieval processes. Attempting to retrieve target information involves a search of long-term memory that activates related information, and this activated information may then be encoded along with the retrieved target, forming an elaborated trace that affords multiple pathways to facilitate later access to that information. In support of this account, Carpenter (2011) had learners study weakly related word pairs (e.g., “mother”–“child”) followed either by additional study or a practice cued-recall test. On a later final test, recall of the target word was prompted via a previously unpresented but strongly related word (e.g., “father”). Performance was greater following a practice test than following restudy, presumably because the practice test increased the likelihood that the related information was activated and encoded along with the target during learning.

Why does practice testing improve learning? Whereas a wealth of studies have established the generality of testing effects, theories about why testing improves learning have lagged behind. Nonetheless, theoretical accounts are increasingly emerging to explain two different kinds of testing effects, which are referred to as direct effects and mediated effects of testing (Roediger & Karpicke, 2006a). Direct effects refer to changes in learning that arise from the act of taking a test itself, whereas mediated effects refer to changes in learning that arise from an influence of testing on the amount or kind of encoding that takes place after the test (e.g., during a subsequent restudy opportunity).

As an illustrative example of the power of testing, Runquist (1983) presented undergraduates with a list of word pairs for initial study. After a brief interval during which participants completed filler tasks, half of the pairs were tested via cued recall and half were not. Participants completed a final cued-recall test for all pairs either 10 minutes or 1 week later. Final-test performance was better for pairs that were practice tested than pairs that were not (53% versus 36% after 10 minutes, 35% versus 4% after 1 week). Whereas this study illustrates the method of comparing performance between conditions that do and do not involve a practice test, many other studies have compared a practice-testing condition with more stringent conditions involving additional presentations of the to-be-learned information. For example, Roediger and Karpicke (2006b) presented undergraduates with a short expository text for initial study followed either by a second study trial or by a practice free-recall test. One week later, free recall was considerably better among the group that had taken the practice test than among the group that had restudied (56% versus 42%). As another particularly compelling demonstration of the potency of testing as compared with restudy, Karpicke and Roediger (2008) presented undergraduates with Swahili-English translations for cycles of study and practice cued recall until items were correctly recalled once. After the first correct recall, items were presented only in subsequent study cycles with no further testing, or only in subsequent test cycles with no further study. Performance on a final test 1 week later was substantially greater after continued testing (80%) than after continued study (36%).

8.1 General description of practice testing and why it should work

Note that we use the term practice testing here (a) to distinguish testing that is completed as a low-stakes or no-stakes practice or learning activity outside of class from summative assessments that are administered by an instructor in class, and (b) to encompass any form of practice testing that students would be able to engage in on their own. For example, practice testing could involve practicing recall of target information via the use of actual or virtual flashcards, completing practice problems or questions included at the end of textbook chapters, or completing practice tests included in the electronic supplemental materials that increasingly accompany textbooks.

Testing is likely viewed by many students as an undesirable necessity of education, and we suspect that most students would prefer to take as few tests as possible. This view of testing is understandable, given that most students’ experience with testing involves high-stakes summative assessments that are administered to evaluate learning. This view of testing is also unfortunate, because it overshadows the fact that testing also improves learning. Since the seminal study by Abbott (1909), more than 100 years of research has yielded several hundred experiments showing that practice testing enhances learning and retention (for recent reviews, see Rawson & Dunlosky, 2011; Roediger & Butler, 2011; Roediger, Putnam, & Smith, 2011). Even in 1906, Edward Thorndike recommended that “the active recall of a fact from within is, as a rule, better than its impression from without” (Thorndike, 1906, p. 123). The century of research on practice testing since then has supported Thorndike’s recommendation by demonstrating the broad generalizability of the benefits of practice testing.

Based on the available evidence, we rate rereading as having low utility. Although benefits from rereading have been shown across a relatively wide range of text materials, the generality of rereading effects across the other categories of variables in Table 2 has not been well established. Almost no research on rereading has involved learners younger than college-age students, and an insufficient amount of research has systematically examined the extent to which rereading effects depend on other student characteristics, such as knowledge or ability. Concerning criterion tasks, the effects of rereading do appear to be durable across at least modest delays when rereading is spaced. However, most effects have been shown with recall-based memory measures, whereas the benefit for comprehension is less clear. Finally, although rereading is relatively economical with respect to time demands and training requirements when compared with some other learning techniques, rereading is also typically much less effective. The relative disadvantage of rereading to other techniques is the largest strike against rereading and is the factor that weighed most heavily in our decision to assign it a rating of low utility.

One advantage of rereading is that students require no training to use it, other than perhaps being instructed that rereading is generally most effective when completed after a moderate delay rather than immediately after an initial reading. Additionally, relative to some other learning techniques, rereading is relatively economical with respect to time demands (e.g., in those studies permitting self-paced study, the amount of time spent rereading has typically been less than the amount of time spent during initial reading). However, in head-to-head comparisons of learning techniques, rereading has not fared well against some of the more effective techniques discussed here. For example, direct comparisons of rereading to elaborative interrogation, self-explanation, and practice testing (described in the Practice Testing section below) have consistently shown rereading to be an inferior technique for promoting learning.

Given that rereading is the study technique that students most commonly report using, it is perhaps ironic that no experimental research has assessed its impact on learning in educational contexts. Although many of the topics of the expository texts used in rereading research are arguably similar to those that students might encounter in a course, none of the aforementioned studies have involved materials taken from actual course content. Furthermore, none of the studies were administered in the context of a course, nor have any of the outcome measures involved course-related tests. The only available evidence involves correlational findings reported in survey studies, and it is mixed. Carrier (2003) found a nonsignificant negative association between self-reported rereading of textbook chapters and exam performance but a significantly positive association between self-reported review of lecture notes and exam performance. Hartwig and Dunlosky (2012) found a small but significant positive association between self-reported rereading of textbook chapters or notes and self-reported grade point average, even after controlling for self-reported use of other techniques.

Fewer studies have involved spaced rereading, although a relatively consistent advantage for spaced rereading over a single reading has been shown both on immediate tests and on tests administered after a 2-day delay. Regarding the comparison of massed rereading with spaced rereading, neither schedule shows a consistent advantage on immediate tests. A similar number of studies have shown an advantage of spacing over massing, an advantage of massing over spacing, and no differences in performance. In contrast, spaced rereading consistently outperforms massed rereading on delayed tests. We explore the benefits of spacing more generally in the Distributed Practice section below.

Rereading effects are robust across variations in the length and content of text material. Although most studies have used expository texts, rereading effects have also been shown for narratives. Those studies involving expository text material have used passages of considerably varying lengths, including short passages (e.g., 99–125 words), intermediate passages (e.g., 390–750 words), lengthy passages (e.g., 900–1,500 words), and textbook chapters or magazine articles with several thousand words. Additionally, a broad range of content domains and topics have been covered—an illustrative but nonexhaustive list includes physics (e.g., Ohm’s law), law (e.g., legal principles of evidence), history (e.g., the construction of the Brooklyn Bridge), technology (e.g., how a camera exposure meter works), biology (e.g., insects), geography (e.g., of Africa), and psychology (e.g., the treatment of mental disorders).

Similarly, few studies have examined rereading effects as a function of ability, and the available evidence is somewhat mixed. Arnold (1942) found an advantage of massed rereading over outlining or summarizing a passage for the same amount of time among learners with both higher and lower levels of intelligence and both higher and lower levels of reading ability (but see Callender & McDaniel, 2009 , who did not find an effect of massed rereading over single reading for either higher- or lower-ability readers). Raney (1993) reported a similar advantage of massed rereading over a single reading for readers with either higher or lower working-memory spans. Finally, Barnett and Seefeldt (1989) defined high- and low-ability groups by a median split of ACT scores; both groups showed an advantage of massed rereading over a single reading for short-answer factual questions, but only high-ability learners showed an effect for questions that required application of the information.

The extent to which rereading effects depend on knowledge level is also woefully underexplored. In the only study to date that has provided any evidence about the extent to which knowledge may moderate rereading effects ( Arnold, 1942 ), both high-knowledge and low-knowledge readers showed an advantage of massed rereading over outlining or summarizing a passage for the same amount of time. Additional suggestive evidence that relevant background knowledge is not requisite for rereading effects has come from three recent studies that used the same text ( Rawson, 2012 ; Rawson & Kintsch, 2005 ; Verkoeijen et al., 2008 ) and found significant rereading effects for learners with virtually no specific prior knowledge about the main topics of the text (the charge of the Light Brigade in the Crimean War and the Hollywood film portraying the event).

The extant literature is severely limited with respect to establishing the generality of rereading effects across different groups of learners. To our knowledge, all but two studies of rereading effects have involved undergraduate students. Concerning the two exceptions, Amlund, Kardash, and Kulhavy (1986) reported rereading effects with graduate students, and O’Shea, Sindelar, and O’Shea (1985) reported effects with third graders.

Finally, although learners in most experiments have studied only one text, rereading effects have also been shown when learners are asked to study several texts, providing suggestive evidence that rereading effects can withstand interference from other learning materials.

One other learning condition that merits mention is amount of practice, or dosage. Most of the benefits of rereading over a single reading appear to accrue from the second reading: The majority of studies that have involved two levels of rereading have shown diminishing returns from additional rereading trials. However, an important caveat is that all of these studies involved massed rereading. The extent to which additional spaced rereading trials produce meaningful gains in learning remains an open question.

One aspect of the learning conditions that does significantly moderate the effects of rereading concerns the lag between initial reading and rereading. Although advantages of rereading over reading only once have been shown with massed rereading and with spaced rereading (in which some amount of time passes or intervening material is presented between initial study and restudy), spaced rereading usually outperforms massed rereading. However, the relative advantage of spaced rereading over massed rereading may be moderated by the length of the retention interval, an issue that we discuss further in the subsection on criterion tasks below (7.2d). The effect of spaced rereading may also depend on the length of the lag between initial study and restudy. In a recent study by Verkoeijen, Rikers, and Özsoy (2008), learners read a lengthy expository text and then reread it immediately afterward, 4 days later, or 3.5 weeks later. Two days after rereading, all participants completed a final test. Performance was greater for the group who reread after a 4-day lag than for the massed rereaders, whereas performance for the group who reread after a 3.5-week lag was intermediate and did not significantly differ from performance in either of the other two groups. With that said, spaced rereading appears to be effective at least across moderate lags, with studies reporting significant effects after lags of several minutes, 15–30 minutes, 2 days, and 1 week.

Following the early work of Rothkopf (1968), subsequent research established that the effects of rereading are fairly robust across other variations in learning conditions. For example, rereading effects obtain regardless of whether learners are forewarned that they will be given the opportunity to study more than once, although Barnett and Seefeldt (1989) found a small but significant increase in the magnitude of the rereading effect among learners who were forewarned, relative to learners who were not forewarned. Furthermore, rereading effects obtain with both self-paced reading and experimenter-paced presentation. Although most studies have involved the silent reading of written material, effects of repeated presentations have also been shown when learners listen to an auditory presentation of text material (e.g., Bromage & Mayer, 1986; Mayer, 1983).

7.2 How general are the effects of rereading?

Why does rereading improve learning? Mayer (1983; Bromage & Mayer, 1986) outlined two basic accounts of rereading effects. According to the quantitative hypothesis, rereading simply increases the total amount of information encoded, regardless of the kind or level of information within the text. In contrast, the qualitative hypothesis assumes that rereading differentially affects the processing of higher-level and lower-level information within a text, with particular emphasis placed on the conceptual organization and processing of main ideas during rereading. To evaluate these hypotheses, several studies have examined free recall as a function of the kind or level of text information. The results have been somewhat mixed, but the evidence appears to favor the qualitative hypothesis. Although a few studies found that rereading produced similar improvements in the recall of main ideas and of details (a finding consistent with the quantitative hypothesis), several studies have reported greater improvement in the recall of main ideas than in the recall of details (e.g., Bromage & Mayer, 1986; Kiewra, Mayer, Christensen, Kim, & Risch, 1991; Rawson & Kintsch, 2005).

In an early study by Rothkopf (1968) , undergraduates read an expository text (either a 1,500-word passage about making leather or a 750-word passage about Australian history) zero, one, two, or four times. Reading was self-paced, and rereading was massed (i.e., each presentation of a text occurred immediately after the previous presentation). After a 10-minute delay, a cloze test was administered in which 10% of the content words were deleted from the text and students were to fill in the missing words. As shown in Figure 6 , performance improved as a function of number of readings.
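The cloze procedure described above can be illustrated with a short sketch. It is not Rothkopf's actual procedure: the function name `make_cloze`, the small `FUNCTION_WORDS` list used to decide what counts as a content word, and the random selection of words to delete are all assumptions made for illustration.

```python
import random
import re

# Hypothetical, deliberately small list of function words. Any alphabetic
# token not in this set is treated as a "content word" for this sketch.
FUNCTION_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "been", "by", "with",
    "and", "or", "but", "in", "on", "at", "of", "to", "for", "from", "as",
    "it", "its", "this", "that", "these", "those",
}


def make_cloze(text, proportion=0.10, seed=0):
    """Blank out roughly `proportion` of the content words in `text`.

    Returns the cloze text and the list of deleted words in reading order.
    """
    # Split into alternating word and non-word tokens so that joining the
    # tokens reproduces the original spacing and punctuation.
    tokens = re.findall(r"\w+|\W+", text)
    content_idx = [
        i for i, tok in enumerate(tokens)
        if tok.isalpha() and tok.lower() not in FUNCTION_WORDS
    ]
    if not content_idx:
        return text, []

    # Delete at least one word, and never more than are available.
    n_delete = min(len(content_idx), max(1, round(len(content_idx) * proportion)))
    rng = random.Random(seed)  # fixed seed keeps the test form reproducible
    chosen = sorted(rng.sample(content_idx, n_delete))

    answers = [tokens[i] for i in chosen]
    for i in chosen:
        tokens[i] = "_____"
    return "".join(tokens), answers


if __name__ == "__main__":
    passage = ("Leather is made by tanning animal hides with plant extracts "
               "and mineral salts in large vats")
    cloze, answers = make_cloze(passage, proportion=0.10, seed=1)
    print(cloze)
    print(answers)
```

Scoring a learner's responses would then amount to comparing each filled-in blank with the corresponding entry in the returned answer list; how strictly to score (for example, whether synonyms count) is a separate design decision that this sketch does not address.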

7.1 General description of rereading and why it should work

Imagery can improve students’ learning of text materials, and the promising work by Leutner et al. (2009) speaks to the potential utility of imagery use for text learning. Imagery production is also more broadly applicable than the keyword mnemonic. Nevertheless, the benefits of imagery are largely constrained to imagery-friendly materials and to tests of memory, and further demonstrations of the effectiveness of the technique (across different criterion tests and educationally relevant retention intervals) are needed. Accordingly, we rated the use of imagery for learning text as low utility.

6.5 Imagery use for learning text: Overall assessment

The majority of studies have examined the influence of imagery by using relatively brief instructions that encouraged students to generate images of text content while studying. Given that imagery does not appear to undermine learning (and that it does boost performance in some conditions), teachers may consider instructing students (third grade and above) to attempt to use imagery when they are reading texts that easily lend themselves to imaginal representations. How much training would be required to ensure that students consistently and effectively use imagery under the appropriate conditions is unknown.

Many of the studies on imagery use and text learning have involved students from real classrooms who were reading texts that were written to match the students’ grade level. Most studies have used fabricated materials, and few studies have used authentic texts that students would read. Exceptions have involved the use of a science text on the dipole character of water molecules ( Leutner et al., 2009 ) and texts on cause-effect relationships that were taken from real science and social-science textbooks ( Gagne & Memory, 1978 ); in both cases, imagery instructions improved test performance (although the benefits were limited to a free-recall test in the latter case). Whether instructions to use imagery will help students learn materials in a manner that will translate into improved course grades is unknown, and research investigating students’ performance on achievement tests has shown imagery use to be a relatively inert strategy ( Lesgold et al., 1975 ; Miccinati, 1982 ; but see Rose, Parks, Androes, & McMahon, 2000 , who supplemented imagery by having students act out narrative stories).

When imagery instructions do improve criterion performance, a question arises as to whether these effects are long lasting. Unfortunately, the question of whether the use of imagery protects against the forgetting of text content has not been widely investigated; in the majority of studies, criterion tests have been administered immediately or shortly after the target material was studied. In one exception, Kulhavy and Swenson (1975) found that imagery instructions benefited fifth and sixth graders’ accuracy in answering questions that tapped the gist of the texts, and this effect was even apparent 1 week after the texts were initially read. The degree to which these long-term benefits are robust and generalize across a variety of criterion tasks is an open question.

This pattern is also apparent from studies with sixth graders, who do show significant benefits of imagery use on measures involving the recall or summarization of text information (e.g., Kulhavy & Swenson, 1975 ), but show reduced or nonexistent benefits on comprehension tests and on criterion tests that require application of the knowledge ( Gagne & Memory, 1978 ; Miccinati, 1982 ). In general, imagery instructions tend not to enhance students’ understanding or application of the content of a text. One study demonstrated that training improved 8- and 9-year-olds’ performance on inference questions, but in this case, training was extensive (three sessions), which may not be practical in some settings.

The inconsistent benefits of imagery within groups of students can in part be explained by interactions between imagery (vs. reading) instructions and the criterion task. Consider first the results from studies involving college students. When the criterion test comprises free-recall or short-answer questions tapping information explicitly stated in the text, college students tend to benefit from instructions to image (e.g., Gyeselinck, Meneghetti, De Beni, & Pazzaglia, 2009 ; Hodes, 1992 ; Rasco et al., 1975 ; although, as discussed earlier, these effects may be smaller when students read the passages rather than listen to them; De Beni & Moè, 2003 ). By contrast, despite the fact that imagery presumably helps students develop an integrated visual model of a text, imagery instructions did not significantly help college students answer questions that required them to make inferences based on information in a text ( Giesen & Peeck, 1984 ) or comprehension questions about a passage on the human heart ( Hodes, 1992 ).

Fortunately, some investigators have manipulated the content of text materials when examining the benefits of imagery use. In De Beni and Moè (2003), one text included descriptions that were easy to imagine, another included a spatial description of a pathway that was easy to imagine and verbalize, and another was abstract and presumably not easy to imagine. As compared with instructions to just rehearse the texts, instructions to use imagery benefited free recall of the easy-to-imagine texts and the spatial texts but did not benefit recall of the abstract texts. Moreover, the benefits were evident only when students listened to the text, not when they read it (as discussed under “Learning Conditions,” 6.2a, above). Thus, the benefits of imagery may be largely constrained to texts that directly support imaginal representations. Although the bulk of the research on imagery has used texts that were specifically chosen to support imagery, two studies have used the Metropolitan Achievement Test, which is a standardized test that taps comprehension. Both studies used extensive training in the use of imagery while reading, and both studies failed to find an effect of imagery training on test performance (Lesgold et al., 1975; Miccinati, 1982), even when participants were explicitly instructed to use their trained skills to complete the test (Lesgold et al., 1975).

The actual use of imagery as a learning technique should also be considered when evaluating the imagery literature. In particular, even if students are instructed to use imagery, they may not necessarily use it. For instance, R. C. Anderson and Kulhavy (1972) had high school seniors read a lengthy text passage about a fictitious primitive tribe; some students were told to generate images while reading, whereas others were told to read carefully. Imagery instructions did not influence performance, but reported use of imagery was significantly correlated with performance (see also Denis, 1982 ). The problem here is that some students who were instructed to use imagery did not, whereas some uninstructed students spontaneously used it. Both circumstances would reduce the observed effect of imagery instructions, and students’ spontaneous use of imagery in control conditions may be partly responsible for the failure of imagery to benefit performance in some cases. Unfortunately, researchers have typically not measured imagery use, so evaluation of these possibilities must await further research.

Learning conditions play a potentially important role in moderating the benefits of imagery, so we briefly discuss two conditions here—namely, the modality of text presentation and learners’ actual use of imagery after receiving imagery instructions. Modality pertains to whether students are asked to use imagery as they read a text or as they listen to a narration of a text. L. R. Brooks (1967 , 1968 ) reported that participants’ visualization of a pathway through a matrix was disrupted when they had to read a description of it; by contrast, visualization was not disrupted when participants listened to the description. Thus, it is possible that the benefits of imagery are not fully actualized when students read text and would be most evident if they listened. Two observations are relevant to this possibility. First, the majority of imagery research has involved students reading texts; the fact that imagery benefits have sometimes been found indicates that reading does not entirely undermine imaginal processing. Second, in experiments in which participants either read or listened to a text, the results have been mixed. As expected, imagery has benefited performance more among students who have listened to texts than among students who have read them ( De Beni & Moè, 2003 ; Levin & Divine-Hawkins, 1974 ), but in one case, imagery benefited performance similarly for both modalities in a sample of fourth graders ( Maher & Sullivan, 1982 ).

Investigations of imagery use for learning text materials have focused on single sentences and longer text materials. Evidence concerning the impact of imagery on sentence learning largely comes from investigations of other mnemonic techniques (e.g., elaborative interrogation) in which imagery instructions have been included in a comparison condition. This research has typically demonstrated that groups who receive imagery instructions have better memory for sentences than do no-instruction control groups (e.g., R. C. Anderson & Hidde, 1971 ; Wood, Pressley, & Winne, 1990 ). In the remainder of this section, we focus on the degree to which imagery instructions improve learning for longer text materials.

6.2 How general are the effects of imagery use for text learning?

A variety of mechanisms may contribute to the benefits of imaging text material on later test performance. Developing images can enhance one’s mental organization or integration of information in the text, and idiosyncratic images of particular referents in the text could enhance learning as well (cf. distinctive processing; Hunt, 2006 ). Moreover, using one’s prior knowledge to generate a coherent representation of a narrative may enhance a student’s general understanding of the text; if so, the influence of imagery use may be robust across criterion tasks that tap memory and comprehension. Despite these possibilities and the dramatic effect of imagery demonstrated by Leutner et al. (2009) , our review of the literature suggests that the effects of using mental imagery to learn from text may be rather limited and not robust.

In one demonstration of the potential of imagery for enhancing text learning, Leutner, Leopold, and Sumfleth (2009) gave tenth graders 35 minutes to read a lengthy science text on the dipole character of water molecules. Students either were told to read the text for comprehension (control group) or were told to read the text and to mentally imagine the content of each paragraph using simple and clear mental images. Imagery instructions were also crossed with drawing: Some students were instructed to draw pictures that represented the content of each paragraph, and others did not draw. Soon after reading, the students took a multiple-choice test that included questions for which the correct answer was not directly available from the text but needed to be inferred from it. As shown in Figure 5, the instructions to mentally imagine the content of each paragraph significantly boosted the comprehension-test performance of students in the mental-imagery group, in comparison to students in the control group (Cohen’s d = 0.72). This effect is impressive, especially given that (a) training was not required, (b) the text involved complex science content, and (c) the criterion test required learners to make inferences about the content. Finally, drawing did not improve comprehension, and it actually negated the benefits of imagery instructions. The potential for another activity to interfere with the potency of imagery is discussed further in the subsection on learning conditions (6.2a) below.
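For readers less familiar with the effect-size metric reported above, Cohen’s d is commonly computed as the difference between two group means divided by their pooled standard deviation. The following is a minimal sketch of that common formulation in Python. The function name `cohens_d` and both sets of scores are hypothetical and invented purely for illustration; they are not data from Leutner et al. (2009), and we do not know which exact variant of the statistic those authors computed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical comprehension-test scores (illustration only, not real data).
imagery_scores = [14, 16, 15, 18, 17, 13]
control_scores = [13, 15, 14, 16, 15, 12]

d = cohens_d(imagery_scores, control_scores)
print(f"Cohen's d = {d:.2f}")
```

By Cohen’s conventional benchmarks, values near 0.2, 0.5, and 0.8 are often described as small, medium, and large effects, respectively, so a value of 0.72 would fall between medium and large.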

6.1 General description of imagery use and why it should work

6 Imagery use for text learning

On the basis of the literature reviewed above, we rate the keyword mnemonic as low utility. We cannot recommend that the keyword mnemonic be widely adopted. It does show promise for keyword-friendly materials, but it is not highly efficient (in terms of time needed for training and keyword generation), and it may not produce durable learning. Moreover, it is not clear that students will consistently benefit from the keyword mnemonic when they have to generate keywords; additional research is needed to more fully explore the effectiveness of keyword generation (at all age levels) and whether doing so is an efficient use of students’ time, as compared to other strategies. In one head-to-head comparison, cued recall of foreign-language vocabulary after using the keyword mnemonic (with experimenter-provided keywords) was either no different from recall after practice testing or was lower on delayed criterion tests 1 week later (Fritz, Morris, Acton, et al., 2007). Given that practice testing is easier to use and more broadly applicable (as reviewed below in the Practice Testing section), it seems superior to the keyword mnemonic.

5.5 The keyword mnemonic: Overall assessment

The majority of research on the keyword mnemonic has involved at least some (and occasionally extensive) training, largely aimed at helping students develop interactive images and use them to subsequently retrieve targets. Beyond training, implementation 