Our quasi-experiment showed that PF instruction yields better performance than the conventional AL approach on the exam following the manipulation (Midterm 2). For low-performing students, this advantage persisted on the final exam, but it was virtually eliminated for better-performing students.

As is commonly observed in quasi-experiments, differences between conditions existed at the outset, as shown for Midterm 1 and the other items on the final exam (due to sample characteristics, instructor, or a combination of factors). Because we control for performance on these items, the benefit of PF on manipulated items needed to be much larger in order to surpass these other differences. Thus, the results presented here are a conservative estimate of the true value of PF.

Notably, students had considerable exposure to the study topics after the manipulation and before the test. All students received practice problems, and all had time to prepare for Midterm 2, with several weeks between manipulation and testing. Thus, the effect of PF instruction in this study was long-term and fairly robust compared with previous PF studies involving immediate post-tests. In fact, the effect of PF on low-achieving students also carried forward to the final exam, as discussed further below.

While PF instruction proved fruitful on Midterm 2, the effect was much smaller by the final exam. There are several potential explanations for this. It may be that the condition-topic questions on the final exam were too few and/or did not capture ability as well as the Midterm 2 condition-topic questions. An alternative explanation is one of decay, that is, the PF effect simply faded over time. A third explanation has to do with exposure: all students had ample opportunities to practice these concepts in subsequent tutorials, and thus to bring their knowledge to a similar level regardless of initial instruction. The most likely explanation is that, because this was a final exam, students studied for it and strategically focused their attention on topics that they knew less well. Thus, differences that might have existed between conditions were all but eliminated by students’ independent learning. The fact that students who performed lower on Midterm 1 still showed a PF effect on the final exam may strengthen this explanation, as these students are evidently less successful learners in this course. Further research on temporal aspects of PF, as well as its interaction with student ability, is warranted.

Our results suggest that students who are more challenged in the course (as reflected in Midterm 1) benefited more from PF, on both Midterm 2 and the final exam. One possible explanation is that learning from instruction is challenging and may be cognitively overloading. Conceptual learning requires understanding of key concepts and how they fit together. To avoid cognitive overload during class time, all students were assigned pre-reading in our study. It may be that pre-reading alone was not sufficient for low-performing students. Instead, the addition of hands-on experiences helped them acquire basic tools with which they could make meaning of subsequent instruction. Schwartz et al.27 describe the benefits of these expository activities in helping students construct relevant (prior) knowledge that assists them in learning from future instruction. Similar results, in which lower-performing students benefit from exploring the solution space, were also found in Geometry problem-solving47 and Physics simulations.48 It is important to emphasize, though, that these results are contingent on using Midterm 1 as a proxy for student ability. Further research is warranted to evaluate the robustness of this pattern.

Given that students in both conditions received support for active learning, this aspect of PF probably does not account for its success. If so, why did PF students outperform the AL students? When explaining the effect of PF, Loibl et al.22 build on Kapur and Bielaczyc32 to identify the following relevant mechanisms: prior knowledge activation, awareness of knowledge gaps, and recognition of deep features. Active learning also seeks to activate many of these mechanisms. Frequent classroom-response (Clicker) questions encourage students to activate their prior knowledge, and provide feedback to support awareness of knowledge gaps. We attribute the difference in learning between conditions in our study to two factors: depth of search and availability of solution.

By depth of search we refer to students’ struggle to develop a model that can solve the problems they are given. The quick turnaround of classroom-response questions provides students with feedback, but does not provide them with enough time to fully understand the requirements of the problems (and the limitations of their own solutions). This may lead to a difference in the third mechanism identified by Loibl and colleagues, recognition of deep features. In addition, there is some evidence that self-generated feedback (as done in the PF condition during the first phase) may lead to better learning compared to only receiving external feedback.49,50

The second significant factor is the availability of the canonical solution in the AL condition. Having a solution may encourage students to accept it “at face value”, without dissecting its deep structure, and without fully grasping its deep features.32,37,51

Our research question focused on ecological validity: is PF superior to AL when instructors create their own version of the activities, in relatively complex (and less intuitive) topics?

Our Biology PF activity design (Fig. 1) was based on the design guidelines from Kapur and Bielaczyc.32 For the “Exploration” phase, we developed problems that required students to access and elicit prior knowledge, apply it in a novel context, seek out appropriate resources, and generate and explore solutions with minimal assistance from the teaching team. The problems were designed to be challenging but not frustrating. In the topic area of Transcription and Translation, for example, students were required to access prior knowledge (from assigned pre-reading or from what they learned in high-school Biology courses) about the steps involved in the processes, the various structural parts needed for the processes (requiring reasoning across orders of magnitude, from atomic to systems), the roles of the structural parts, and the sequence in which the roles occurred (requiring reasoning across ontological levels; e.g., DNA is information, a unit of inheritance, and a physical entity52). During the Exploration phase of the PF activity, accessing prior knowledge alone was not sufficient. Students also had to use the accessed knowledge appropriately by asking the relevant questions necessary for generating the solutions on their own. This problem-solving phase provided opportunities for students to explore the affordances and constraints of multiple solutions.20 During this phase of struggle, modest support was provided by the facilitating teaching team in the classroom to help activate metacognitive processes and prevent the onset of frustration. Support was given in the form of prompts to focus students’ attention without providing additional information, such as, “what information do you have in the problem?”; “what other kind of information would you need in order to continue the process?”; or “where might you find this information?” This support helped students understand the challenges and make progress, yet they mostly failed to construct complete and correct solutions.

The Consolidation phase of the PF activity (our post-activity Formative Feedback and Walkthrough phases) built on the solutions produced by the students in the Activity phase. The student solutions, containing their mistakes, misconceptions, or gaps in knowledge, were used as examples by the instructor to provide feedback and to walk through the problems in a logical, guided, and expert-like manner. The Consolidation phase provided the students with opportunities for comparing and contrasting, organizing, and assembling the relevant student-generated solutions into appropriate solutions.20

The study described above has several limitations. First and foremost, our choice to focus on ecological validity limited our ability to conduct a randomized controlled trial. Instead, this was a quasi-experiment: the two sections were taught by different instructors, and students self-selected into sections based on schedule and other factors. For example, the PF group unexpectedly had a higher ratio of students from the Faculty of Science, who are traditionally more successful in this course. Consequently, they performed better on some measures of knowledge. We compensate for these variations statistically by (i) controlling for performance on a prior exam, (ii) controlling for the person variables of enrollment in the Faculty of Science, university year, and gender, and (iii) adding other items from the studied tests to our measures. However, such variation may still have indirect effects, for example on class dynamics. The fact that the results of this quasi-experiment echo earlier results (cf. Kapur46) suggests that sample characteristics are not the main contributing factor.
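The logic of this kind of statistical adjustment can be illustrated with a minimal sketch. It is not the analysis reported in this paper: the data are synthetic, the variable names (`midterm1`, `science`, `pf`, `midterm2`) and all coefficients are hypothetical, and ordinary least squares is assumed only for illustration. The sketch simulates two sections that differ at the outset and compares a naive difference in means with an estimate that controls for prior performance and faculty enrollment.

```python
import numpy as np

def adjusted_effect(score, condition, covariates):
    """Return the coefficient of `condition` from an ordinary least-squares
    fit of `score` on an intercept, `condition`, and `covariates`."""
    X = np.column_stack([np.ones(len(score)), condition, covariates])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    return beta[1]  # column 0 is the intercept, column 1 is the condition

# Synthetic data only; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 2000
science = rng.integers(0, 2, n).astype(float)   # 1 = hypothetical science-faculty student
# Self-selection: science students are more likely to be in the PF section.
pf = (rng.random(n) < 0.3 + 0.4 * science).astype(float)
midterm1 = 60 + 10 * science + rng.normal(0, 8, n)  # prior exam, before the manipulation
true_effect = 5.0                                    # simulated PF benefit, in points
midterm2 = (20 + 0.6 * midterm1 + 4 * science
            + true_effect * pf + rng.normal(0, 5, n))

# Naive comparison mixes the PF benefit with pre-existing group differences.
naive = midterm2[pf == 1].mean() - midterm2[pf == 0].mean()
# Adjusted comparison controls for the prior exam and faculty enrollment.
adjusted = adjusted_effect(midterm2, pf, np.column_stack([midterm1, science]))

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f} (simulated effect: {true_effect})")
```

In this simulation the naive difference overstates the simulated effect because the PF section starts out stronger, while the adjusted estimate lies close to it. Adjustment of this kind handles only measured covariates. Unmeasured influences such as class dynamics remain outside its reach.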

Another limitation is the fact that only one instructor taught using PF. As the study topics are highly discipline-specific, instruction was designed solely by the instructor. However, this was done with advice from a learning scientist, who provided motivation, relevant literature, design principles, and general emphases for the activity. This support, however, was not specific to the two activities, and feedback directly on the activities was not provided. As the paper argues for a “Do-it-Yourself” (DIY) approach for PF, future work will need to show how well this transfers to additional instructors with even less support.

Lastly, our analysis by student performance relies on Midterm 1. This is a single exam of limited scope, early in the term, and thus these results are more suggestive than conclusive. Further studies of aptitude-treatment interactions are warranted.

Overall, results show that PF instruction helped all learners achieve greater gains on a subsequent exam, even after several weeks had passed and all students had received additional exposure to these topics. Results were especially positive for low-performing students, for whom the benefits of PF instruction led to an improvement of nearly seven points on relevant topics in the final exam of the course. These results show the promise and potential of PF in first-year large-enrollment university courses when implemented by the course instructor, even when compared with alternative effective active learning techniques.