PF and similar learning approaches have repeatedly demonstrated their effectiveness for students’ conceptual knowledge acquisition in samples of adolescents and young adults. Previous studies in children around the elementary school age, which did not employ PF, but implemented designs sharing the characteristic element of delayed instruction with PF,14,15,16,17 yielded mixed results regarding the effectiveness of delayed timing of instruction on students’ conceptual knowledge. We identified two core design differences between the two types of studies, which might explain why younger students, in contrast to older students, did not consistently benefit from problem solving prior to instruction. The first difference refers to the absence of a comparing and contrasting activity during instruction and the second difference relates to students’ collaboration during problem solving. Both design components were typically implemented in PF studies with older students, and are thought to trigger the aforementioned learning mechanisms underlying the effectiveness of PF. In the present study, young students around the elementary school age learned with a typical PF design. Our learners collaboratively engaged in problem solving prior to receiving instruction, and this instruction included an instructor-led comparing and contrasting activity. Despite having adapted the implementation to align with previous PF studies, we were unable to replicate the beneficial effect of delaying instruction until after problem solving with young students. Thus, we had to reject Hypothesis 1 (PF > DI for conceptual knowledge). Furthermore, we found no beneficial effects of collaborating in small groups (vs. working alone) in PF. Students in the collaborative condition neither acquired more conceptual knowledge (Hypothesis 2) nor generated more solution ideas during the initial problem-solving phase (Hypothesis 3) than students in our individual PF condition.

Regarding the three main hypotheses, our study revealed null results, even though its statistical power was adequate to detect at least medium-sized effects. The challenge of null results lies in their interpretation. On the one hand, the null results could originate from flaws in our study design. First, the design of the PF learning material could be criticized for being too easy or too hard. However, because the design of the material was strongly driven by and iteratively improved according to PF design recommendations6 and our students had sufficient prerequisite prior knowledge to be able to interact with the learning material (see pretest score in Results section), we believe that the learning materials, and our implementation of PF in general, cannot explain the null effects. Another point of criticism in our design could refer to the conceptual knowledge posttest. While the test’s validity can be considered to be high, as it was designed in close consultation with subject matter experts, its internal consistency was low. The low internal consistency could relate to the heterogeneous nature of the posttest (i.e., assessing different target concepts) and the way it interacted with the setting of our study. In a real classroom setting, the allotted time for the posttest, and thus the number of posttest items we were able to pose to the children, was rather limited. This presumably contributed to the relatively low internal consistency of the test. More test items (for each target concept) could have increased the internal consistency of the test. However, as we found no indication of statistical differences between experimental conditions, we doubt that differences could have been found even with a test of higher internal consistency. On the other hand, the null results might indicate that there really is no advantage of PF over DI for this young sample.
Support for this interpretation lies in the small effect sizes and highly overlapping CIs when comparing problem solving prior to instruction with problem solving after instruction (i.e., PF vs. DI). These patterns were also found in a similar study by Chase and Klahr.25 Their study and our study, thus, add to the number of problem solving prior to instruction studies in students around the elementary school age, which showed no clear advantage of problem solving prior to instruction over problem solving after instruction.15,16,17,25 Despite the aforementioned limitations, we still believe our study makes a relevant contribution to the PF literature and raises the question of whether younger children might lack crucial prerequisites to productively learn from PF and similar approaches. However, until PF studies systematically compare different age groups, we cannot know whether students’ age, or another factor, explains the null results in our and other studies. Such a comparison, however, poses an experimental dilemma. While, for example, the most “popular” PF problem (involving standard deviation) is appropriate for adolescents and young adults, it is far too hard for young students around the elementary school age, because they do not have the prerequisite prior domain-specific knowledge to meaningfully engage with the problem. Problems that are appropriate for elementary school students, in turn, do not provide failure opportunities for older students, as they have already learnt the canonical solution for these problems in school. In other words, the PF effect cannot materialize across both age groups when using the exact same learning material. Nevertheless, with the aim of contributing to future research hypotheses, we would like to open the discussion about potential characteristics of young learners that might limit their ability to benefit from PF.
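
The point made above, that more items per target concept could have raised the posttest's internal consistency, can be illustrated with the Spearman-Brown prophecy formula. The following is a minimal sketch and not an analysis from this study: the starting reliability of 0.50 is a hypothetical value chosen for illustration, and the formula assumes that added items are parallel to the existing ones, which may not hold for a heterogeneous test.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict test reliability after changing test length.

    reliability   -- reliability of the current test (0 <= r < 1)
    length_factor -- new length divided by current length (e.g., 2 = twice as many items)

    Assumes the added items are parallel to the existing ones.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


if __name__ == "__main__":
    hypothetical_reliability = 0.50  # illustrative value, NOT taken from the study
    for factor in (1, 2, 3):
        predicted = spearman_brown(hypothetical_reliability, factor)
        print(f"length x{factor}: predicted reliability = {predicted:.2f}")
```

Under these assumptions, doubling the number of items would raise a reliability of 0.50 to roughly 0.67, and tripling it to 0.75, which shows why limited testing time in a classroom setting can constrain internal consistency.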

Potential cognitive limitations of younger students: first of all, young students might not have sufficiently developed the cognitive capacities necessary for benefiting from PF. During unsupported problem solving, the cognitive demand tends to be high, as previous PF research4 demonstrated. For example, learners need to break down the problem, monitor problem-solving steps, and keep track of goals and sub-goals. This requires working-memory capacity, particularly executive functions. However, the executive functions of children around the elementary school age, and thus their ability to regulate their actions and monitor problem-solving steps, are not as developed as those of older students.18 Therefore, the beneficial effect of delaying instruction until after problem solving might not transfer to younger students due to their comparably lower general cognitive capacities (as discussed by Loehr et al.16). Future research is needed to better understand how young students’ cognitive capacities and learning outcomes relate to one another in PF and similar settings.

Moreover, in contrast to older students, younger students are likely to have less advanced self-regulatory learning strategies26 such as dealing with failure and controlling motivation, attention, and emotions.27 These strategies, however, might be an important prerequisite for persistently engaging in solving a problem that is designed to be challenging (and can thus be frustrating) and for generating multiple (but not necessarily complete) solution ideas. Generating multiple solution ideas is important because it is thought to relate to students’ prior knowledge activation and differentiation3,6,13 (cf. learning mechanism 1). In our study, however, the PF students generated a rather low number of solution ideas. On average they generated only 4.36 (SD = 1.85) solution ideas out of the range of possible solution ideas (e.g., calculating with fractions, using concrete or abstract graphical representations, trying out equivalents of fractions, or making use of more creative approaches, see Methods section). This result supports the notion of young students’ low(er) persistence in regulating their own problem solving. To shed more light on the potential role of young students’ self-regulation strategies for the effectiveness of PF, future research is again needed.

The role of collaboration: in our sample of children around the elementary school age, we found no benefits of working collaboratively vs. individually during PF. Our additional process analyses, however, revealed two main findings that might inform the design of PF interventions that are productive for young students. First, the more dyad members engaged in constructive utterances (e.g., by posing questions, generating analogies, or drawing inferences to go beyond the information presented in the original problem24), the more solution ideas the dyad generated. These solution ideas, in turn, represent important learning opportunities for the students. To facilitate dialogs with a high number of constructive utterances, a potential implication for future PF studies could thus be to prompt students’ collaboration. In addition to providing students with the motivational prompts typical of PF during problem solving (as in our study), we would thus suggest designing prompts that specifically target students’ constructive contributions.

Second, students who showed high relative levels of constructive and interactive utterances in the discussion with their partner scored higher in the conceptual knowledge posttest. This result is in line with the ICAP framework and with broader research on collaborative learning: both constructive and interactive utterances require creating, sense-making, and elaboration processes, which are known to facilitate students’ conceptual knowledge acquisition.19,20,21 A similar pattern was found for the relative levels of active utterances, which according to the ICAP taxonomy trigger a lower but still beneficial level of students’ cognitive engagement (i.e., attention processes rather than creating processes). Thus, higher individual conceptual knowledge is associated with higher quality of individual talk during problem solving (with causality probably present in both directions). Regarding possible support for younger students in PF, this mainly points towards one precondition, namely students’ ability to engage in high-quality dialog (i.e., with high levels of at least active, or even constructive and interactive utterances). In line with German mathematical standards, our young students should have met this precondition and should have been able to communicate and to engage in mathematical reasoning.28 However, younger students might still need additional practice with engaging in learning-centered dialog with peers before they can benefit from collaboratively tackling problems in a PF setting, especially as it takes time (and practice) for the potential of collaborative learning to unfold.29 Thus, future PF research should continue to investigate the role of the students' experience in small group collaboration in the effectiveness of PF. One option would be to observe students and their collaboration in longitudinal research designs.

Analyzing collaborative engagement in PF: to analyze students’ collaborative processes in the PF-Coll condition, we mainly drew upon the ICAP framework. This framework relates different kinds of overt learning processes to underlying cognitive processes, resulting in a proposed hierarchy of learning processes (I > C > A > P). Interactive processes are said to be more beneficial for students’ learning than constructive processes, as they trigger joint creating processes by interactively building on a learning partner’s contribution rather than creating processes by constructively giving a (self-) explanation. Both processes are said to be more beneficial than active (e.g., reading parts of the problem, triggering attending processes) and passive processes (e.g., engaging physically with learning material). The ICAP framework, thus, covers a variety of different learning processes that can be expected to occur during students’ collaborative problem solving (in PF). The mode of engagement that the ICAP framework terms interactive synthesizes beneficial collaborative processes that have been described by a large body of previous research. Such process features include students’ joint focus of attention,30 flow of collaboration,31 or transactive interactions.32 Following the ICAP framework’s prediction, interactive processes should show the strongest relationship with learning. However, we found no clear superiority of interactive processes in our study. Regarding the generation of solution ideas, the number of constructive utterances was not only higher (constructive: M = 14.25, SD = 6.53 > interactive M = 7.50, SD = 5.88) but also more relevant than the number of interactive utterances. During the PF phase, the cumulative contributions of individual students, rather than the co-constructed ideas, determined how many solution ideas a dyad noted down. 
Regarding the individual posttest score, the relative numbers of interactive, constructive, and active utterances all showed a positive relationship. Against this background, it seems that the most relevant factor for learning is content-related talk (and not the social mode in which it was generated). This major effect might have overshadowed the differential effects of the different types of utterances (i.e., constructive and interactive utterances). The absence of the expected ICAP hierarchy (I > C > A > P) might relate to two aspects: first, the categories of the ICAP framework (which was primarily meant to guide the design of learning environments and not the analysis of dialog patterns in a specific learning environment) may be too broad to capture different nuances of collaborative learning such as giving explanations, posing questions, justifying claims and so on. Second, the overt learning processes might not always reveal the underlying cognitive learning processes, especially when differentiating between constructive and interactive processes. Even though the theoretical and coded differentiation between interactive and constructive activities seems clear, the point at which a constructive contribution becomes an interactive contribution might not always be visible.33 Does the lack of an explicit reference to a previous contribution automatically mean that students did not interactively build on this previous contribution and did not integrate the previous idea? This unclear relation between the overt learning activity and the cognitive learning processes may explain the lack of superiority of interactive utterances in our study. To shed more light on this relation in particular and on the role of collaboration in PF, more in-depth analyses are needed.

In summary, our study failed to demonstrate the often beneficial effects of PF in a sample of younger students but set the stage for discussing potential boundary conditions of PF relating to students’ age. However, these boundary conditions need to be investigated and validated in future studies.