Two experiments were performed. As only a few features in experiments 1 and 2 were different, we will first describe experiment 1 in detail after which we will note differences for experiment 2. The most important of these differences was that experiment 1 contained a timing manipulation in addition to the congruency manipulation. Here, participants either viewed A for 1.5 s and AC for another 2.5 s, or they viewed AC for the full 4 s. This condition was included because we hypothesized that the former condition would enhance retrieval of B and integration with C. This manipulation did not yield significant effects, so it was dropped for experiment 2. Therefore, we will only describe analyses on the remaining factors (congruency, reactivation, curiosity (experiment 1), and metamemory (experiment 2)) in the main text. More information about the timing manipulation and associated results can be found in Supplementary text S1.

Experiment 1

Participants

In previous studies (for example, see refs. 14,34), effect sizes of the manipulations used here tended to be large when manipulated within participants. A power analysis assuming a large effect size and a statistical power of 0.95 (with the standard settings of G*Power, ref. 35) suggests a sample size of at least 23. However, because some participants in such experiments tend to be at floor or at ceiling, we aimed to test at least 30 participants in both experiments. In experiment 1, 31 participants studying either Psychology or Family Studies at the VU Amsterdam were recruited at the end of their first or second study year. Recruitment was achieved through a university participant system, flyers, and social media. One participant was excluded due to technical issues (coding problem), and nine participants were excluded because they did not have enough stimuli (10 or more in either condition) left for analyses, leaving 21 participants that were included in the analyses (15 from Psychology, five from Family Studies, and one unknown). Of all included participants, age was known for 20 because one participant did not fill out their birth date. The remaining participants were between 18 and 25 years old (mean: 20.33, SD: 1.58); five were male, and all self-reported not being color blind and speaking Dutch fluently. For participation, participants provided written informed consent and received either study credit or a monetary reward (minimally €10 and maximally €12, depending on time involvement). Ethical approval was obtained before the start of the experiments from the ethical committee (VCWE) of the faculty of Behavioral and Movement Sciences of VU Amsterdam.
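
As a rough cross-check of the sample size mentioned above, the following sketch approximates the number of participants needed for a paired, two-tailed t-test. The test family, the value dz = 0.8 for a "large" effect (Cohen's convention), and alpha = .05 are assumptions that are not stated in the text. The function name is hypothetical, and the formula is a normal approximation with a small-sample correction term, not the exact noncentral-t computation that G*Power performs.

```python
import math
from statistics import NormalDist


def approx_paired_n(effect_size, alpha=0.05, power=0.95):
    """Approximate sample size for a two-tailed paired t-test.

    Normal approximation plus a small-sample correction (z_alpha^2 / 2).
    This only approximates the exact noncentral-t result.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / effect_size) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


# Assumed inputs: dz = 0.8 ("large"), alpha = .05, power = .95.
n_required = approx_paired_n(0.8)
print(n_required)
```

Under these assumptions the approximation lands on the same order of magnitude as the reported minimum, which is all it is meant to illustrate.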

Stimuli

The AB-AC combinations consisted of 80 words (A), clip-art pictures (B) and descriptions (C). For each word, two pictures and two descriptions were created to be able to fully counterbalance congruency (see below). In total, the set thus contained twice as many pictures and descriptions (160) as words, but each participant only encountered 80 of these during encoding. These ABC-combinations were constructed from study material of the Psychology and Family Studies programs of VU Amsterdam, focusing specifically on material taught after the first study year. This approach was chosen so that participants could build on prior knowledge related to their studies and would be more motivated to study the material.

The ABC-combinations (examples in Fig. 1 and Table 2) were constructed as follows. The words (A-items) were chosen from study material of the Psychology and Family Studies curriculum, second study year and up. All participants, irrespective of their field of study, received the same stimuli. Words were chosen to be not too long (3–37 letters) and never consisted of more than two words, so that participants could read them quickly. The set consisted mostly of Dutch words, but foreign words were chosen when no good Dutch alternative was available (e.g., the English term “Display rules” is used in Dutch pedagogical practices). This resulted in 10 English words, three Latin words, and two abbreviations out of 80.

Table 2 Example of an ABC-combination with a word (A), a picture (B) and a description (C)

B-items were clip-art pictures selected to have clear lines and colors and to not be aversive in any way. They were obtained through the internet and contained no text unless absolutely necessary for the description (e.g., in the case of the Stroop task), and they were judged by the experimenters to be easily describable in a few words. Picture size was set to a maximum of 350 × 350 pixels, with at least one of the two dimensions (height or width) equal to this value so that the natural proportions of the pictures were kept (e.g., a landscape-oriented picture would be 350 × 300 pixels whereas a portrait-oriented picture would be 300 × 350 pixels). All pictures were checked manually for clear resolution and size. C-items were descriptions consisting of maximally five words (range 1–5) and maximally 37 letters (range 5–37). All these descriptions were in Dutch. The BC-associations were chosen such that they could be combined into two congruent and two incongruent associations (see Table 2). For example, for the word “Methylphenidate” a picture of a boy juggling a lot of tasks was coupled with the description “Medicine against ADHD”. Additionally, we added a picture of a sick smiley, coupled with the description “Can cause nausea”. We thus made sure that for the congruent associations, the description always related to something in the picture. The incongruent associations were constructed by combining the picture with the “wrong” description (e.g., boy with “Can cause nausea”, and smiley with “Medicine against ADHD”). We manually checked that each incongruent association did not contain clear possible congruent relations. This resulted in four different possible associations per word that were counterbalanced randomly over participants so that individual pairings would not influence average performance. This meant that each participant only learned a random half of the stimulus set, which allowed us to use the other descriptions as lures during the item recognition task (see Memory tests).
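
The construction of the four possible associations per word can be sketched as follows. This is a minimal illustration only: the function names, the dictionary keys, and the picture file names are hypothetical and do not come from the original stimulus scripts.

```python
import random


def build_associations(word, pair_1, pair_2):
    """Return the four picture-description pairings for one word.

    pair_1 and pair_2 are (picture, description) tuples that belong together.
    """
    (pic_1, desc_1), (pic_2, desc_2) = pair_1, pair_2
    return [
        {"word": word, "picture": pic_1, "description": desc_1, "congruent": True},
        {"word": word, "picture": pic_2, "description": desc_2, "congruent": True},
        {"word": word, "picture": pic_1, "description": desc_2, "congruent": False},
        {"word": word, "picture": pic_2, "description": desc_1, "congruent": False},
    ]


def assign_for_participant(options, congruent, rng):
    """Pick one pairing with the requested congruency.

    The description that is not learned can later serve as a lure.
    """
    chosen = rng.choice([o for o in options if o["congruent"] == congruent])
    lure = next(
        o["description"] for o in options
        if o["description"] != chosen["description"]
    )
    return chosen, lure


options = build_associations(
    "Methylphenidate",
    ("boy_juggling.png", "Medicine against ADHD"),  # hypothetical file name
    ("sick_smiley.png", "Can cause nausea"),        # hypothetical file name
)
chosen, lure = assign_for_participant(options, congruent=False, rng=random.Random(1))
print(chosen["picture"], "|", chosen["description"], "| lure:", lure)
```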

Procedure

The experimental procedure consisted of three parts: encoding, math task, and memory tests (recall). Participants were tested individually on a computer in a room shielded from outside noise. The task was presented with Presentation 19.0 (Neurobehavioral Systems) and participants used the computer keyboard to give responses. After giving informed consent and reading task instructions from a sheet of paper, participants performed a practice session consisting of three practice items that were not included in the experiment. After the practice session, they were allowed to ask questions and were given the opportunity to take the practice session again.

Encoding

During encoding, an AB-AC inference paradigm was used (Fig. 1) consisting of three items: a word (A), a clip-art picture (B), and a description (C). Participants learned these associations in 10 blocks of 8 associations each, leading to a total of 80 associations. Encoding blocks contained three phases: AB-encoding, AB-recall, and AC-encoding (see Fig. 1). The order of associations within each phase was pre-randomized for each participant separately such that (1) the congruency factor was randomly assigned, (2) each block contained four congruent and four incongruent associations, and (3) order was pseudo-randomized such that no more than three associations of the same congruency condition (congruent or incongruent) followed each other. The stimulus order during AC-encoding was the same as during AB-encoding, but AB-recall had a different pseudo-random order with the same boundary conditions. To make clear what was expected of them, participants were cued with the block number before each block (“Block n”) and with a description of the upcoming phase before each phase (e.g., “Learning”, “Test”). These cues also told them which buttons to use if they needed to answer questions (buttons “1” and “2” for AB-recall and buttons “1”, “2”, and “3” for AC-encoding, see below). Each of these cues lasted two seconds. The background color was set to white throughout and the experiment was run in full screen.
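
The ordering constraint for one block (four congruent and four incongruent associations, with no more than three of the same condition in a row) can be illustrated with the sketch below. It uses simple rejection sampling, i.e., reshuffling until the constraint holds. How the constraint was implemented in the original experiment scripts is not described here, so this approach and all function names are assumptions.

```python
import random


def max_run_length(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = current = 1
    for prev, cur in zip(labels, labels[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def pseudo_random_block(n_per_condition=4, max_run=3, rng=None):
    """Shuffle condition labels until no run exceeds `max_run`."""
    rng = rng or random.Random()
    labels = ["congruent"] * n_per_condition + ["incongruent"] * n_per_condition
    while True:
        rng.shuffle(labels)
        if max_run_length(labels) <= max_run:
            return list(labels)


block = pseudo_random_block(rng=random.Random(0))
print(block)
```

Rejection sampling is adequate for blocks this small, because most shuffles of eight items already satisfy the constraint.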

First, during the AB-encoding phase, participants encoded eight AB-associations (word-picture), which were depicted in the middle of the screen, word above picture (see Fig. 1). Word color was set to black and word size to 30. Words and pictures were shown for three seconds each with an inter-trial interval (ITI) of one second. Participants were asked to passively look at the associations and try to remember them. After they learned eight AB-associations, participants were tested on each of these associations (AB-recall). They were shown each word again, in a different random order than during AB-encoding. This time the word was shown with two pictures underneath, one of which was the correct associated picture and the other was one of the other, randomly selected, pictures they had just learned in the preceding AB-encoding phase. The side (left or right) of the correct picture was randomly assigned as well. Participants were instructed to press either “1” for the left or “2” for the right picture on the computer keyboard. After they pressed, they immediately proceeded to the next trial or, if they did not press, the program would continue after three seconds maximum.

Then, participants proceeded to the AC-encoding phase. Here, they were shown each of the eight words again (in the same random order as during AB-encoding), but this time with an associated description, for four seconds (or 1.5 s A and 2.5 s AC, see Supplementary Text S1). Words were again shown in black above the descriptions, which were depicted in blue, both with the same text size. Participants were instructed to again study these associations and additionally try to reactivate the associated B-picture by generating a mental image. After stimulus presentation, participants were asked two questions. First, they were asked to indicate whether they (1) already knew the word, (2) thought the word was interesting, (3) thought the word was not interesting (curiosity). After they answered this question or, if they did not press, after four seconds, they moved on to the second question. Here, they were asked to indicate whether they managed to reactivate the B-picture: (1) strongly, (2) a bit, or (3) not. This trial was again presented for maximally four seconds. For both these questions, participants were instructed to answer with the buttons “1”, “2”, and “3” and answer as quickly as possible to avoid exceeding the answer period. After finishing this phase, they saw a block cue and moved on to the next block in which they learned 8 new associations. Encoding lasted approximately 25 min.

Math task

After encoding, participants performed a short math distraction task. Participants were asked to count backwards from a given number with a given step size (e.g., from 89, count back in steps of 6). After 10 s, they were asked to type in the number they ended with and press “Enter”. Participants followed this procedure for 6 numbers, which lasted a few minutes, depending on how quickly participants answered. This task was included solely to disrupt possible ongoing working memory processes. Logfiles were therefore checked to confirm that participants performed the task, but the data were not analyzed.

Memory tests (recall)

Following the math task, participants were tested on the learned C-items, first through an item recognition test, then a cued recall test, and finally an associative recognition test for each of the ABC-combinations that they learned during encoding (see recall panel in Fig. 1). Order was pseudo-randomized for each participant such that no more than three items of each condition (congruent and incongruent) followed each other. Lures consisted of all the descriptions that were not learned during encoding (80) and were interspersed pseudo-randomly using the same constraint. Before starting the memory test, participants received a practice trial containing five items of which three were previously seen during the encoding practice round. If they wished, they could do this practice task a second time.

First, participants performed an item recognition task for the C items. They were shown a description in the middle of the screen and were asked to indicate whether they learned this description during encoding. They did this by using a 6-point Likert scale ranging from “1” (“very sure I learned this”) to “6” (“very sure I did not learn this”). If participants indicated they learned an encoded stimulus (i.e., by using buttons 1–3), they continued to the next two memory tasks. Otherwise, they were shown the next description. This trial lasted for four seconds regardless of when participants pressed a button. After answering, participants were shown their answer (with a red marking) on the screen and the screen froze for the remainder of the trial before moving to the next trial.

The second memory task (referred to as associative recall) was a cued recall test of the indirect BC-association where the description (part C) was used as a cue and participants were instructed to answer by typing in a description of the indirectly associated picture (B). Answer time was self-paced, and participants proceeded to the next test by pressing “Enter”. If they did not know the associated picture, they were instructed to press “Enter” immediately. Then, participants received a cue to put their fingers back on the buttons 1–6 and press “1” to continue. This was included to make sure participants could answer quickly in the next test.

The third memory task was an associative recognition test of the BC-association where the description (C) was used as a cue, and participants could choose from two pictures, one of which was the correct answer (B) and the other was randomly chosen from the full set of learned pictures while making sure that this lure picture only appeared as a lure once. The pictures appeared in random order next to each other underneath the description. To answer, participants used the same scale as during item recognition, where buttons 1–3 corresponded to the left picture and 4–6 to the right picture. This trial lasted for three seconds regardless of when participants pressed a button. After answering, participants were shown their answer (with a red marking) on the screen and the screen froze for the remainder of the trial before moving to the next trial.
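
The lure constraint and the button mapping of the associative recognition test can be sketched as follows. This is a simplified illustration: it assumes that every learned picture is tested, whereas in the experiment only items recognized in the first test reached this stage. The function names are hypothetical, and reshuffling until no picture is paired with itself is one possible way to satisfy the constraint, not necessarily the one used.

```python
import random


def assign_lures(pictures, rng):
    """Give every target picture a different picture as lure.

    Each picture is used as a lure exactly once and never for itself.
    Requires at least two unique pictures.
    """
    lures = list(pictures)
    while any(target == lure for target, lure in zip(pictures, lures)):
        rng.shuffle(lures)
    return dict(zip(pictures, lures))


def chosen_side(button):
    """Buttons 1-3 select the left picture, buttons 4-6 the right picture."""
    if button not in range(1, 7):
        raise ValueError("button must be between 1 and 6")
    return "left" if button <= 3 else "right"


pictures = ["pic_a", "pic_b", "pic_c", "pic_d"]  # hypothetical identifiers
lure_for = assign_lures(pictures, random.Random(0))
print(lure_for, chosen_side(2), chosen_side(5))
```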

Time spent on the memory tests depended on performance, specifically because the cued recall task involved typing in the answer in a self-paced manner (see above). Nevertheless, total participation time never exceeded 1.5 h. After finishing the recall task, participants filled out a study-specific questionnaire describing their experiences with the experiment, and a payment form in case they wanted monetary compensation. They also received a debriefing that stated the goal of the experiment and provided the opportunity to learn about future outcomes of the experiment, and they were invited to ask questions about the nature of the experiment.

Analyses and code availability

Analyses were performed using custom Matlab 2015b (The Mathworks) scripts, IBM SPSS Statistics 23, and G*Power 3.1 (ref. 35); graphs were created using Jupyter Notebook (http://jupyter.org/) together with the Seaborn package (https://seaborn.pydata.org). All scripts can be found at https://github.com/marliekevk/Integrating-Educational-Knowledge. Stimuli for which the word was indicated to be known (omitted to ensure that new learning took place) or that were not responded to in time during AC-encoding were excluded from analyses. Additionally, trials that were not responded to in time during recall were omitted as well. As indicated above (in the Participants section), only participants that still had 10 or more trials left per condition (item recognition hits) after these omissions were included in the analyses (21 participants). For the curiosity and reactivation analyses, the trials that did not include a curiosity or reactivation score were also omitted.
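
The exclusion rules described above can be summarized in a short sketch. The trial fields (`word_known`, `answered_encoding`, `answered_recall`, `item_hit`, `congruent`) and the function names are hypothetical and do not reflect the structure of the original Matlab scripts.

```python
def usable_trials(trials):
    """Drop trials with a known word or a missed response at encoding or recall."""
    return [
        t for t in trials
        if not t["word_known"] and t["answered_encoding"] and t["answered_recall"]
    ]


def include_participant(trials, min_per_condition=10):
    """True if enough item-recognition hits remain in both congruency conditions."""
    counts = {True: 0, False: 0}
    for t in usable_trials(trials):
        if t["item_hit"]:
            counts[t["congruent"]] += 1
    return min(counts.values()) >= min_per_condition


def make_trials(n, congruent, word_known=False):
    """Build n identical example trials (illustration only)."""
    return [
        {"word_known": word_known, "answered_encoding": True,
         "answered_recall": True, "item_hit": True, "congruent": congruent}
        for _ in range(n)
    ]


example = make_trials(12, True) + make_trials(9, False) + make_trials(5, False, word_known=True)
print(include_participant(example))
```

In this example the participant would be excluded: only nine usable incongruent trials remain once the trials with known words are dropped.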

To investigate congruency effects, we used paired-sample two-tailed t-tests on performance scores for all three memory tests (d-prime for item recognition, and percentage correct for associative recall and recognition), and one-sample two-tailed t-tests to detect differences from chance performance. D-prime was calculated by Z-scoring the hit and false alarm scores and calculating the difference between these measures (Z(hits) − Z(FA); ref. 36). Confidence level was not further considered because of low trial numbers. In cases where there was a 0% false alarm rate in a certain condition, false alarms were recalculated using the MacMillan method (ref. 37; false alarm rate = 0.5/n, where n is the maximum number of trials per condition). Correctness of the answers in the associative recall test (where participants were instructed to freely type in their answer) was assessed by hand by two independent raters who rated each description with either 0 (incorrect), 0.5 (partly correct) or 1 (correct). After independent rating, the two raters reached consensus on the items where they disagreed to come to a final rating per item. Performance in the final associative recognition test was calculated as the proportion of hits, independent of confidence.
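
The d-prime computation, including the adjustment for a 0% false alarm rate, can be written as the following minimal sketch. The function name and arguments are hypothetical. Only the false alarm correction described above is implemented, so a hit rate of exactly 0 or 1 is deliberately left unhandled and raises an error.

```python
from statistics import NormalDist


def d_prime(n_hits, n_targets, n_false_alarms, n_lures):
    """d' = Z(hit rate) - Z(false alarm rate).

    A false alarm rate of 0 is replaced by 0.5 / n_lures, where n_lures
    stands in for the maximum number of trials per condition.
    """
    hit_rate = n_hits / n_targets
    fa_rate = n_false_alarms / n_lures
    if fa_rate == 0:
        fa_rate = 0.5 / n_lures
    z = NormalDist().inv_cdf  # raises StatisticsError for rates of exactly 0 or 1
    return z(hit_rate) - z(fa_rate)


print(round(d_prime(30, 40, 10, 40), 3))  # hit rate .75, false alarm rate .25
print(round(d_prime(30, 40, 0, 40), 3))   # false alarm rate corrected to 0.5 / 40
```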

To investigate effects of curiosity and reactivation, we calculated average memory scores for each bin (2 scores for curiosity and 3 scores for reactivation, see Encoding for details on these ratings). We subsequently assessed effects of within-subject factors congruency and curiosity (2 × 2) or reactivation (2 × 3) using a repeated-measures ANOVA. Note that because trials were not equally divided over curiosity, metamemory (see Experiment 2 below), and reactivation factors, statistics for these factors were not necessarily equal in the different ANOVAs. For that reason, we first report paired-sample t-tests for main effects as well (see above), which would normally be redundant. Alpha was set at .05 throughout.
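
The binning step that precedes the ANOVA can be sketched as follows for a single participant. The trial fields (`congruent`, `reactivation`, `score`) and the function name are hypothetical, and the ANOVA itself is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials, factor):
    """Average memory score per (congruency, rating) cell for one participant."""
    cells = defaultdict(list)
    for t in trials:
        if t.get(factor) is None:  # trials without a rating are omitted
            continue
        cells[(t["congruent"], t[factor])].append(t["score"])
    return {cell: mean(scores) for cell, scores in cells.items()}


trials = [
    {"congruent": True, "reactivation": 1, "score": 1.0},
    {"congruent": True, "reactivation": 1, "score": 0.0},
    {"congruent": False, "reactivation": 3, "score": 0.5},
    {"congruent": False, "reactivation": None, "score": 1.0},  # no rating given
]
means = cell_means(trials, "reactivation")
print(means)
```

Because the number of trials per cell differs between factors, the cell means that enter each ANOVA are based on different subsets of trials.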

Experiment 2

For experiment 2, we made a few adjustments to the experimental design. Only the differences are detailed below; the rest of the procedure and design was the same as in experiment 1.

Participants

Thirty participants were recruited at the end of their first study year (one year after experiment 1). One participant was excluded due to technical issues (no encoding logfiles were stored) and six participants were excluded because they did not have enough stimuli (ten or more in either condition) left for analyses, leaving 23 participants that were included in the analyses (16 from Psychology, five from Family Studies, and two unknown; seven male; mean age 20.32, SD 1.56, range 18–24).

Stimuli

From the set, we omitted the 16 associations that participants in experiment 1 indicated were most familiar, leaving 64 associations divided over 8 blocks of 8 associations. In addition to Dutch words, these included seven English and two Latin words.

Procedure

The timing condition was omitted because we did not find effects in experiment 1 (see SI), leaving only the congruency condition. Because curiosity ratings also yielded a null effect, we now asked participants to give metamemory judgments (“how well do you think you will remember the word”) after AC-encoding instead of curiosity ratings. More specifically, participants were asked whether they (1) already “knew”, (2) thought they were “going to remember” or (3) thought they were “not going to remember” the word. The rest of the procedure was the same, and for randomization only the congruency factor was taken into account. Because of the decrease in the number of associations, this experiment was a bit shorter than experiment 1: approximately 20 min for encoding and 1 h and 20 min in total.

Data availability

Data are openly available on Harvard Dataverse: https://doi.org/10.7910/DVN/UQV9H4. Supplementary information (SI) is available online. Because of copyright issues, the original stimulus combinations are available upon request.