Four pieces of music were selected, differing in emotional valence (positive, negative) and arousal (low, high). The details of each music piece’s title, composer, and average root mean square (RMS) amplitude can be found in Table 1. In the control condition, participants completed the tasks in silence. The selected music pieces have been validated by earlier research to promote a particular mood [36]. Based on these validations, we refer to the five conditions applied in the current study as calm (positive valence, low arousal), happy (positive valence, high arousal), sad (negative valence, low arousal), anxious (negative valence, high arousal), and silence (no music induction).

Creativity measures–Divergent creativity.

Divergent thinking tests are open-ended tests. They are applied in approximately 40% of all creativity studies [37] and can be considered the most widely used type of creativity test [38–39]. One of the most frequently used and well-validated divergent thinking tests is the Alternative Uses Task (AUT) [19]. In the AUT, participants are asked to list as many different and creative uses for a common object (in the current study a ‘brick’) as possible. Participants were instructed to type their ideas into a space provided on a computer display. After typing an idea, they could submit it by hitting the Enter key and immediately received a new opportunity to type in another idea. Participants were advised that their responses could be given in Dutch, English, or German. Creative performance during the AUT is reflected in an Overall divergent thinking (ODT) score, calculated by summing up participants’ performance on five indices of divergent thought: Fluency, Creativity, Originality, Usefulness, and the Cognitive Flexibility of the ideas listed. These indices are discussed in the subsections below.

Fluency is a measure of creative production and represents the total number of ideas generated. To assign a fluency score, the number of ideas a participant listed was counted. Only complete (i.e., finished) ideas were included in the fluency score.

Creativity is defined as the generation of ideas that are original (i.e., new) and meant to be useful [16–18]. A trained rater assigned all ideas a score on creativity, ranging from not at all creative (= 1) to very much creative (= 5). A second trained rater assigned 30% of the listed ideas a creativity score. The inter-rater reliability of the ratings was calculated using a 2-way random intraclass correlation coefficient (ICC) analysis for consistency, and it was considered good (ICC Creativity = .849). Per participant, a creativity sum score was calculated by adding up the scores of all the ideas that participant generated. Using a sum score is based on the ‘quantity breeds quality’ assumption, that is, the idea that the more ideas are generated, the more ‘high quality ideas’ will be found among them [40]. In line with this reasoning, research has confirmed that creativity often increases with the number of ideas generated [41–43]. Moreover, by measuring creativity on the AUT with a sum score instead of a mean score, we stay closer to how creativity is appraised in real life. For example, audiences and stakeholders are more interested in the number of highly creative ideas or products somebody has generated than in the mean creativity of an individual’s work.
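To make the scoring concrete, the sketch below illustrates how such a per-participant sum score can be computed from 1–5 creativity ratings. This is an illustrative example rather than the authors’ analysis code; the data frame, column names, and values are hypothetical (the inter-rater ICC itself could be obtained with, e.g., pingouin’s intraclass_corr function, although the software actually used is not reported here).

```python
import pandas as pd

# Hypothetical long-format ratings: one row per idea, rated from
# 1 (not at all creative) to 5 (very much creative).
ratings = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2],
    "idea": ["doorstop", "paperweight", "garden sculpture", "build a wall", "dumbbell"],
    "creativity": [2, 1, 4, 1, 3],
})

# Sum score per participant: listing more (and more highly rated) ideas yields
# a higher score, in line with the 'quantity breeds quality' rationale above.
creativity_sum = ratings.groupby("participant")["creativity"].sum()
print(creativity_sum)  # participant 1 -> 7, participant 2 -> 4
```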

Originality is one of the core characteristics of a creative idea and refers to its uncommonness or infrequency [16–18, 44]. Common ways to score originality are uniqueness scoring (i.e., counting the number of infrequent responses, for instance those mentioned by fewer than 5–10% of the participant pool) or using judges to evaluate the responses for originality. While uniqueness scoring allows for an objective scoring, a number of serious objections have been raised against it, for example, that statistical infrequency does not account for the size of the participant pool or for the appropriateness of responses [45]. Using judges to evaluate all responses for originality [46, 47] is more subjective and time-consuming, but it is considered a valid method due to the good inter-rater reliabilities that have been found [48]. In the current study, trained judges were used to rate the responses for originality. One trained rater assigned all ideas a score on originality, ranging from not at all original (= 1) to very much original (= 5). The second trained rater assigned 30% of the listed ideas an originality score. The inter-rater reliability of the ratings was calculated using a 2-way random intraclass correlation coefficient (ICC) analysis for consistency, and it was considered good (ICC Originality = .781). Per participant, an originality sum score was calculated by adding up the scores of all the ideas that participant generated.
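For contrast with the judge-based approach used here, the sketch below shows how the uniqueness scoring mentioned above works in principle: an idea counts toward a participant’s score when fewer than a chosen fraction of the pool mentions it. The responses, their normalisation, and the threshold are hypothetical; with a real sample the cut-off would typically be the 5–10% noted above.

```python
from collections import Counter

# Hypothetical pool of (already normalised) brick uses, keyed by participant.
responses = {
    1: ["build a wall", "paperweight"],
    2: ["build a wall", "doorstop"],
    3: ["build a wall", "grind into pigment"],
}

pool_size = len(responses)
frequency = Counter(idea for ideas in responses.values() for idea in ideas)

# Toy threshold for this three-person example; in practice a 5-10% cut-off is common.
threshold = 0.34

uniqueness_score = {
    participant: sum(1 for idea in ideas if frequency[idea] / pool_size < threshold)
    for participant, ideas in responses.items()
}
print(uniqueness_score)  # {1: 1, 2: 1, 3: 1}: only 'build a wall' is too common to count
```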

Usefulness is one of the core characteristics of a creative idea and refers to its effectiveness and practicality [49]. The trained rater assigned all ideas a score on usefulness, ranging from not at all useful (= 1) to very much useful (= 5). The second trained rater assigned 30% of the listed ideas a usefulness score. The inter-rater reliability of the ratings was calculated using a 2-way random intraclass correlation coefficient (ICC) analysis for consistency, and it was considered good (ICC Usefulness = .883). Per participant, a usefulness sum score was calculated by adding up the scores of all the ideas that participant generated.

Cognitive Flexibility manifests itself in the use of different cognitive categories and perspectives [16, 25], and it can be measured by the number of distinct idea categories a participant uses. For example, when asked to list alternative uses for a brick, the answer “build a house, build a wall, build a garage” would lead to a cognitive flexibility score of one, as all ideas are assigned to the category ‘building something’, whereas “build a house, break a window, use it as a pen holder” would lead to a score of three, as the ideas are assigned to three different idea categories (‘building something’, ‘destroying something’, ‘utensil’). Two trained raters worked together on developing the list of idea categories for the brick. Thereafter, one of the trained raters assigned each idea to a category from the predefined list. Raters were blind to conditions, both when (i) developing the list of categories and (ii) assigning the ideas to the predefined categories. Finally, for each participant, the total number of distinct idea categories used was calculated.
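As a concrete illustration of this scoring rule, the sketch below counts the distinct categories covered by a participant’s ideas; the idea-to-category mapping is a hypothetical stand-in for the predefined category list developed by the raters.

```python
# Hypothetical idea -> category mapping, standing in for the raters' predefined list.
category_of = {
    "build a house": "building something",
    "build a wall": "building something",
    "build a garage": "building something",
    "break a window": "destroying something",
    "use it as a pen holder": "utensil",
}

def flexibility_score(ideas):
    """Cognitive flexibility: number of distinct categories the ideas fall into."""
    return len({category_of[idea] for idea in ideas})

print(flexibility_score(["build a house", "build a wall", "build a garage"]))            # 1
print(flexibility_score(["build a house", "break a window", "use it as a pen holder"]))  # 3
```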

The Overall divergent thinking score (ODT) was calculated by adding up the five divergent thinking indices described above.
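Written out per participant, this corresponds to an unweighted sum of the five indices described above:

ODT = Fluency + Creativity sum score + Originality sum score + Usefulness sum score + Cognitive Flexibility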