Participants

We analysed data from N = 2169 subjects from two different samples (for an overview of sample characteristics, see Table 1). At the time of the study, 56% of the subjects used hormonal contraceptives (HC-yes) and the remainder were naturally cycling women (HC-no). Group allocation to HC-yes or HC-no was based on self-report. For the HC-no group, information about cycle phase was not available in either sample. Information about the type of hormonal contraception was available only for sample 1. In this sample, of the 520 women using HC, 430 used oral contraceptives (OC; 83%) and 89 used other methods such as the vaginal ring (n = 60, 11%), intrauterine device (spiral; n = 14, 3%), patch (n = 9, 2%), injection (shot; n = 3, 0.6%), or implant (rod; n = 3, 0.6%). Information on the specific compounds used was missing in both samples. Across both samples, the mean age was 22.77 years (range 18–35 years). The HC groups differed significantly in age (t(2167) = 3.09, p = 0.002) and on two scales of the NEO Five-Factor Inventory (NEO-FFI) [27] (NEO-Conscientiousness: t(1648) = −5.29, p = 1.4 × 10⁻⁷; NEO-Openness to Experience: t(1648) = 5.3, p = 1.3 × 10⁻⁷); that is, HC users were younger, more conscientious, and less open to experience. NEO-FFI data were available for a subset of the overall sample (n = 1650). Subjects were recruited from the Basel area in Switzerland. The sampling strategy was to recruit large samples of healthy young adults without further restrictions. Advertising was done mainly at the University of Basel and in local newspapers. Subjects were free of any neurological or psychiatric illness and did not take any medication (apart from oral contraceptives) at the time of the experiment. The ethics committee of the Canton of Basel approved the experiments. Written informed consent was obtained from all subjects before participation.

Table 1 Sample characteristics

The subjects included in this study represent subsets of two ongoing studies, which have been described previously [12, 28]. The purpose of both studies is to identify biological correlates of cognitive performance in healthy young adults from the general population using genetics, electroencephalography (EEG), and imaging techniques; HC-status was not a primary outcome variable.

Behavioural task descriptions

Subjects in both samples performed identical versions of the two behavioural tasks of interest, a picture rating task and a memory task. We included only women with complete data for both tasks and known HC-status (see Table 1). The picture rating task consisted of 24 pictures per valence category (negative, neutral, and positive). Pictures from the International Affective Picture System (IAPS) [29] were assigned to the negative, neutral, and positive valence categories on the basis of their normative valence scores (negative: 1.4–3.5, neutral: 4.4–5.6, positive: 7.1–8.3). Eight neutral pictures were selected from in-house standardised picture sets to equate the picture set for visual complexity and content (e.g., human presence). Subjects rated each picture for valence (negative, neutral, positive) and arousal (low, middle, high) on three-point scales. In addition to the emotional pictures, 24 scrambled pictures were presented in the picture rating task. The background of the scrambled pictures contained the colour information of all pictures used in the experiment (except the primacy and recency pictures), overlaid with a crystal and distortion filter (Adobe Photoshop CS3). In the foreground, a mostly transparent geometrical object (a rectangle or ellipse of varying size and orientation) was shown, which subjects rated for form (vertical, symmetric, horizontal) and size (small, medium, large).
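The assignment of IAPS pictures to valence categories by normative score amounts to a simple range lookup. The sketch below is illustrative only and assumes the reported ranges are inclusive at both ends; the name `valence_category` is hypothetical. Scores that fall between ranges, and the in-house neutral pictures (which have no IAPS norms), are returned as unassigned.

```python
from typing import Optional

# Normative IAPS valence ranges from the text; inclusive bounds are an assumption.
VALENCE_RANGES = {
    "negative": (1.4, 3.5),
    "neutral": (4.4, 5.6),
    "positive": (7.1, 8.3),
}


def valence_category(normative_valence: float) -> Optional[str]:
    """Return the valence category for an IAPS normative valence score,
    or None if the score falls outside all three ranges."""
    for category, (low, high) in VALENCE_RANGES.items():
        if low <= normative_valence <= high:
            return category
    return None


if __name__ == "__main__":
    for score in (2.1, 5.0, 6.3, 7.8):
        print(score, valence_category(score))
```

A score such as 6.3 lies in the gap between the neutral and positive ranges, so under these assumptions it would not be used as a stimulus.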

The second task of interest was an unannounced free recall picture memory task, in which subjects had to recall the pictures presented during the picture rating task after a 10 min delay. Subjects were instructed to describe each picture with short keywords, to note as much as they could remember about it, and to describe as many pictures as possible. To account for primacy and recency effects, two additional pictures showing neutral objects were presented at the beginning and two at the end of the picture rating task; these were not included in the analysis. Two independent, blinded raters scored the picture descriptions to identify the number of correctly recalled pictures (Cronbach's α = 0.91–0.98). A third independent rater resolved pictures that had been rated inconsistently.
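The scoring procedure can be illustrated with two small helpers: Cronbach's alpha computed with the raters treated as "items", and the third-rater rule for disagreements. This is a sketch under assumptions: the text does not state on which unit alpha was computed, so here each rater's per-subject count of recalled pictures is assumed; the function names are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores_by_rater: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha with raters as 'items' and subjects as cases.

    scores_by_rater[r][s] is the score rater r gave subject s
    (assumed here to be that subject's number of recalled pictures).
    """
    k = len(scores_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(scores_by_rater[0])
    if n < 2 or any(len(scores) != n for scores in scores_by_rater):
        raise ValueError("all raters must score the same >= 2 subjects")
    # Sum of the per-rater sample variances (n - 1 denominator).
    rater_var_sum = sum(variance(scores) for scores in scores_by_rater)
    # Variance of each subject's total score summed over raters.
    totals = [sum(scores[s] for scores in scores_by_rater) for s in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - rater_var_sum / total_var)


def resolve_picture(rater1: bool, rater2: bool, rater3: bool) -> bool:
    """Final 'correctly recalled' decision for one picture: keep the
    two raters' verdict when they agree, otherwise use the third rater's."""
    return rater1 if rater1 == rater2 else rater3
```

Identical scores from both raters give α = 1; the reported range of 0.91–0.98 would correspond to high but imperfect agreement.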

Study description

For sample 1, the experiment took place during a single visit combined with magnetic resonance imaging (MRI) data acquisition, and every subject was tested individually. Subjects from sample 2 were tested on three visits in groups of 1–7 subjects in a behavioural laboratory, combined with EEG measurements. Visits 1 and 2 were on average 15 days apart, whereas visits 2 and 3 took place on consecutive days.

In the following, we describe only the parts of the experimental procedure that were relevant to our analyses.

On the first visit, participants received general information about the study and gave their written informed consent. On visit 1 (sample 1) or visit 2 (sample 2), participants were instructed on and trained in the picture rating task and a working memory task (letter N-back [30]). The working memory task served as a distractor between the picture rating task and the memory task (see Heck et al. [28] for a detailed description of the working memory task). After training, participants performed the picture rating task (~20 min) followed by the distractor task (~10 min). In sample 1, both tasks were performed in the MR scanner, and participants left the scanner after completing the distractor task; they then performed the unannounced free recall picture memory task (no time limit) outside the scanner. Subjects of sample 2 performed all tasks in a behavioural laboratory. The total duration of the experimental procedure was ~3–4.5 h at visit 1 in sample 1 and ~3 h at visit 2 in sample 2. Participants received 25 CHF/h for participation.

Statistical analyses

To account for differences across samples, we z-transformed all task performance measures (valence and arousal ratings, memory performance) separately for each sample and then pooled the data of samples 1 and 2.
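The per-sample standardisation before pooling can be sketched as follows for a single measure. This assumes the sample standard deviation (n − 1 denominator, as used by R's `scale()`); the function name and the toy values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def z_within_sample(records: List[Tuple[str, float]]) -> List[float]:
    """z-transform one measure separately within each sample.

    records: (sample_id, value) pairs; returns z-scores in input order,
    ready to be pooled across samples.
    """
    by_sample: Dict[str, List[float]] = {}
    for sample_id, value in records:
        by_sample.setdefault(sample_id, []).append(value)
    # Per-sample mean and sample SD (n - 1 denominator).
    params = {s: (mean(v), stdev(v)) for s, v in by_sample.items()}
    return [(value - params[s][0]) / params[s][1] for s, value in records]


if __name__ == "__main__":
    data = [("s1", 1.0), ("s1", 2.0), ("s1", 3.0),
            ("s2", 10.0), ("s2", 20.0), ("s2", 30.0)]
    print(z_within_sample(data))  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

The two toy samples differ tenfold in scale, yet after standardisation they contribute identical z-scores, which is the point of transforming within each sample before pooling.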

Ratings (valence and arousal) and memory performance were analysed with three main linear mixed models, each with subject as random effect and with HC-status (HC-yes/HC-no; between-subjects factor), valence category (negative, neutral, positive; within-subjects factor), and the HC-status × valence category interaction as fixed effects of interest. The models were fitted by restricted maximum likelihood (REML). Age was included as a covariate in all models to account for the small but significant age difference between the two HC-status groups (Table 1). Significance was tested with F-tests. In case of a significant HC-status × valence category interaction, post hoc tests were conducted separately for each picture valence category using linear models (t-tests) with HC-status as the variable of interest. For group comparisons (HC-yes vs. HC-no), we estimated Cohen's d as the effect size. The estimate of d was based on the t-value of the linear model rather than on the means and standard deviations of task performance; therefore, d is adjusted for all covariates included in the linear model. By convention, d = 0.2 is considered a small, d = 0.5 an intermediate, and d = 0.8 a large effect [31]. Owing to the factor coding in our analyses, a positive d indicates that the HC-yes group scored higher on a given phenotype than the HC-no group. For effects in the mixed models, which involve repeated measures, we report the generalised η² [32]. An η² of 2% is considered small, 15% intermediate, and 35% large [31].
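Converting a post hoc t-value into Cohen's d can be sketched as below. The exact conversion formula is not stated in the text; this assumes the common independent-groups form d = t·√(1/n₁ + 1/n₂), with t taken from the covariate-adjusted linear model, and also shows the equal-group-size approximation d = 2t/√df. All function names are hypothetical.

```python
import math


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d from an independent-groups t statistic.

    Uses d = t * sqrt(1/n1 + 1/n2); the sign of t (and hence d) follows
    the factor coding, e.g. positive when HC-yes scores higher.
    """
    return t * math.sqrt(1 / n1 + 1 / n2)


def d_from_t_df(t: float, df: int) -> float:
    """Equal-group-size approximation d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)


def magnitude(d: float) -> str:
    """Label |d| using the conventional thresholds cited in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "intermediate"
    if size >= 0.2:
        return "small"
    return "below small"


if __name__ == "__main__":
    d = d_from_t(2.0, 50, 50)  # hypothetical t and group sizes
    print(round(d, 3), magnitude(d))
```

Because t already reflects the covariates in the model, a d derived this way is covariate-adjusted, unlike a d computed from raw group means and SDs.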

We also performed several additional analyses. To test for differences between samples, we re-estimated the three main mixed models separately for each sample, as well as for both samples together using the raw (non-z-transformed) values and including sample (sample 1, sample 2) as an additional variable. In addition, to account for differences in personality traits, we first tested for differences between the HC-status groups on the five NEO-FFI scales (given the five scales, a Bonferroni-corrected threshold of p < 0.01 was considered significant) and then included the scales that survived Bonferroni correction (NEO-C and NEO-O) as covariates in the three main mixed models.
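The covariate screening step can be sketched as a Bonferroni filter over the NEO-FFI scales. In this illustration only the two p-values reported in the Participants section are used; the other three scales' p-values are not reported, so `n_tests` is fixed at 5 rather than inferred from the input. The function name is hypothetical.

```python
from typing import Dict, List


def select_covariates(p_values: Dict[str, float], n_tests: int = 5,
                      alpha: float = 0.05) -> List[str]:
    """Return the scales whose group-difference p-value survives
    Bonferroni correction (alpha / n_tests; 0.05 / 5 = 0.01 here)."""
    threshold = alpha / n_tests
    return sorted(name for name, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    # p-values reported for the two NEO-FFI scales that differed by HC-status.
    reported = {"NEO-C": 1.4e-7, "NEO-O": 1.3e-7}
    print(select_covariates(reported))  # ['NEO-C', 'NEO-O']
```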

All calculations were done in R [33]: mixed models were fitted with the nlme package [34], generalised η² was calculated with the ezANOVA function of the ez package [35], and mediation analyses were done with the MBESS package [36]. All models were fitted with complete data per subject, which yields an orthogonal (balanced) design for the repeated-measures factors. For the mixed models, all reported p-values are nominal (uncorrected) p-values. For the mediation analysis, the p-value of the indirect effect was based on a bootstrapping procedure [36]. Because there were three phenotypes of interest (valence and arousal ratings, memory performance), the significance threshold was set to p < 0.017 (Bonferroni correction for three independent tests). p-values < 2.2 × 10⁻¹⁶ are not reported as exact values.
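The significance threshold and the reporting floor can be sketched as follows. The 2.2 × 10⁻¹⁶ floor is the double-precision machine epsilon, which R's `format.pval()` uses by default. The helper names are hypothetical, and the sketch uses the unrounded threshold 0.05/3 ≈ 0.0167 rather than the rounded 0.017.

```python
import sys

# Machine epsilon for IEEE double precision (about 2.2e-16); R's
# format.pval() uses this as its default cutoff for printing "< 2.2e-16".
P_FLOOR = sys.float_info.epsilon


def significant(p: float, n_tests: int = 3, alpha: float = 0.05) -> bool:
    """Bonferroni-corrected test: p < alpha / n_tests (0.05 / 3 ≈ 0.0167)."""
    return p < alpha / n_tests


def format_p(p: float) -> str:
    """Report p with two significant digits, or as '< 2.2e-16' below the floor."""
    if p < P_FLOOR:
        return "< 2.2e-16"
    return f"{p:.2g}"


if __name__ == "__main__":
    for p in (0.002, 0.02, 1.4e-7, 1e-20):
        print(format_p(p), significant(p))
```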