Sample and procedure

The NTR was initiated in 1987 in the Netherlands and follows twins from childhood to adulthood (for more details see van Beijsterveldt et al. 2013). The present study includes measures of the ASEBA-CBCL/ASEBA-TRF based on mother-, father- and teacher-report of children assessed at ages 7, 10 and 12, and measures of the ASEBA-YSR based on self-reports of children aged 12, 14 and 16. Accordingly, we assessed the reliability and validity of the ASCS based on mother-, father- and teacher-report of children aged 7, 10 and 12, and on self-reports of children aged 12, 14 and 16 (the scales are consistent across ASEBA measures). The current study includes data from 24,704 7-year-olds (50.3% girls), 19,589 9/10-year-olds (50.7% girls), 16,436 12-year-olds (50.9% girls), with 1704 self-reports for 12-year-olds (50.8% girls), 10,020 14-year-olds (57.6% girls) and 7566 16-year-olds (59.9% girls). Participants with a disease or handicap that interfered severely with daily functioning were excluded (N = 500). For same-sex twin pairs, zygosity was based on DNA polymorphisms (N = 1578) or blood markers (N = 240). For the remaining same-sex twin pairs, zygosity was determined using parent-reported items on resemblance in appearance and confusion of the twins. In approximately 93% of the cases, zygosity was correctly classified by these items (Rietveld et al. 2000). For the main analyses, we included all teacher reports; slightly more than half of the twins were rated by the same teacher (age 7, 54%; age 10, 53%; age 12, 57%).

Measures

ASEBA

The ASEBA assessment consists of standardized questionnaires, which are completed by parents (CBCL), children themselves (YSR), and/or teachers (TRF). These questionnaires are used to rate a child’s competencies and problems in the past 6 months (for parent- and self-report), or in the past 2 months (for teacher-report). The response format of the items is a 3-point scale, with response options not true (coded 0), somewhat or sometimes true (coded 1), and very true or often true (coded 2). The CBCL and TRF consist of 113 items and the YSR of 112. Subsets of items are summed to create syndrome scales such as social problems, anxious depressed, and somatic complaints (Achenbach and Rescorla 2001).

ASCS

The ASCS is intended to measure self-control, defined as a person’s ability to control his or her impulses, alter his or her emotions and thoughts, and interrupt undesired behavioral tendencies and refrain from acting on them (Muraven and Baumeister 2000). To develop the ASCS, we followed a systematic scale development procedure for item selection. In this procedure, two subject matter experts independently assessed the relevance of each ASEBA item to the theoretical conceptualization of self-control (Muraven and Baumeister 2000). A third reviewer independently screened all ASEBA items, selecting those corresponding to items used in earlier self-control studies. Disagreements were resolved through in-depth discussion based on the theoretical literature (Muraven and Baumeister 2000; de Ridder et al. 2012) and on earlier studies that combined separate items into a self-control scale (e.g., Cecil et al. 2012; Moffitt et al. 2011; Hay and Forrest 2006; Turner and Piquero 2002). As a result, 8 items were selected for the ASCS (see Table 1): 4 items of the attention problem scale (items 4, 8, 41, 78), 3 items of the aggressive behaviour scale (items 86, 87, 95), and 1 item of the rule breaking behaviour scale (item 28).

Table 1 ASEBA items (and corresponding number in the ASEBA instruments) included in the ASCS

We calculated the scale score when three or fewer item responses were missing (Achenbach and Rescorla 2001). In the case of one to three missing item responses, we used the person-based weighted score. Cases with more than 3 items missing were excluded (2%); given their low prevalence, we did not expect this exclusion to influence the variables of interest. Conducting our analyses in the subsample of participants without any missing values yielded similar results. Originally, the ASEBA was set up so that higher sum scores reflect a higher frequency of child problems. Extending this approach to the ASCS, higher sum scores on the ASCS correspond to lower overall levels of self-control. This is in line with earlier studies on self-control (Moffitt et al. 2011).
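The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the "person-based weighted score" means prorating the sum of the answered items to the full 8-item length, and the function name `ascs_score` is hypothetical.

```python
from typing import Optional, Sequence

N_ITEMS = 8      # number of ASCS items (from the text)
MAX_MISSING = 3  # a score is computed with three or fewer missing items


def ascs_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the ASCS sum score, or None if too many items are missing.

    Each response is 0, 1 or 2 (the ASEBA 3-point format) or None if missing.
    ASSUMPTION: missing items are handled by prorating, i.e. the mean of the
    answered items multiplied by the number of items.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses")
    answered = [r for r in responses if r is not None]
    if any(r not in (0, 1, 2) for r in answered):
        raise ValueError("item responses must be 0, 1 or 2")
    if N_ITEMS - len(answered) > MAX_MISSING:
        return None  # more than three items missing: case is excluded
    # Higher scores correspond to lower self-control.
    return sum(answered) * N_ITEMS / len(answered)
```

With complete data the function reduces to the plain sum of the eight items; with one to three missing items the answered items are scaled up so that scores stay on the same 0 to 16 range.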

Well-being

Well-being was assessed using the Cantril ladder (Cantril 1965). Parents (ages 7, 9/10 and 12) and children (ages 14 and 16) rated well-being on a ten-step ‘ladder’, with the bottom ‘step’ of the ladder representing the worst possible life and the top ‘step’ indicating the best possible life. Teachers rated the well-being of 7-, 9/10- and 12-year-old children on a 5-point scale, with response options always or almost always unhappy (coded 1), more often unhappy than happy (coded 2), equally often happy as unhappy (coded 3), more often happy than unhappy (coded 4), and almost always happy (coded 5).

Conners’ Parent Rating Scale/Teacher Rating Scale-Revised

This widely used instrument assesses the severity of behavior problems of children in the past month (Conners et al. 1998a, b). The short version consists of 27 items for parent-report and 28 items for teacher-report (reported for ages 7, 9/10 and 12). Items are rated on a 4-point Likert scale (0 = not true at all (never, rarely), 1 = a little bit true (so now and then), 2 = quite true (often, regularly), 3 = very much true (very often)), where higher scores indicate more severe symptoms. Cronbach’s alphas were in line with the Conners’ manual, which reports alphas between 0.83 and 0.85 for oppositional behavior, between 0.78 and 0.90 for inattention, and between 0.78 and 0.87 for hyperactivity (Conners et al. 1998a, b).

Educational achievement

Educational achievement was assessed through school results in math and language, learning problems, behaviour in class, and education level in high school, each evaluated separately. Parents rated children’s math and language achievement (on a 5-point scale, higher scores reflecting higher grades: 1 = fail, 2 = weak, 3 = pass, 4 = good, 5 = excellent), and learning problems (“did your child ever have learning problems?”, on a two-point scale, 1 = no, 2 = yes). Teachers rated compliance and task orientation of the child (“in comparison to the average student in your class, how compliant is he/she?”, “in comparison to the average student in your class, how task orientated is he/she?”, 7-point scale, 1 = much less, 2 = less, 3 = a little bit less, 4 = average, 5 = a little bit more, 6 = more, 7 = much more). Adolescents (aged 14, 16) rated their level of education. The Dutch school system distinguishes three education levels: VMBO (preparing students for vocational training), HAVO (preparing students to study at universities of applied sciences) and VWO (preparing students for university), also referred to as lower level (coded as 1), middle level (coded as 2) and higher level education (coded as 3), respectively.

Substance use

Adolescents (aged 14, 16) were asked how often they smoked (1 = never, 2 = I quit smoking, 3 = I smoke once a week, 4 = I smoke multiple times per week, 5 = I smoke multiple times per day), their amount of alcohol intake per day in the weekend (1 = less than 1 glass, 2 = 1–2 glasses, 3 = 3–5 glasses, 4 = 6–10 glasses, 5 = 11–16 glasses, 6 = 17–20 glasses, 7 = more than 20 glasses), and how often they had ever been drunk (0 = never, 1 = 1–2 times, 2 = 3–4 times, 3 = 5–6 times, 4 = 7–8 times, 5 = 9–10 times, 6 = 11–19 times, 7 = 20–39 times, 8 = more frequent).

Strategy of analyses

To examine the psychometric properties of the ASCS, we tested internal consistency, dimensionality, criterion validity, inter-rater reliability, test–retest reliability, and heritability estimates. We used SPSS 21 (IBM Corp. 2012) and Mplus version 7 (Muthén and Muthén 2012) and conducted the analyses separately in children aged 7, 9/10, 12, 14, and 16, and separately for mother-, father-, self- and teacher-report. To correct for the dependency of the observations due to clustering in families, a sandwich estimator was used, with mean- and variance-adjusted weighted least squares (WLSMV) as the estimator (Rebollo et al. 2012).

We investigated internal consistency by calculating Cronbach’s alphas. The dimensionality was examined by fitting a Multimethod-Single trait confirmatory factor model (CFA) (Campbell and Fiske 1959). This allowed us to establish whether the items measure a single factor (the single “trait” self-control) while taking into account the fact that the items are taken from different subscales within the ASEBA. In this manner, we can test the dimensionality of a model with one psychometric factor and multiple residual factors. Goodness of fit was evaluated using the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), and the Tucker Lewis Index (TLI). We adopted the rules of thumb that the RMSEA should be between 0.05 and 0.08 or lower (adequate fit in terms of error of approximation), and the TLI and CFI should be 0.95 or larger (Hu and Bentler 1999).
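As an illustration of the internal consistency step, Cronbach's alpha can be computed from an item-by-respondent table with the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the sum score). The sketch below is not the authors' SPSS procedure; the function name `cronbach_alpha` is hypothetical and complete data (no missing responses) is assumed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data.

    rows: one sequence of k item scores per respondent.
    ASSUMPTION: no missing responses; sample variances (n - 1) are used
    for both the items and the sum score.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the sum score across respondents.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("sum score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when the items covary strongly and is 0 when the items are uncorrelated, which is why it is read as a lower-bound index of how consistently the items measure one construct.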

We examined criterion validity by calculating cross-sectional and longitudinal correlations between the ASCS and the variables described above concerning adaptive behaviors (i.e., well-being, educational achievement and substance use). Additionally, we investigated inter-rater reliability by examining the correlations between the ASCS parent-, self- and teacher-report. We assessed test–retest reliability by examining correlations between ASCS scores over time.

Next, we estimated the heritability of self-control in a classical twin design in Mplus version 7 (Muthén and Muthén 2012). This design is built on the premise that differences in the resemblance between monozygotic twins (sharing approximately 100% of their DNA) and dizygotic twins (sharing 50% of their segregating genes on average) can be used to parse phenotypic trait variance into environmental and genetic components (Boomsma et al. 2002). As such, this model can be applied to estimate additive genetic (A, additive effects of alleles at multiple loci), non-additive or dominance genetic (D), common environment (C, the part of the variance that is shared by members of a family), and non-shared environment (E, the part of the total variance that is unique to a certain individual) effects. We used raw-data genetic structural equation modelling with maximum likelihood estimation to perform univariate model fitting analyses to estimate the contributions of A, D or C, and E.
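For readers less familiar with the classical twin design, the standard textbook expectations behind the A, C/D and E decomposition are summarised below. The symbols (σ²_A, σ²_C, σ²_D, σ²_E for the variance components, h² for narrow-sense heritability) are notation introduced here for illustration and are not taken from the original text.

```latex
\begin{aligned}
\text{ACE model:}\quad
  \operatorname{Var}(P) &= \sigma_A^2 + \sigma_C^2 + \sigma_E^2, &
  \operatorname{Cov}_{MZ} &= \sigma_A^2 + \sigma_C^2, &
  \operatorname{Cov}_{DZ} &= \tfrac{1}{2}\sigma_A^2 + \sigma_C^2, \\
\text{ADE model:}\quad
  \operatorname{Var}(P) &= \sigma_A^2 + \sigma_D^2 + \sigma_E^2, &
  \operatorname{Cov}_{MZ} &= \sigma_A^2 + \sigma_D^2, &
  \operatorname{Cov}_{DZ} &= \tfrac{1}{2}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2, \\
\text{heritability:}\quad
  h^2 &= \frac{\sigma_A^2}{\operatorname{Var}(P)}.
\end{aligned}
```

Because C and D cannot be estimated simultaneously from twin pairs reared together, only one of the two is included in a given model.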

We first fitted a saturated model to estimate the twin correlations with their 95% confidence intervals. Based on these twin correlations, an ACE or an ADE model with parameters allowed to differ between boys and girls was fitted to the data. Nested submodels were compared by hierarchical χ2 tests. The χ2 statistic was computed by subtracting −2LL (minus twice the log-likelihood) for the full model from that for a reduced model (χ2 = −2LL1 − (−2LL0), where subscript 1 denotes the reduced model and subscript 0 the full model). Given that the reduced model is correct, this statistic is χ2 distributed with degrees of freedom (df) equal to the difference in the number of parameters estimated in the two models (Δdf = df1 − df0). In addition to the χ2 test statistic, Akaike’s Information Criterion (AIC = χ2 − 2df) was computed to compare non-nested models. A lower AIC indicates a better fit of the model to the observed data. Quantitative sex differences were tested by constraining the A, C/D, and E parameters to be equal across sex (Neale et al. 2006). Based on the twin correlations, we saw little support for qualitative sex differences, which were therefore not modelled. When sex differences appeared to be significant, a scalar sex-limitation model was tested. In this model, a difference in total variance between boys and girls is allowed, but the relative contributions of genetic and environmental influences are equal across gender (Neale et al. 2006). To test the significance of the A and C/D factors, we fitted models that omitted any parameter whose confidence interval included zero.
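The model comparison described above can be sketched in a few lines. This is an illustrative sketch rather than the Mplus procedure the authors used: the function names are hypothetical, the −2LL and df values in the usage example are made up, and the AIC is assumed to be computed with the df difference between the two models.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def compare_nested(m2ll_full: float, df_full: int,
                   m2ll_reduced: float, df_reduced: int):
    """Likelihood-ratio test of a reduced model against the full model.

    Returns (chi2, delta_df, p_value, aic).
    ASSUMPTION: AIC = chi2 - 2 * delta_df, i.e. relative to the full model.
    """
    chi2 = m2ll_reduced - m2ll_full   # -2LL1 - (-2LL0)
    delta_df = df_reduced - df_full   # df1 - df0
    p_value = chi2_sf(chi2, delta_df)
    aic = chi2 - 2 * delta_df
    return chi2, delta_df, p_value, aic


# Hypothetical values: a reduced model that drops three parameters.
chi2, delta_df, p_value, aic = compare_nested(10000.0, 3990, 10002.5, 3993)
print(f"chi2({delta_df}) = {chi2:.2f}, p = {p_value:.3f}, AIC = {aic:.2f}")
```

A non-significant p-value together with a negative AIC would indicate that the reduced model does not fit appreciably worse than the full model and is preferred on grounds of parsimony.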