Participants

Primary school teachers provided information concerning perpetration and victimization in 8215 twins from 4561 pairs. Of these pairs, 1669 were MZ (monozygotic) and 2892 were DZ (dizygotic; 53% of the DZ twin pairs were of same sex and 47% were of opposite sex). The twins were enrolled in the Netherlands Twin Register (NTR; Van Beijsterveldt et al. 2013), which was established by the Department of Biological Psychology at the Vrije Universiteit Amsterdam. The project was approved by the medical ethical committee of the Vrije Universiteit Amsterdam (NTR/25-05-2007). Parents of the twins, aged 7, 9, and 12 years, provided their consent to approach the teachers of the twins with a survey. Since 2010, the survey for the primary school teachers has included four items on perpetration and four items on victimization. The current study is a follow-up study of Veldkamp et al. (2017), that focused on the prevalence of perpetration and victimization, and included the same data, which were collected between 2010 and 2015. Data were excluded if (1) zygosity was unknown (N = 193), (2) the teacher was not sufficiently familiar with the child (N = 74), (3) the child was rated by someone other than the regular teacher (N = 81), (4) the twins were in separate classrooms, but rated by the same teacher (N = 11), (5) the twins were in the same classroom but rated by different teachers (N = 108). The 8215 twin children in the final dataset had data for at least one wave.

The data are characterized by a small degree of dependency. A subset of children had data on two (N = 1617) or three (N = 93) time points, resulting in a total sample of 10,018 observations. We conducted the analyses with the complete data recognizing that the dependency may bias-down the standard errors. After removing the dependent cases and rerunning the analyses differences in the results were trivial. We also reran the analyses with the Mplus complex option, which corrects standard errors for the dependency. Again the differences in standard errors were trivial. Given lack of appreciable differences we proceeded with the original results.

Of the MZ twins, 45.1% attended separate classrooms and 54.9% the same. In the DZ twins, these figures were 49.5% and 50.5%, respectively. Incomplete data (N = 1384 twin pairs) was mostly due to one of the teachers not returning the survey when the twins attended separate classrooms (N = 1232). The age of the children ranged from 6.52 to 12.94 years (M = 9.48, SD = 2.01). The degree of perpetration and victimization did hardly change with age, as indicated by correlations between age and the eight phenotypes, which ranged from − 0.12 to 0.07. Hence, age effects were not further investigated.

Measures

Perpetration and victimization

Teachers received a survey which included four items about perpetration and four matched items about victimization. Each item concerns general, physical, verbal, or relational perpetration and victimization. The 2 × 4 questions were scored on a five-point response scale, ranging from 0 (never), 1 (once or twice), 2 (two or three times a month), 3 (about once a week), to 4 (several times a week). The four items for bullying victimization were: ‘How often has this student in the last couple of months… a) been bullied (in general), b) been teased, laughed at, or called names? (verbal), c) been physically bullied, such as being hit, kicked, and pushed? (physical), d) been excluded by other children, ignored, or have other students spread false rumors? (relational)’. The parts between brackets (e.g., “relational”) were indeed part of the teacher items. For the original Dutch items, see the Supplementary Materials Online. Bullying perpetration was assessed with the same items, but formulated to reflect the active form (e.g. ‘How often did this student in the last couple of months… a) bully other students (in general)’). Missingness at the level of the individual items was less than 1.6%.

In the case of general, verbal, and relational perpetration and victimization items, the last two response options (i.e. “about once a week” and “several times a week”) were rarely chosen. Similarly, the last three response options of the physical perpetration and victimization items were rarely chosen. We therefore transformed the response scale of the general, verbal and relational items to three categories, and the response scale of the physical items to two categories.

Statistical analyses

First, we present the prevalence of being involved in the various types of perpetration and victimization, and the phenotypic correlations. Next, we present the results of the analyses of the twin data using genetic structural equation modeling. These results include the decomposition of the phenotypic bivariate covariance matrix (perpetration–victimization) into genetic and environmental components.

Behavioral genetic analyses plan

In twin studies, we use the ACE model to decompose phenotypic variances and covariances into genetic, common and unique environmental variance components. The A (in ACE) represents additive genetic influences, the C represents environmental influences that are shared by siblings (i.e., common) and lead to similarities between them, and E represents unique environmental influences, which make siblings less alike, and measurement error. The decomposition is based on the fact that MZ twins are genetically identical, while DZ twins on average share 50% of the alleles that make up segregating genes. Consequently, if the MZ twin correlation is larger than the DZ twin correlation, this suggests genetic influences. If twice the DZ correlation is greater than the MZ correlation, this suggests shared environmental influences (C). In contrast, if twice the DZ correlation is smaller than the MZ correlations, this suggests dominance influences (D). MZ twin correlations are invariably less than one, which imply the presence of unshared or unique environmental influences and measurement error (together E), which contribute to twin differences. In a twin study, C and D cannot be estimated simultaneously, so based on the twin correlations, either an ACE or ADE model is fitted. In practice the decomposition is carried out by fitting the ACE model (or ADE model) to the twin data using genetic structural equation modeling (Posthuma et al. 2003). This allows us to generalize the decomposition to multiple phenotypes. In the present case, we decompose the phenotypic 2 × 2 covariance matrix (perpetration–victimization by type of bullying) into 2 × 2 A, C and E covariance matrices. This provides information on the contributions of genetic and environmental factors to the variance of the phenotypes and to the covariance between the phenotypes. We used the bivariate Cholesky model to obtain the bivariate decomposition. This is depicted in Fig. 1.

Fig. 1 Bivariate Cholesky ACE decomposition including rater bias. “A” represents the genetic influences. The common environmental (C) and unique environmental (E) influences are not shown to avoid clutter (but can be found in Fig. S1 in the Supplementary Materials). “r zygosity ” is 1 for MZ twins and 0.5 for DZ twins. “r rater ” represents the correlation between the raters of the twin, which is 1 for twins rated by the same teacher and 0 for twins rated by different teachers. “a11” represents the genetic influences on victimization, “a12” represents the genetic covariance between victimization and perpetration, and “a22” represents the unique genetic influences on perpetration after accounting for the shared genetic influences. This model was fitted to each type of perpetration/victimization pair Full size image

We assumed that raters may introduce systematic variation into the phenotype ratings, which reflect for example differences in raters’ visions of bullying. In addition, raters who assess multiple children, can cause possible rater contrast effects. More specifically, the twins in our dataset that attend the same classroom were assessed by the same teacher. This might result in more similar bullying estimates than when the twin children were assessed by different teachers, here termed as rater effects (Bartels et al. 2007; Rietveld et al. 2003a). As shown in Fig. 1, we included in the model a rater effect to accommodate this variation. The rater effects are assumed to contribute to the covariance between phenotypes within twin members. If the twins are rated by the same teacher (i.e., twins in the same class), the rater effect may also contribute to the phenotypic covariance between twins.

We used Mplus version 7 to fit the ACE twin model (Muthén and Muthén 1998–2012). As the data are ordinal, we used robust weighted least squared (WLSMV) estimation applied to the tetrachoric or polychoric correlation matrices. This is consistent with the liability-threshold modeling (e.g., Rijsdijk and Sham 2002), in which the ordinal data arise from the discretization of bivariate normal (latent) liabilities. The phenotypic summary statistics are the thresholds and the tetrachoric or polychoric correlation matrices. The correlations convey the linear association at the level of the liabilities, and the thresholds convey the frequencies of the responses. The model included 5 × 2 groups. First, five groups were based on zygosity and sex (MZ males, DZ males, MZ females, DZ females and DZ opposite-sex pairs). Given the five groups, we could test sex differences in the variance components and the thresholds. Second, each group was further divided into “same-class” and “different-class” groups (hence the 5 × 2 groups). The latter subdivision was made to accommodate the rater effects (see Fig. 1), which are shared by twins in the same class (and so the same teacher rater).

In sum, in the full model the bivariate phenotypic covariance matrix was decomposed into ACE components and the rater-variance component. The 10-group model allowed us to use χ2 difference testing to study sex differences in thresholds and variance components, while accounting for the rater effect. We carried out the bivariate analyses (perpetration–victimization) separately for each form of bullying (general, verbal, physical and relational).

We tested the sex and classroom effects on the genetic and environmental variance components (these tests are also represented in Table S1 in the Supplementary Materials Online). First, we tested whether the thresholds (i.e., the prevalences) depended on class sharing and gender. Second, we tested whether the ACE components varied with classroom sharing and with gender. Because twin correlations (Tables S2–S5) were consistent with an ACE pattern rather than an ADE pattern, dominance effects (D) were not investigated. This is also consistent with the previous literature on bullying that found C but no D influences (e.g. Ball et al. 2008; Connolly and Beaver 2016). The variances of MZ and DZ twins were similar, and hence we did not consider social interaction effects between twins (Eaves 1976; Rietveld et al. 2003b).

Model fit evaluation was based on the Chi squared test, the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA) (Kline 2011). The Chi squared test is based on the difference between the observed and expected covariance matrices, with a better fit indicated by Chi square values closer to zero. Model evaluation was based on the combinational rule of Chi squared p-values > 0.05, CFI > 0.95, and RMSEA < 0.05. Comparison of a model with a reduced model was based on χ2 difference testing. To accommodate multiple testing, we used an adjusted α-value of 0.01. The data were prepared in R, version 3.4.1 (R Core Team 2017) and all models were fitted in Mplus, version 7 (Muthén and Muthén 1998–2012).