Data

Study population

We included data from four Swedish national registers: the Medical Birth Register, the National School Register, the Multi-Generation Register, and the National Patient Register. Each resident of Sweden is assigned a personal identification number (PIN) that is identical across these registers and can therefore be used to link data between them. The Medical Birth Register was established in 1973 and includes data on over 98% of all births in Sweden [15]. Our cohort comprised all live singleton births recorded in the Medical Birth Register between 1982 and 1995. Multiple births were excluded because both mode of delivery and school performance are likely to be highly correlated among children from the same multiple pregnancy. Variables recording the timing of labour onset and of CS are available only from 1982, which therefore marks the start of our birth cohort. Ethical approval was obtained from the regional ethical research committee of Stockholm at Karolinska Institutet, and the committee waived the requirement for informed consent.

Exposure: obstetric mode of delivery

Obstetric mode of delivery, extracted from the Medical Birth Register, was classified as “unassisted vaginal delivery (VD)”, “assisted VD”, “elective CS” or “emergency CS”. Unassisted VD was defined as VD without the use of forceps or vacuum extraction, and assisted VD as VD with the use of forceps or vacuum extraction. Both unassisted and assisted VD included spontaneous and induced deliveries. Elective CS was defined as CS started before the onset of labour (indicated on medical charts by rupture of membranes, bleeding, or regular and sustained pain), and emergency CS as CS started after the onset of labour.
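The classification rules above amount to a small decision procedure. The following is a minimal sketch under stated assumptions: the boolean inputs (`membranes_ruptured`, `caesarean`, `forceps_or_vacuum`, etc.) are hypothetical chart flags and do not correspond to actual Medical Birth Register variable names.

```python
from enum import Enum


class Delivery(Enum):
    UNASSISTED_VD = "unassisted VD"
    ASSISTED_VD = "assisted VD"
    ELECTIVE_CS = "elective CS"
    EMERGENCY_CS = "emergency CS"


def labour_had_started(membranes_ruptured: bool, bleeding: bool,
                       regular_sustained_pain: bool) -> bool:
    """Labour onset as charted: any one of the three signs counts (hypothetical flags)."""
    return membranes_ruptured or bleeding or regular_sustained_pain


def classify_delivery(caesarean: bool, labour_before_cs: bool,
                      forceps_or_vacuum: bool) -> Delivery:
    """Map hypothetical chart flags to the four delivery categories."""
    if caesarean:
        # Emergency CS started after labour onset; elective CS started before it.
        return Delivery.EMERGENCY_CS if labour_before_cs else Delivery.ELECTIVE_CS
    # Vaginal births (spontaneous or induced) are split only by instrument use.
    return Delivery.ASSISTED_VD if forceps_or_vacuum else Delivery.UNASSISTED_VD
```

Note that induction plays no role in the vaginal branch, mirroring the statement that both VD categories include spontaneous and induced deliveries.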

Outcome: school performance

Data on school performance were extracted from the National School Register, which holds records from 1988 onwards. In Sweden, grades in 16 subjects are recorded when pupils finish the compulsory years of school (at age 16). Since 1998, each subject has been graded on four levels: not pass (score of 0), pass (score of 10), pass with distinction (score of 15), and pass with great distinction (score of 20), giving a maximum total score of 320 (i.e. a score of 20 in each of the 16 subjects). A different grading system was used before 1998, but because the oldest children in our cohort turned 16 in 1998, only the current system was included. Children who “drop out” of school before compulsory grading still technically graduate, but they are recorded as having a total score of 0 and are not able to continue on to high school. These children were included in our population with a total score of 0. Scores were analysed in both categorical and continuous form (from 0 to 320 in increments of 5) [16, 17].
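The scoring arithmetic can be made concrete with a short sketch. Grade labels follow the text; representing a pupil who dropped out as `None` is our own assumption for illustration, not how the National School Register encodes it.

```python
from typing import List, Optional

# Points per subject grade, as described in the text.
GRADE_POINTS = {
    "not pass": 0,
    "pass": 10,
    "pass with distinction": 15,
    "pass with great distinction": 20,
}
N_SUBJECTS = 16
MAX_TOTAL = N_SUBJECTS * max(GRADE_POINTS.values())  # 16 * 20 = 320


def total_score(grades: Optional[List[str]]) -> int:
    """Sum subject points; None stands for a pupil who dropped out (total 0)."""
    if grades is None:
        return 0
    if len(grades) != N_SUBJECTS:
        raise ValueError(f"expected {N_SUBJECTS} subject grades, got {len(grades)}")
    return sum(GRADE_POINTS[g] for g in grades)
```

Because every point value is a multiple of 5, every attainable total is too, which is why the continuous score moves in increments of 5. A pupil with “pass” in all 16 subjects scores exactly 160.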

Co-variates

Based on previous literature and a directed acyclic graph (DAG) [18] (Additional file 1: Figure S1), the following co-variates were selected a priori for inclusion in the analysis: maternal age at time of birth (<25, 25–34, 35–44, 45+ years) [16]; birth order (first born) [16]; small for gestational age (SGA) [16] and large for gestational age (LGA), defined as a birth weight more than 2 standard deviations below or above, respectively, the mean for gestational age; gestational age (<37 weeks; 37, 38, 39 or 40 weeks; >40 weeks) [16]; maternal country of birth (Swedish, other Nordic, other) [19]; maternal depression, non-affective disorder, or bipolar disorder (never diagnosed, diagnosed before birth, diagnosed after birth); parental income at time of birth (in quintiles); receipt of parental social welfare at time of birth (yes/no; available from 1983) [19]; and highest parental education (pre-high school, high school, post-high school) [16, 19, 20].
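The SGA/LGA definition and the gestational-age categories can be sketched as follows. This is illustrative only: the reference mean and standard deviation for each gestational age come from a growth reference not specified here, so they are plain inputs, and treating exactly ±2 SD as “appropriate” (strict inequalities) is our assumption.

```python
def size_for_gestational_age(birth_weight_g: float, ref_mean_g: float,
                             ref_sd_g: float) -> str:
    """Classify birth weight against a reference mean and SD for the same
    gestational age (reference values are supplied by the caller)."""
    z = (birth_weight_g - ref_mean_g) / ref_sd_g
    if z < -2:
        return "SGA"  # more than 2 SD below the mean
    if z > 2:
        return "LGA"  # more than 2 SD above the mean
    return "appropriate"


def gestational_age_category(weeks: int) -> str:
    """Co-variate categories: <37; 37, 38, 39 or 40; >40 completed weeks."""
    if weeks < 37:
        return "<37"
    if weeks > 40:
        return ">40"
    return str(weeks)
```

For example, with a hypothetical reference of 3500 g (SD 450 g), a 2400 g infant has z ≈ −2.44 and is classified as SGA.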

Although they were not identified as confounders in the DAG, further co-variates identified from previous literature were also assessed: year of birth [20]; year of school completion; smoking at the first antenatal visit (none, 1–9 cigarettes/day, 10+ cigarettes/day) [16, 20]; infant gender [16, 19]; Apgar score at 5 min (“low” [0–3], “intermediate” [4–6], “high” [7–10]) [16, 21]; paternal country of birth (Swedish, other Nordic, other) [19]; paternal depression, non-affective disorder, or bipolar disorder (never diagnosed, diagnosed before birth, diagnosed after birth); and parental co-habitation at time of birth [19, 20].

All co-variates were tested individually in the logistic regression analysis to assess their impact on the association between mode of delivery and school performance. As no variable changed the estimate by more than 10% (only maternal age changed it by more than 5%), only the co-variates selected a priori were included in the final analysis. Notably, although parental education was selected a priori, it was only available from 1990; it was therefore tested individually and, as it had no impact on the estimate, was not included in the final model. The distribution of each variable by mode of delivery is shown in Table 1.
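The change-in-estimate screen described above can be sketched as a one-line calculation. We assume the change is measured on the odds-ratio scale relative to the estimate without the co-variate; some analysts use the log-odds scale instead, and the text does not say which was used. The odds ratios in the usage example are made up purely for illustration.

```python
def percent_change_in_estimate(crude_or: float, adjusted_or: float) -> float:
    """Relative change (%) in the odds ratio when a single co-variate is added.

    Assumption: measured on the odds-ratio scale, relative to the crude OR.
    """
    return 100.0 * abs(adjusted_or - crude_or) / crude_or


def exceeds_threshold(crude_or: float, adjusted_or: float,
                      threshold_pct: float = 10.0) -> bool:
    """True if adding the co-variate moves the estimate by more than the threshold."""
    return percent_change_in_estimate(crude_or, adjusted_or) > threshold_pct
```

With hypothetical odds ratios of 1.10 (crude) and 1.16 (adjusted), the change is about 5.5%: above a 5% cut-off but below the 10% cut-off used for inclusion.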

Table 1 Distribution of descriptive variables by mode of delivery

Statistical analysis

Logistic regression

For the logistic regression, we defined “poor school performance” as a total score below 160 [16, 17, 22], i.e. an average below 10 (“pass”) across the 16 subjects. In Sweden, grades are assigned by teachers rather than through a standardised test, so grading standards may vary between schools. To account for this, we used mixed-effects modelling with a random intercept for school ID.
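One way to write the random-intercept logistic model described above is shown below. This is our sketch rather than the authors' stated specification: we assume mode of delivery enters as three indicators (D_1, D_2, D_3 for assisted VD, elective CS and emergency CS) with unassisted VD as the reference, and all symbols are introduced here for illustration. For child i in school j with total score S_ij, x_ij holds the co-variates and u_j is the school-level random intercept.

```latex
Y_{ij} = \mathbf{1}\{S_{ij} < 160\}, \qquad
\operatorname{logit} \Pr(Y_{ij} = 1 \mid u_j)
  = \beta_0 + \sum_{k=1}^{3} \beta_k D_{kij}
    + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under this form, exp(β_k) is an odds ratio for poor school performance conditional on school; in the sensitivity analysis with a random intercept for maternal ID, j would index mothers instead of schools.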

Quantile regression

School performance scores have previously been reported to be highly skewed [16, 20], so we used quantile regression to analyse school performance in its continuous form. Quantile regression is similar to an ordinary least squares (OLS) model, except that it models a conditional quantile of interest (such as the median) rather than the conditional mean. It also does not require the assumption of normally distributed or homoscedastic errors, and it allows associations to be assessed across the whole distribution (i.e. at any quantile). In this way we could determine whether mode of delivery affected scores across the distribution (for example, only the lowest or only the highest scores), rather than only whether a child passed, as in the logistic regression. We plotted quantile regression coefficients for every fifth quantile from the 5th to the 95th, estimating standard errors with the kernel-based method [23]. We also examined the coefficient estimates for the 5th, 25th, 50th, 75th and 95th quantiles specifically. In the adjusted analysis, we included the same co-variates as in the fully adjusted logistic regression model.
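The core idea of quantile regression is that the τ-th quantile minimises the “check” (pinball) loss, just as the mean minimises squared error. The sketch below shows that principle for an intercept-only model on a few made-up scores; it does not reproduce the covariate-adjusted fits or the kernel-based standard errors, which in R's quantreg are, to our understanding, obtained via `summary()` on an `rq` fit with `se = "ker"`.

```python
from typing import Sequence


def check_loss(residual: float, tau: float) -> float:
    """Pinball loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile_by_check_loss(y: Sequence[float], tau: float) -> float:
    """Intercept-only quantile regression: the constant minimising total check loss.

    A minimiser always exists among the observed values, so a search over y suffices.
    """
    return min(y, key=lambda c: sum(check_loss(v - c, tau) for v in y))


# Every fifth quantile from the 5th to the 95th: 0.05, 0.10, ..., 0.95 (19 values).
QUANTILES = [q / 100 for q in range(5, 100, 5)]
```

For the hypothetical scores [0, 150, 160, 185, 200, 230, 320], the τ = 0.5 minimiser is 185, the sample median, while the mean (about 178) is pulled down by the single 0.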

Additional analyses

We conducted several sub-group analyses. In the logistic regression, we restricted the cohort to births from 1990 onwards (the year data on parental education became available) and assessed the association with and without adjustment for parental education. We assessed the association among males only. We also excluded children born by secondary CS (i.e. children born by CS whose mothers had previously given birth by CS) and children with a low Apgar score at 5 min. Although the vast majority of pupils (95%) finish the compulsory years of school at age 16, some finish younger or older; we therefore also restricted the population to those who were 16 when they finished compulsory school, to determine what effect age at completion may have had on school performance. To account for potential clustering of academic performance within families, we restricted the population to one-child families and to first-born children, and we repeated the overall analysis with a random intercept for maternal ID instead of school ID. For both the logistic and the quantile regression, we conducted sensitivity analyses excluding children who received a total score of “0” (i.e. children who did not complete the compulsory years of schooling).
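The restrictions above can be expressed as named filters over the cohort. This is a minimal sketch: the `Child` record and all its field names are hypothetical, and “low” Apgar is taken as 0–3 at 5 min, following the Apgar categories listed among the co-variates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Child:
    birth_year: int
    male: bool
    secondary_cs: bool       # born by CS to a mother with a previous CS
    apgar5: int              # Apgar score at 5 min
    age_at_completion: int   # age when compulsory school was finished
    total_score: int
    first_born: bool
    only_child: bool


# Each sub-group is a predicate that keeps the children satisfying it.
SUBGROUPS: Dict[str, Callable[[Child], bool]] = {
    "born 1990 onwards": lambda c: c.birth_year >= 1990,
    "males only": lambda c: c.male,
    "excluding secondary CS": lambda c: not c.secondary_cs,
    "excluding low Apgar": lambda c: c.apgar5 > 3,
    "aged 16 at completion": lambda c: c.age_at_completion == 16,
    "one-child families": lambda c: c.only_child,
    "first-born": lambda c: c.first_born,
    "excluding total score 0": lambda c: c.total_score > 0,
}


def restrict(cohort: List[Child], subgroup: str) -> List[Child]:
    """Return the children retained in the named sub-group analysis."""
    keep = SUBGROUPS[subgroup]
    return [c for c in cohort if keep(c)]
```

Keeping each restriction as a separate predicate makes it easy to rerun the same model on every sub-group in turn.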

Additionally, we used logistic regression to assess the association between birth by CS and school performance in five subject categories [12]: natural sciences (biology, chemistry and physics), social sciences (geography, religion, history and civics), arts (art and handicraft), sports, and Swedish. In each category, an average below “pass” (10 points per subject) was considered poor performance. As in the analysis of overall school performance, we conducted sub-group analyses in which children recorded as “0” in any subject were excluded from that category.
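The per-category outcome can be sketched as follows. The subject keys are hypothetical, and we read “excluded from that category” as applying when a “0” occurs in one of that category's own subjects; the text does not spell this out further.

```python
from typing import Dict, Optional

# Hypothetical subject keys grouped as in the text.
CATEGORIES = {
    "natural sciences": ["biology", "chemistry", "physics"],
    "social sciences": ["geography", "religion", "history", "civics"],  # civics = "society knowledge"
    "arts": ["art", "handicraft"],
    "sports": ["sports"],
    "Swedish": ["Swedish"],
}
PASS_POINTS = 10


def poor_in_category(points: Dict[str, int], category: str,
                     exclude_zero: bool = False) -> Optional[bool]:
    """True if the mean points in the category fall below 'pass' (10).

    With exclude_zero=True (sub-group analysis), a child with a 0 in any
    subject of the category is excluded, signalled by returning None.
    """
    scores = [points[s] for s in CATEGORIES[category]]
    if exclude_zero and 0 in scores:
        return None
    return sum(scores) / len(scores) < PASS_POINTS
```

For example, biology 10, chemistry 0 and physics 15 average about 8.3 points, so the child counts as performing poorly in natural sciences, but is dropped from that category in the sub-group analysis.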

The logistic regression analysis was conducted in SAS v9.3 (SAS Institute, Cary, NC) using PROC GLIMMIX [24], and the quantile regression analysis was conducted in R v3.2.2 using the quantreg package [23]. Missing data were handled using the missing indicator method, in which each variable was given an additional category indicating “missing” status [25].
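The missing indicator method for categorical co-variates amounts to adding “missing” as one more level before dummy coding. A minimal sketch, assuming missing values arrive as `None` and that the reference level is chosen by the analyst; the variable used in the example is only illustrative.

```python
from typing import Dict, List, Optional

MISSING = "missing"


def missing_indicator_dummies(values: List[Optional[str]],
                              reference: str) -> List[Dict[str, int]]:
    """Dummy-code a categorical co-variate, treating missing values (None)
    as an extra category alongside the observed levels."""
    filled = [MISSING if v is None else v for v in values]
    # One indicator per non-reference level, including the "missing" level.
    levels = sorted(set(filled) - {reference})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in filled]
```

For maternal country of birth with “Swedish” as the reference, a child with an unrecorded value gets `is_missing = 1`, so no observations are dropped from the model.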