Subjects

The study was approved by the National Bioethics Committee of Iceland and the Icelandic Data Protection Authority. Samples are from a population genetic biobank of 150,656 Icelanders established by deCODE genetics. Reproductive fitness was defined as the number of children born to individuals over 45 years. Subjects born before 1968 with matching genotypic data (N=93,720) were identified from deCODE’s nation-wide genealogy database. This contains information on year of birth, county of birth and numbers of children of Icelanders. Diagnoses of schizophrenia and bipolar disorder were assigned according to Research Diagnostic Criteria (RDC)28 through the use of the Schedule for Affective Disorders and Schizophrenia Lifetime Version (SADS-L)29. ADHD subjects were recruited from outpatient pediatric, child and adult psychiatry clinics in Iceland; ICD-10 diagnoses were made on the basis of standardized diagnostic assessments by experienced clinicians. Autism subjects were ascertained through the State Diagnostic Counseling Center and the Department of Child and Adolescent Psychiatry in Iceland and received ICD-10 diagnoses based on standardized diagnostic assessments by clinical specialists. Diagnoses of MDD were made by clinicians or based on the results of a semi-structured interview (CIDI), and were assigned according to DSM-III, ICD-9 or ICD-10 criteria. All diagnoses of recurrent depression were included (that is, mild, moderate and severe), but in the case of single episode depression mild cases were excluded. Characteristics of the sample are shown in Supplementary Table 1.

Genotyping and imputation

Genotyping was performed on Illumina HumanHap (300, 370, 610, 1 M, 2.5 M) and Illumina Omni (670, 1 M, 2.5 M, Express) SNP arrays9. BeadStudio (Illumina; version 2.0) was used to call genotypes, normalize signal intensity data and establish the log R ratio and B allele frequency at every SNP. Long-range haplotype phasing was achieved using an iterative algorithm, which phases a single proband at a time, given the available phasing information on all other individuals who share a long haplotype identically by state with the proband30. Given the large proportion of the Icelandic population that has been chip-typed, accurate genome-wide long-range phasing is possible for all chip-typed Icelanders. For long-range phased haplotype association analysis, the genome was then partitioned into non-overlapping fixed 0.3 cM bins. Within each bin, the haplotype diversity was consistent with the combination of all chip-typed markers in the bin. The whole genomes of 8,453 Icelanders were sequenced using Illumina technology to a mean depth of at least 10× (median 32×). SNPs and indels were identified and genotypes called using joint calling with the Genome Analysis Toolkit HaplotypeCaller (GATK version 3.3.0) (ref. 31). The error rate of genotype calls made solely on the basis of next-generation sequence data decreases as a function of sequencing depth. Taking advantage of the fact that all the sequenced individuals had also been chip-typed and long-range phased, information about haplotype sharing was utilized to minimize the number of such errors. Thus, the genotype call in cases where sequence reads were ambiguous would be informed by comparison with sequence reads of other individuals sharing haplotypes with the individual in question at the ambiguous site. To improve genotype quality and to phase the sequencing genotypes, an iterative algorithm based on the IMPUTE HMM model32 and using the long-range phased haplotypes was employed33.
The same principle was then used to impute the sequence variants identified in the 8,453 sequenced Icelanders into 150,656 Icelanders who had been genotyped with various Illumina SNP arrays and their genotypes phased using long-range phasing33.

Polygenic risk scoring and CNV selection

We derived PRSs from GWAS summary results available online from the Psychiatric Genomics Consortium (https://pgc.unc.edu/) for ADHD, autism, bipolar disorder, major depression and schizophrenia13,14,15,16,17. The numbers of cases in these studies were 896, 3,303, 7,481, 9,240 and 35,476, respectively. The deCODE sample was not part of these analyses. To compute the PRSs, we used approximately 630,000 autosomal markers from a framework set of markers used in long-range haplotype phasing. The framework markers were selected on the basis of various quality criteria, including high genotype yield, Hardy–Weinberg equilibrium and consistency of allele frequencies across different Illumina array types. We estimated the linkage disequilibrium between markers using Icelandic samples and adjusted for it using LDpred34, a recently proposed method. PRSs were calculated with seven different settings of the P parameter (corresponding roughly to the fraction of causal markers34): 0.001, 0.003, 0.01, 0.03, 0.1, 0.3 and 1.0. Eleven CNVs conferring risk of schizophrenia or autism (‘neuropsychiatric CNVs’) were selected from the most recent review on CNVs in schizophrenia7 and the most recent analysis of CNVs in autism8.
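As an illustration of the scoring step, a PRS can be viewed as a weighted sum of risk-allele dosages across markers. The following is a minimal sketch, assuming the per-marker weights have already been adjusted for linkage disequilibrium (for example, by LDpred); the marker identifiers, weights and dosages are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over markers present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[m] * dosages[m] for m in shared)


def standardize(scores: List[float]) -> List[float]:
    """Centre scores on 0 and scale them to unit sample standard deviation."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    sd = var ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical LD-adjusted effect sizes (log odds per risk allele).
weights = {"marker_a": 0.02, "marker_b": -0.01, "marker_c": 0.005}

# Hypothetical dosages (expected risk-allele counts between 0 and 2).
cohort = {
    "ind1": {"marker_a": 2.0, "marker_b": 0.0, "marker_c": 1.0},
    "ind2": {"marker_a": 1.0, "marker_b": 1.0, "marker_c": 0.0},
    "ind3": {"marker_a": 0.0, "marker_b": 2.0, "marker_c": 2.0},
}

raw_scores = [polygenic_risk_score(d, weights) for d in cohort.values()]
z_scores = standardize(raw_scores)
```

In this sketch, one score would be produced per setting of the P parameter by supplying the corresponding set of weights.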

Statistical analysis

PRSs were first tested for association with their corresponding psychiatric disorder in the Icelandic population (N=1,137, 692, 806, 3,246 and 631 cases for ADHD, autism, bipolar disorder, major depression and schizophrenia, respectively). This was performed using logistic regression with five principal components as covariates. Models were compared against a null model including covariates only to calculate Nagelkerke’s pseudo-R² measure of variance explained. A linear mixed effects model was used to test the association of number of children with PRS for the five psychiatric disorders. Number of children was regressed on the PRS of interest, covarying for year of birth, sex and interaction between the two, birth county of last child or birth county of the parent, five principal components and sibship (to account for relatedness). For each disorder, we chose the PRS calculated with a P parameter corresponding to a fraction of causal markers of 0.3, modelled its correlation with that disorder and then recalibrated the PRS to have a mean of 0 and a unit increase corresponding to a doubling of risk for the disorder. All predictors were modelled as fixed effects apart from sibship, which was modelled as a random effect. This model was compared against a null model including the covariates only. To test the quadratic effects of the PRS, a PRS squared term was added and PRS was included in the null model. Sex-specific analyses were also conducted. Individuals diagnosed with each psychiatric disorder were excluded, although it may not be possible to identify every past case in a general population sample. Age at first child was tested for association with each PRS in the same manner. To examine the relationship between variance in number of children and PRSs, PRSs were split into deciles and number of children was adjusted for all covariates.
A linear regression was used to test the association between deciles of PRS and residual number of children in the total sample, males and females. Neuropsychiatric CNVs were examined for association with number of children using the following covariates: year of birth, sex and interaction between the two, birth county of last child or birth county of the parent, five principal components and the random effect of sibship. Individuals with autism, schizophrenia, bipolar disorder and intellectual disability were excluded from the CNV analyses.
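Two quantities used in this section, Nagelkerke’s pseudo-R² and the recalibration of a PRS so that a unit increase corresponds to a doubling of risk, can be made concrete with a short sketch. This is an illustration under stated assumptions, not the analysis code used in the study: the log-likelihoods are taken as given from fitted logistic models, the choice of which model supplies the null log-likelihood is left to the caller, and "doubling of risk" is approximated as a doubling of odds, which is reasonable only for a rare disorder. The function names and example values are hypothetical.

```python
import math
from typing import List


def nagelkerke_r2(ll_null: float, ll_full: float, n: int) -> float:
    """Nagelkerke's pseudo-R^2 from two log-likelihoods and the sample size.

    Cox-Snell R^2 is divided by its maximum attainable value so that the
    result lies between 0 and 1.
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2


def rescale_to_doubling(prs: List[float], beta: float) -> List[float]:
    """Centre the PRS on 0 and rescale it so that +1 unit doubles the odds.

    `beta` is the logistic regression coefficient (log odds per unit of the
    raw PRS); multiplying by beta / ln(2) makes one unit equal to ln(2) on
    the log-odds scale. Assumption: odds stand in for risk.
    """
    mean = sum(prs) / len(prs)
    return [(x - mean) * beta / math.log(2.0) for x in prs]


# Hypothetical values for illustration only.
r2 = nagelkerke_r2(ll_null=-650.0, ll_full=-640.0, n=1000)
scaled = rescale_to_doubling([0.1, 0.4, 0.7], beta=1.5)
```

With the recalibrated score as the predictor, a regression coefficient for number of children can be read as the change in number of children per doubling of genetic risk.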

Data availability

Data supporting the findings of this study are available within the article and its Supplementary Information files. Summary level data from the PGC GWAS used to calculate PRS in this study were obtained from the PGC Downloads website (https://www.med.unc.edu/pgc/results-and-downloads/).