Sample

The Growing Up Today Study (GUTS) is a US longitudinal cohort of 16,882 offspring of women participating in the Nurses’ Health Study II, enrolled in 1996 at ages 9–14 years and followed annually or biennially43. In 2010, male participants were asked whether they would be willing to donate a semen sample. Nearly two-thirds (64%) were willing. Age, BMI, and race did not differ between men willing and unwilling to donate. In 2012, we contacted 66 men to request a sample; 54 men (82%) returned the sample by mail. We further invited the first 28 men who returned the sample to send a second one; 24 men (86%) returned the second sample. Men were asked to abstain from ejaculation for at least 48 h prior to producing the sample by masturbation into a collection container (Thermo Scientific Nalgene Jars). Samples were shipped overnight, with four gel refrigerant packs surrounding the sample, to the Massachusetts General Hospital Fertility Center where sperm concentration and morphology were measured. Remaining semen was aliquoted and flash frozen in liquid nitrogen. Informed consent was obtained from all participants. The Institutional Review Board of Partners Healthcare approved this study.

We conducted DNAm assays on 48 samples from 34 men. Of these, 20 men contributed single samples, 12 men contributed two samples each, produced ~3 months apart, and two men’s samples were assayed twice as technical replicates, for a total of 48 samples assayed. We oversampled men who had been exposed to high levels of abuse, such that the samples that were assayed included 17 men exposed to high, 5 men to medium, and 12 men to no childhood abuse.

Measures

Experiences of physical, emotional, and sexual abuse before age 18 were measured in 2007 when participants were aged 18–23 years. Physical and emotional abuse were measured with four items from the Childhood Trauma Questionnaire (CTQ), querying frequency that an adult in the family yelled, insulted, punished cruelly, and hit so hard that it left bruises44. Responses to the CTQ were summed44 and then divided into quartiles based on their distribution in the entire cohort (lowest quartile = 0 points, highest quartile = 3 points). Physical and emotional abuse were also measured with three items from the Conflict Tactics Scales (CTS), querying frequency that an adult in the family shoved; threatened to punch, kick, or hit with something; actually punched, kicked, or hit with something; or physically attacked45. Response options for the CTQ and the CTS ranged from “never” to “very often”. Responses to the CTS were skewed, with most respondents reporting none of these experiences. We therefore divided this scale into 0: lowest 50%, 1: next 25%, and 2: highest 25%.

Sexual abuse was queried in each time period with two questions regarding unwanted sexual experiences with an adult or older child (e.g., “Did an adult or an older child force you into any sexual activity by threatening you or hurting you in some way?”)45. Response options included: no; once; or > once.

To oversample men exposed to high levels of abuse, we created an overall measure of childhood abuse in three levels: none, moderate, and high. Men with “no abuse” (N = 12) were in the lowest category of both measures of physical and emotional abuse and had not experienced sexual abuse. Respondents with “high abuse” (N = 17) were either in the highest level of the CTS or the highest level of the CTQ, or had a mixture of elevated responses across both questionnaires. All or nearly all men in this group had experienced punishments that seemed cruel, were yelled and screamed at, and had hurtful and insulting things said to them. All had been shoved, grabbed, hit, or physically attacked in some other way, and most had also been threatened with violence. Two men in this group had been sexually abused. Five participants fell between the “no abuse” and “high abuse” groups and were considered to have experienced “medium” abuse. We also summed the CTQ, CTS, and sexual abuse measures to create a continuous measure of abuse severity (range, 0–7) and dichotomized participants as none-to-medium (0–2) vs. high abuse3,4,5,6,7.

Covariates

We examined the characteristics of the semen sample, including ejaculate volume, sperm concentration, percent normal morphology, collection date, collection time, and abstinence interval, as well as characteristics of the participant, including age at collection, month of birth, and race/ethnicity as possible covariates. Additionally, we included information reported by the participants’ mothers, Nurses’ Health Study II cohort members, regarding her ancestry as well as participants’ childhood socioeconomic status, an index of family income, maternal social standing, and paternal education, reported in 1999–2001.

Hypothesized mediators

Childhood abuse increases risk for adulthood health risk behaviors, mental disorders46,47, and trauma exposure42, factors that may explain an association of childhood abuse with adulthood sperm DNAm. We examined smoking, BMI (by self-report in 2010 and 2007), depressive symptoms (measured with the Center for Epidemiologic Studies Depression Scale-1048 in 2010), posttraumatic stress symptoms (measured with the 7-item Short Screening Scale for DSM-PTSD49 in 2007), and trauma exposure (measured in 2007 with 13 items adapted from the Brief Trauma Questionnaire50 e.g., physical assault, intimate partner violence, and serious illness) as potential mediators.

DNAm assay

A differential lysis method involving a series of six washes was performed to separate sperm cells from epithelial and round cells (Supplemental Material). We then conducted DNAm assays with Infinium HumanMethylation450 (450 K) BeadChips (Illumina) using bisulfite-treated DNA (EZ-96 DNA Methylation kit, Zymo Research, Irvine, CA). These assays produce 485,577 data points encompassing 482,421 CpG sites and 3091 CpN sites. Raw intensity scores were color corrected and background was subtracted using GenomeStudio Software (Illumina). Methylation β value for each probe represents a continuous ratio between 0 (0% methylated) and 1 (100%). Probes were excluded from further analysis if they had a detection p-value < 0.01 (n = 2144 probes) or if > 5% of samples were missing a β value (n = 12,353 probes). Probes which bound in silico to the X and Y chromosome in addition to the specified targets were excluded51, leaving N = 439,746 probes available for subsequent analysis. Inter-sample normalization was performed using quantile normalization52. To account for the two probe types on the Illumina BeadChip, normalization was performed using subset-quantile within array normalization (SWAN)53. To determine if there were batch effects, PCA was performed on the normalized data followed by Spearman’s correlations of the PCs with all technical variables. A slight batch effect associated with chip number and position was removed using empirical Bayes methods (R package SVA, ComBat function, Supplementary Figure S1)54.

To evaluate the purity of our washed sperm samples, we compared DNAm in our sample with DNAm from an independent study of contaminated and purified sperm samples (Gene Expression Omnibus (GEO)55 GSE108058, Supplementary Figure S2). We merged the GEO dataset with our own data and performed PCA. The vast majority of variation in methylation is associated with tissue heterogeneity, therefore the first few PCs should be correlated with the purity of the semen samples. Plotting PC1 against PC2 (for visualization purposes), our samples clustered with the pure semen, providing evidence that we had successfully purified our samples. We additionally examined the methylation status of two imprinting control regions (HYMAI and GNAS-AS). These regions are paternally expressed, and therefore we would anticipate that these regions would be fully unmethylated if our samples contained purified haploid gametes (as opposed to hemi-methylated in somatic tissue). We calculated the median DNAm β value for each probe underlying these regions (130 probes) for each sample in our study. The vast majority of samples had median β < 0.05, suggesting good purity (Supplementary Table S1, Supplementary Figure S3).

Analyses

To characterize the study sample, we compared age, race, and abuse exposure of study participants with all GUTS men. Next, for study participants, we calculated prevalence for categorical variables and mean for continuous variables for covariates by childhood abuse status.

Principal components analysis

To investigate whether childhood abuse and our covariates were associated with variation in DNAm, we conducted PCA with all probes (N = 439,746) using one randomly selected sample per subject. PCA reduces the dimensionality of the data by identifying orthogonal components from methylation values of all individual probes, with PC1 explaining the most variance. We examined the association of both the continuous and categorical childhood abuse variables and the covariates with centered PCs, using one-way ANOVAs for ordinal and categorical variables and Spearman’s correlations for continuous variables. For PCs that were statistically significantly associated with childhood abuse, we investigated which specific probes contributed most to the PC by first identifying individual probes with the largest PC score (the 1% of probes with the largest positive scores and 1% with the largest negative scores) and then, to increase the likelihood of biological relevance, selected only probes with methylation Δβ ≥ 5%, where \(\Delta \beta = \bar \beta\) high abuse –\(\bar \beta\) no abuse . P-values were not adjusted for multiple testing, as this was an exploratory analysis to determine associations with DNAm.

DMRs analysis

We next investigated whether childhood abuse was associated with patterns of DNAm in spatially clustered probes. We investigated DMRs by childhood abuse exposure using the R package DMRcate56 (Bioconductor, http://www.bioconductor.org), using the same randomly selected sample per subject used in the PCA. DMRcate first assesses the association of the exposure (childhood abuse) with methylation at each individual CpG site, then groups the probes into DMRs based on the similarity of effect size and directionality with distances of ≤ 1000 bp between them. DMRs are then corrected for multiple testing by calculating the false discovery rate (FDR) for each DMR. DMRs that do not meet an FDR ≤ 0.05 and a fold change ≥ 0.05 are dropped. We considered regions to be DMRs if they were statistically significant at an FDR ≤ 0.05, contained ≥ 3 probes, and had a difference in DNAm β (Δβ) ≥ 5%, where Δβ = \(\bar \beta\) high abuse –\(\bar \beta\) no abuse . We conducted these analyses with the ordinal childhood abuse variable to reduce the effects of outliers, then checked that results were similar in analyses using the continuous childhood abuse variable. We verified our findings by replacing the sample used in the primary analyses with the replicate sample from each man who contributed two samples (N = 12) and re-running the DMR analyses using original samples from 22 men and replicate samples from 12 men. Finally, in sites located in identified DMRs, we calculated the interclass correlation coefficient (ICC) between the first and second sample in DNAm β values.

To examine the concordance of our two methods of identifying probes differentially methylated by childhood abuse, we compared the overlap in probes identified using PCA and probes identified in DMR analysis.

Machine learning analysis

Finally, we used machine learning to identify sites predictive of childhood abuse and: (1) compare them with the sites identified in the DMR analysis and (2) construct a parsimonious predictor of child abuse status. We fit a penalized linear regression (“elastic net”) to select informative probes from the set of all probes using the dichotomized childhood abuse variable (none/medium vs. high abuse, mixing parameter α set to 0.5, the default). The penalized regression begins by fitting a single linear model including all probes, then selects a subset of relevant probes by shrinking the linear coefficients and setting to zero coefficients below a given threshold57. The selected probes are those with non-zero coefficients. We estimated the penalty parameter λ with tenfold cross-validation and set it to 0.095. We applied the resulting predictor to three independent datasets (Gene Expression Omnibus55 GSE108058, GSE10297058, and GSE6409659) to ascertain whether the prevalence of abuse estimated with this predictor was approximately the same as the prevalence in the whole GUTS cohort (high abuse prevalence = 28.8%). As no datasets of sperm DNAm were available with childhood abuse measured, we could not test its ability to predict abuse status.

Pyrosequencing methylation confirmation

To confirm findings from the 450 K array, we performed pyrosequencing with bisulfite-converted DNA. We selected five sites for confirmation, prioritizing sites within DMRs and sites with low FDR. We calculated Spearman correlations between β values obtained from pyrosequencing and the 450 K array and performed linear regression to ascertain the association of pyrosequencing β values with childhood abuse.

Exploratory mediation analysis

To examine whether adulthood health risk factors might explain a possible association between childhood abuse and DNAm, we conducted two analyses. First, we examined whether these risk factors loaded on DNAm PCs, using one-way ANOVAs for ordinal variables and Spearman’s correlations for continuous variables. Next, we examined probes identified in DMR analyses. For each probe in a childhood abuse DMR, we compared the association of childhood abuse with DNAm in linear models adjusted only for age and semen volume (base model) and in models further adjusted for: (1) health risk behaviors (smoking and BMI); (2) mental health (depressive and posttraumatic stress symptoms); and (3) trauma exposure. We calculated % mediation as: [(β child abuse, base model –β child abuse, adjusted model )/β child abuse, base model ]*100 for each probe and calculated the mean mediation across all probes within each DMR for each set of hypothesized mediators. We did not include all hypothesized mediators in a single model to avoid overfitting.

Probes associated with childhood abuse in prior studies

We examined the association of 1667 probes previously identified as associated with childhood abuse11,14,15,16. We considered probes with FDR < 0.05 as statistically significant, accounting for multiple testing within this set of 1667 probes.

Code availability

Code is available at GitHub60.