Subjects

Subjects were 1,432 African American and Hispanic American children, ages 8 to 15 years. This study was part of a larger, multi-site US and Canadian collaborative Genes, Reading, and Dyslexia (GRaD) project led by Yale University (PI: JR Gruen) that followed a case:control design. Inclusion criteria for cases with RD were either history of poor reading skills, reports of skills falling below expected levels for age or grade, and/or provision of special services in the area of reading. Inclusion criteria for controls were reading skills above current expectations for grade, and performance above the 40th percentile on standardized school or clinical testing. Exclusion criteria were age outside the target age range; non-minority ethnic or racial group membership; foster care placement; preterm birth (<36 weeks gestation); prolonged stay in the NICU after birth (>5 days); history of diagnosed or suspected significant cognitive delays, significant behavioral problems, or frequent school absences; history of serious emotional/psychiatric disturbances (i.e., major depression, psychotic or pervasive developmental disorder, Autism) or a chronic neurologic condition (i.e., seizure disorder, developmental neurological conditions, Tourette’s or other tic disorders, acquired brain injuries); and documented vision or hearing impairment. Parental consent forms and child assent were collected before participation. This study was approved by the Human Investigation Committee of Yale University and all of the review boards of participating data collection sites.

Population stratification correction

A principal components analysis on genome-wide genotyped single nucleotide polymorphisms (SNPs) was conducted to model continuous axes of genetic variation within the GRaD sample using EIGENSTRAT.60 Prior to analysis, SNP level quality control included the removal of SNPs that met the following criteria: missingness greater than 5%, Hardy-Weinberg equilibrium p < 0.0001, not autosomal, or a minor allele frequency less than 0.05. Sample level quality control included the removal of samples with percentage of missing genotypes greater than 3% and samples with discrepancies between reported and inferred sex based on X chromosome heterozygosity. The first 10 principal components were used to correct for genomic inflation due to allele frequency differences across different ancestries. Analyses of genome-wide SNP data in the GRaD study found that the first 10 principal components were effective in reducing the effects of population stratification (genomic inflation factor <1.05) in the phenotypes evaluated in the present analysis.61

Neuroimaging

Neuroimaging data was collected from 100 subjects who were randomly recruited with parents’ consent in the GRaD study. Of these, data from 58 subjects (27 female; ages 8.0 to 15.75) passed quality control procedures for resting state fMRI (i.e., having at least 60% of retained images across two runs, after censoring volumes which exceeded the thresholds of 0.3 mm point-to-point Euclidean movement and/or 10% outlier voxels; one subject was also removed from analysis because of having a dual deletion for READ1). The 42 participants who were excluded for data quality purposes were significantly younger than the included participants, and consequently had lower raw scores on Woodcock-Johnson III Letter-Word Identification and SRI Passage Comprehension. Importantly, however, the included and excluded participants did not significantly differ in age-normed standard scores for any test. The included and excluded participants also did not differ in the comprehension residual measure, nor did they differ in ancestry, SES, or in the proportion of participants with RU2Short versus non-RU2Short READ1 alleles. Therefore, we are confident that the reported results are not biased by the exclusion of participants for data quality purposes. Euclidean movement was calculated per volume as the square root of the sum of squares of point-to-point change for each of the six motion parameters (i.e., three translation and three rotation). Each child was assigned to one of two groups depending on READ1 genotype (Table 6): RU2Short, or non-RU2Short.

Table 6 Demographic characteristics concerning the 58 subjects (27 female) who had usable resting state fMRI data Full size table

Demographic characteristics for the 58 subjects with usable resting state fMRI data are presented in Table 6. For each subject, comprehension residual values were calculated by regressing reading comprehension scores onto age, performance IQ, word decoding, and word recognition. Comprehension residuals ranged from −33 to 27 (M = .995; SD = 11.2).

Neuropsychological measures

Reading outcome assessments were standardized measures including Woodcock-Johnson III – Letter-Word Identification and Word Attack,62 Test of Word Reading Efficiency – Sight Word Efficiency and Phonemic Decoding Efficiency (TOWRE),63 and Standardized Reading Inventory – Word Recognition and Reading Comprehension (SRI).64 Core reading-related skill measures included well-validated instruments tapping phonological awareness assessed by the Comprehensive Test of Phonological Processing – Elision and Blending (CTOPP),65 naming speed assessed by the Rapid Automatized Naming – Letters, Numbers, and Objects (RAN),66 receptive vocabulary assessed by the Peabody Picture Vocabulary Test (PPVT)67 and the Wechsler Intelligence Scale for Children – Vocabulary (WISC).68 Performance IQ was measured by the Wechsler Intelligence Scale for Children – Matrix Reasoning (WISC).68

Woodcock–Johnson Tests of Achievement, third edition (WJ-III)

Measures of interest from the WJ-III included the Letter-Word Identification and Word Attack subtests.62 The WJ-III Letter Word Identification subtest is an untimed measure of non-contextual single word reading ability requiring the child to read a list of increasingly complex English words aloud. The Word Attack subtest asks the subject to apply knowledge of English phonology to decode non-words or pseudowords in isolation. The total score for each subtest represents the number of words read correctly, converted to a standard score based upon age norms.

Test of Word Reading Efficiency (TOWRE)

The TOWRE is a fluency assessment of single word reading and single pseudoword decoding under timed conditions.63 The subject is asked to read as many individual words (Sight Word Efficiency) or non-words (Phonemic Decoding Efficiency) of increasing length and phonetic difficulty as possible in 45 s. Scores for Sight Word Efficiency and Phonemic Decoding Efficiency represent the number of correctly read words within the time limit, relative to age norms.

Standardized Reading Inventory, second edition (SRI)

The SRI is an individually-administered contextual reading test that consists of 10 passages of increasing difficulty, ranging from pre-primer to an eighth grade level.64 Oral reading accuracy is assessed and subjects are then asked to answer a series of comprehension questions. Scores are obtained for word recognition accuracy and comprehension on each passage; the total score in each skill area is converted to a norm-referenced standardized score.

Comprehensive Test of Phonological Processing (CTOPP)

CTOPP is a comprehensive instrument designed to assess phonological awareness, phonological memory, and rapid naming.65 In the current study, Elision and Blending Words subtests were used to measure phonological awareness. Elision measures the ability to remove phonological segments from spoken words to form other words. Blending Words measures the ability to synthesize sounds to form words.

Rapid Automatized Naming and Rapid Alternating Stimulus tests (RAN)

The Letters, Digits, and Objects subtests from the RAN assesses speeded lexical retrieval, requiring the rapid naming of a series of letters, digits, and objects (repeated randomly in 5 rows of 10 items) as quickly as possible without making mistakes.66 Time to completion is recorded and converted to an age-referenced standard score.

Peabody Picture Vocabulary Test, fourth edition (PPVT-4)

The PPVT-4 is an untimed measure of receptive vocabulary knowledge; the subject is required to point to one of four pictures that best indicates the target word presented.67 Total raw scores (correct responses) are converted to age-normed standardized scores.

Wechsler Intelligence Scale for Children, fourth edition (WISC-IV)

The WISC-IV Vocabulary measures verbal fluency, concept information, word knowledge, and word usage.68 Subjects are asked to give oral definitions of words. Scoring was 2-1-0, according to the quality of the responses. The WISC-IV Matrix Reasoning measures performance IQ.

DNA collection and genotyping

DNA was collected from saliva using Oragene-DNA kits (DNA Genotek Inc.) and extracted with prepIT-L2P (DNA Genotek Inc.) using manufacturer protocols. READ1 genotyping was conducted using PCR amplification and purified of PCR reagents. Sanger sequencing was performed at the Yale W.M. Keck DNA Sequencing Facility using standard protocols. Primer sequences and amplification protocols are described in detail elsewhere.30 READ1 genotypes were called from chromatograms using a custom program written in C + + (Dr. Yong Kong, available upon request). If the calling program identified errors, chromatograms were manually examined and deconvoluted for genotype calls. The genotype call rate for READ1 alleles in the GRaD sample was 0.987.

Allele-specific PCR was used to genotype the 2445 bp microdeletion in intron 2 of DCDC2, which encompasses READ1. Primer sequences and amplification protocols for microdeletion genotyping are described in detail elsewhere.30 The deletion genotype call rate was 0.972 in the GRaD sample.

Functional groups of READ1 alleles

READ1 is composed of seven repeat units; however, grouping of alleles was based on variation in the first two repeat units. READ1 alleles were assigned to one of three groups (Table 7). RU1-1 alleles have only one copy of Repeat Unit 1 (RU1-1: alleles 2, 3, 9, 12, 25, 27). RU1 sequence was previously used to capture ETV6, a dimerizing transcriptional repressor that binds to a consensus binding site, GGAAG, which is present in the RU1 site: GAGAGGAAGGAAA. The RU1-1 group of alleles was associated with a moderate protective effect on reading performance in the Avon Longitudinal Study of Parents and Children (ALSPaC), a longitudinal cohort of European descent.32 RU1 also contains a consensus binding site (GAGAGGAAGGAaagg) for many C2H2 domains that define classical zinc finger transcription factors.

Table 7 READ1 allele groups and previously reported associations Full size table

RU2Long alleles have two copies of RU1 and greater than seven copies of Repeat Unit 2 (RU2: alleles 5, 6, 13, 14, 19, 20, 22, 23; Table 7). RU2 (GGAA) also contains consensus-binding sites for ETV6. In the ALSPaC, allele 5 was associated with increased risk for RD, whereas allele 6 was associated with increased risk for LI.32

RU2Short is characterized by alleles that have fewer than six copies of RU2 (4, 10, 15, 16, 17, 21, 24, 26, 30, 37, or 39). The most frequent alleles in this category are alleles 4 and 10, which have not shown association with either a positive or negative impact on reading performance in our previous studies with the ALSPaC (see Table 7). Note that because of this coding scheme, children with one copy of the deletion of READ1 were essentially treated as being homozygous for the other allele.

Magnetic resonance imaging data

Images were collected using a single 3 T TIM-Trio scanner located at the Yale Magnetic Resonance Research Center in New Haven, CT. Anatomical images were acquired in a sagittal orientation (MPRAGE; matrix size = 256 × 256; voxel size = 1 × 1 × 1 mm; FoV = 256 mm; TR = 2530 ms; TE = 2.77 ms; flip angle = 7°). T2*-weighted images were collected in an axial oblique orientation (25 slices; 6 mm slice thickness; no gap) using single-shot echo-planar imaging (matrix size = 64 × 64; voxel size = 3.44 × 3.44 × 6 mm; FoV = 220 mm; TR = 1550 ms; TE = 30 ms; flip angle = 80°). Subjects were asked to keep their eyes open to the best of their ability, but were told that they could blink if needed. Two runs of resting state fMRI were administered, each 6:18 in length (240 volumes), for a total of 12:36 (480 volumes). Within each run, the first six volumes were removed to allow for stabilization of the magnetic field.

Preprocessing of MRI data

Resting state functional images were preprocessed in AFNI69 by first performing despiking (3dDespike) and slice-scan time correction (3dTshift). Next, functional images were co-registered with anatomic images, were warped to the Talairach template using a non-linear transform (3dQWarp), and were corrected for motion by alignment to the volume with the minimum outlier fraction (3dvolreg). These three steps were concatenated into a single transform that also forced a 3 mm isotropic voxel size on the data. Data were then smoothed using a 6 mm FWHM Gaussian kernel (3dmerge). Next, a general linear model (GLM) analysis was performed (3dDeconvolve) using regressors for the six motion parameters as well as their derivatives, and regressors for low frequency components less than 0.01 Hz. In this GLM, volumes were censored if they exceeded the threshold of 0.3 mm point-to-point Euclidean movement and/or 10% outliers. Following this, tissue-based regression was performed to regress out the first three principal components of each subject’s eroded lateral ventricle mask as well as the average signal in each subject’s eroded white matter mask. The latter was performed using the “fast” ANATICOR program in AFNI.70 White matter and lateral ventricle masks were created for each subject by segmenting and parcellating anatomical scans in Freesurfer71 using the Desikan-Killiany atlas.72

Defining seeds using the intrinsic connectivity distribution

Using custom scripts written in R, which were adapted from Scheinost et al. (2012),73 intrinsic connectivity distribution (ICD) maps were created for each subject by assessing voxelwise connectivity in the residual maps from the GLM. For each voxel, a histogram was generated that characterized the density of correlations to every other voxel in the brain. These histograms were approximated using a stretch exponential function. Two parameters characterizing this function, alpha and beta, respectively index the scale and shape of the distribution. Smaller values of the alpha parameter and larger values of the beta parameter both indicate a larger degree metric, or more extensive connectivity, at any correlation threshold. The package oro.nifti74 was used to read and write NIfTI files. ICD values were only generated for voxels within a gray matter mask, which included cortical and subcortical structures as well as the brainstem.

To define seeds for functional connectivity analyses, we performed a whole-brain analysis on subject-wise ICD parameters. To facilitate group comparison, ICD maps were first smoothed with a 6 mm FWHM Gaussian kernel (3dmerge). Groupwise analysis was performed in AFNI using a multivariate model (3dMVM) with ICD parameters as the dependent variable and comprehension residuals (reading comprehension regressed onto age, performance IQ, word decoding, and word recognition) as the independent variable. This model also included the following covariates: sex, number of years of maternal education, SES, and the first three genotype principal components. Separate models were performed with ICD alpha and beta parameters as respective dependent variables; results reported here are for the beta parameter. For these groupwise analyses, the voxelwise threshold was p = .001, cluster corrected at p = .05. Cluster correction was performed by estimating the spatial autocorrelation of the residual time course for each subject (3dFWHMx), and then using mean parameter values across subjects as inputs to the simulation program 3dClustSim (10,000 iterations). This procedure was based on the latest recommendations for cluster correction in fMRI.75 For this simulation, the size of the search space was constrained to the number of voxels in the gray matter mask for which ICD measures were calculated.

Assessing the association between READ1 genotypes and functional connectivity

Next, based on the seed region that showed significant associations between ICD parameters and the comprehension phenotype, we defined a resting state network. To do this, we constructed a map detailing the strength of the correlation between each voxel’s time course and the time course of the seed region (3dDeconvolve, followed by 3dcalc to convert R2 values to R values). The arctanh transform was then applied to the data to convert correlation coefficients to z-scores (3dcalc). Network maps were then subjected to a groupwise t-test (3dttest++; p < 1 × 10−20).

Within this network, we identified peak nodes whose resting state time courses were highly correlated with the seed time course (3dExtrema; minimum separation distance of 30 mm, or 10 voxels). For each of these nodes, spheres were centered on peak co-ordinates (radius 6 mm, or two voxels; total volume 33.5 voxels each 3 × 3 × 3 mm in size), and the mean z-score from each subject-specific map was extracted for each sphere (3dROIstats). Then, for each node, we performed multiple regression analyses to assess the relationship between READ1 genotypes and the strength of the functional association between nodes in the network. To identify important covariates for these models, we performed backward selection using the R program ‘backward’ in the mixlm package69 with an inclusion/exclusion alpha criterion of .05 and the following initial model:

Connectivity ~ age + PIQ + sex + maternal education + SES + PC1 + PC2 + PC3

Once important covariates had been identified, we included these in final models as follows:

Connectivity ~ READ1 genotype + {important covariates identified from backward selection}.