Significance A prominent hypothesis in the study of intelligence is that genetic influences on cognitive abilities are larger for children raised in more advantaged environments. Evidence to date has been mixed, with some indication that the hypothesized pattern may hold in the United States but not elsewhere. We conducted the largest study to date using matched birth and school administrative records from the socioeconomically diverse state of Florida, and we did not find evidence for the hypothesis.

Abstract Accurate understanding of environmental moderation of genetic influences is vital to advancing the science of cognitive development as well as for designing interventions. One widely reported idea is increasing genetic influence on cognition for children raised in higher socioeconomic status (SES) families, including recent proposals that the pattern is a particularly US phenomenon. We used matched birth and school records from Florida siblings and twins born in 1994–2002 to provide the largest, most population-diverse consideration of this hypothesis to date. We found no evidence of SES moderation of genetic influence on test scores, suggesting that articulating gene-environment interactions for cognition is more complex and elusive than previously supposed.

That genes and environments combine to influence cognitive development is broadly recognized, yet clear specifications of how they combine remain elusive. Behavioral geneticists have proposed that genetic differences are more influential on cognition for persons raised in more advantaged environments, a pattern sometimes called the Scarr-Rowe interaction (1). The animating idea is that social disadvantage compromises the extent to which a child’s genetic potential is realized. As a result, the ultimate influence of genetic endowment is lower in these environments, which also implies that higher heritability estimates reflect improved social conditions (2–4).

Although there have been striking findings supporting the hypothesis (5–10), results are inconsistent (11–15), and a recent meta-analysis indicated only modest support (16). Notably, the meta-analysis also found that the hypothesis fared much better in studies of US samples than in samples from elsewhere, mainly Northern/Western Europe and Australia.

Potential explanations of this divergence include the possibilities that socioeconomic variation in the United States is simply larger in magnitude or that the US educational system is less effective at helping disadvantaged students reach their potential (16). Child poverty rates, rates of children living in homes with deprived educational resources, and inequality in educational achievement are all higher in the United States than in countries in which null results have been reported (17, 18). If heritability of cognition reflects opportunity, then differential heritability by socioeconomic status (SES) in the United States could be interpreted as a consequence of some combination of social disparities in the United States. Of course, any such conclusion is predicated on establishing that results really are different for the United States.

Current Study This study considered the hypothesis using unique confidential population-level administrative data that match birth and public school records for all Florida children born between 1994 and 2002. Birth records were matched to school records on the basis of names, birth dates, and social security numbers. The rate of these matches is consistent with expectations from US Census data about the percentage of children born in Florida who subsequently attend Florida public schools. Our investigations indicate that rates of being able to match records for one twin but not the other are extremely low and unlikely to bias findings (19). By using state records, we were able to assemble a sample that has longitudinal test score information and is an order of magnitude larger than prior studies (24,640 twins with matched birth-school records and test score information). Our method does not depend on locating, recruiting, and retaining twins to participate in data collection efforts, which is important because this often considerably reduces the representation of twins from disadvantaged backgrounds in study samples. Having data from Florida allowed us to represent a broader range of socioeconomic backgrounds better than past work, because the state has comparable or greater socioeconomic inequality than any other population for which the hypothesis has been considered so far. In our twin sample, 25.6% of mothers were African American, 18.0% were Hispanic, 14.5% were aged ≤21 y, and 32.0% were unmarried at time of birth. Although an earlier survey-based study in Florida children found evidence of SES moderation of the heritability of achievement test scores, that sample had only 577 twin pairs and measured the SES of schools rather than families themselves (20). 
Twin studies in behavioral genetics typically identify the genetic contribution to an outcome’s variance using the difference in genetic relatedness between monozygotic (MZ; “identical”) and dizygotic (DZ; “fraternal”) twins. Trait heritability is then a function of the difference in intraclass correlations (ICCs; the ratio of between-pair variance to total variance) between samples of MZ and DZ twins drawn from the same population (21). A hypothesis that SES moderates heritability implies that this difference in ICCs varies across SES groups. Administrative data generally do not contain information on zygosity. However, because all opposite-sex (OS) twin pairs are DZ whereas same-sex (SS) twin pairs contain approximately a 50–50 mix of MZ and DZ twin pairs, SS twin pairs will, on average, be genetically more similar than OS twin pairs.

One remaining complication with using administrative data is that among DZ twins, SS twins could also be more similar to one another on an outcome than OS twins for nongenetic reasons, most obviously via any consequence of being the same sex. We addressed this challenge in two ways. First, we standardized test scores within sexes, so twin similarity would be measured independently from any mean or variance differences between sexes. Second, we compared differences between SS and OS twins with differences between SS and OS siblings who are close in age but not twins.

The size and representativeness of administrative data are benefits to be weighed against the limitation of not having zygosity measures. That said, studying the Scarr-Rowe interaction with data in which zygosity is known involves other assumptions, which may be more problematic than is broadly appreciated. The variance components estimated in this work are population parameters, as are terms for the moderation of these components by SES.
However, the design and differential recruitment of many twin studies complicate the definition of the target population that the estimates produced by these studies are supposed to represent. For that matter, given the strong association between SES and test scores, estimated interactions have some dependence on the metrics by which SES and test scores are specified in models, but there has been little articulated reason beyond convention to favor any particular metric over others. None of this is intended to dismiss previous work; however, drawing inferences about biosocial interaction from models estimating statistical interactions is a thorny matter, involving substantial assumptions that are not simply dispelled by having data on twins’ zygosity. Instead, evidence across multiple research designs is needed (22, 23).
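
The logic of inferring genetic influence from SS and OS twins without zygosity data can be illustrated with a minimal sketch. It is not the estimator used in this study. It assumes that the SS correlation is approximately a weighted average of the MZ and DZ correlations, that OS pairs stand in for all DZ pairs, and that a Falconer-style formula applies. The function names, the `mz_share` parameter, and the ICC values are hypothetical.

```python
def implied_mz_icc(icc_ss: float, icc_os: float, mz_share: float = 0.5) -> float:
    """Back out an MZ correlation from a same-sex (SS) mixture.

    Assumes icc_ss ~= mz_share * icc_mz + (1 - mz_share) * icc_dz and that
    opposite-sex (OS) pairs represent all DZ pairs (icc_dz ~= icc_os).
    """
    if not 0.0 < mz_share <= 1.0:
        raise ValueError("mz_share must be in (0, 1]")
    return (icc_ss - (1.0 - mz_share) * icc_os) / mz_share


def falconer_heritability(icc_ss: float, icc_os: float, mz_share: float = 0.5) -> float:
    """Falconer-style estimate: h^2 = 2 * (icc_mz - icc_dz)."""
    icc_mz = implied_mz_icc(icc_ss, icc_os, mz_share)
    return 2.0 * (icc_mz - icc_os)


# Hypothetical ICCs for illustration only; these are not values from the study.
h2 = falconer_heritability(icc_ss=0.70, icc_os=0.55, mz_share=0.5)
```

Under these assumptions the estimate simplifies to 2 × (ICC_SS − ICC_OS) / `mz_share`, so any SES moderation of heritability should appear as an SS−OS gap in ICCs that changes across SES groups.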

Results Fig. 1 shows the relationship between one measure of SES, maternal education, and scores on the Florida Comprehensive Assessment Test (FCAT) for all pairs in the sample [SI Appendix contains parallel results for the SES measure based on median income, for principal components analysis (PCA)–based SES indexes, and for an alternative achievement test]. Many studies show positive relationships between these various SES measures and cognitive functioning (24–29). Children whose mothers did not finish high school were ∼0.5 SDs below the overall mean, compared with ∼0.7 SDs above the mean for children whose mothers completed college.

Fig. 1. Maternal years of education and average achievement test score for combined twin and sibling pairs sample. The means of gender-standardized test scores in mathematics and reading over maternal years of education (separated by age group and test type) are shown. The sample includes all twin pairs and closely spaced sibling pairs with available test scores. A closely spaced sibling pair is defined as two siblings having the same mother for whom the distance in months between births is the smallest among births to this mother between 1994 and 2002. n = 299,426 children (24,640 twins and 274,786 singletons) and 1,796,532 children-year observations.

Pair-level ICCs in Fig. 2 show that the data are consistent with substantial genetic influence on test scores. For twins, a sizable difference exists between the ICCs for SS compared with OS twins. The difference is several times larger than the corresponding difference between SS and OS nontwins, as we would expect if the primary cause of the twin difference is that approximately half of the SS twins are MZ twins. Nevertheless, we can also see that SS nontwins are more similar than OS nontwins and that OS twins are more similar than OS nontwins. This pattern implies that sex similarity and twin status have their own, albeit small, influence on pair similarity.

Fig. 2. ICCs and 95% CIs in relation to math and reading achievement tests for SS-M, SS-F, and OS samples of younger (grades 3–5) and older (grades 6–8) twins and closely spaced siblings. A closely spaced sibling pair is defined as two siblings having the same mother for whom the distance in months between births is the smallest among births to this mother between 1994 and 2002. ICCs are based on multilevel mixed-effects linear regression estimated with maximum likelihood, where within-individual across-grades errors are assumed to have an autoregressive structure of order one. Random effects are structured at the twin/sibling pair and individual levels. ICC is computed as the ratio of between-pair variation to the sum of within- and between-pair variations. n = 299,426 children (24,640 twins and 274,786 singletons) and 1,796,532 children-year observations. F, female; M, male; Read, reading; Sib, sibling.

The data are thus consistent with familiar results that cognitive achievement differs over SES and that MZ twins have more similar cognitive achievement than DZ twins. To consider whether the data also provide evidence of a Scarr-Rowe interaction, we followed Turkheimer and Horn’s review of the literature, which summarizes existing evidence as a combination of two phenomena (21). We consider each in turn, as shown in Fig. 3, which presents within- and between-pair variances for SS and OS twins.

Fig. 3. Estimates and 95% CIs of between- and within-pair variations for SS and OS twin pairs and achievement test scores in mathematics and reading assessed in grades 3–5 or 6–8 and split by years of maternal education. The variances were obtained using multilevel mixed-effects linear regression estimated with maximum likelihood, where within-individual across-grades errors are assumed to have an autoregressive structure of order one. Random effects are structured at the twin pair and individual levels. n = 24,640 twins and 147,828 children-year observations.
First, Turkheimer and Horn indicate that “the between-pair variance of MZ pairs decreases in poor environments” (ref. 21, p. 63). Contrary to this relationship, we found that the between-pair variance of SS twins is actually lowest in the highest SES families. Given that SS twins are a relatively equal combination of MZ and DZ twins, one possibility is that a pattern supporting the hypothesis among MZ SS twins is masked by an even stronger pattern in the opposite direction among DZ SS twins. However, Fig. 3 shows that corresponding results for OS twins (all of whom are DZ) give no indication of such a pattern. Between-pair variances in achievement test scores for OS twins with high school-educated parents are higher in all cases than they are for OS twins whose parents lack a high school diploma.

Second, Turkheimer and Horn report that “the within-pair variance of MZ twin pairs increases at lower levels of SES: poverty appears to have the effect of making MZ twins more different from each other” (ref. 21, p. 61). We would therefore expect in our data that the within-pair variance for SS twins whose mothers did not graduate from high school would be higher than the variance for SS twins whose mothers have a high school diploma. However, this is not the case in any of the SS twin comparisons shown in Fig. 3. As before, one might be concerned that a contrary pattern among DZ twins is masking the effect, because some SS twins are DZ. However, this possibility is again contradicted by results shown in Fig. 3 for OS twins, in which the pattern of results is the opposite of what would be necessary for such masking to happen. We found very similar results for alternative measures of SES (SI Appendix, Figs. S10, S18, and S22).

For older children and math tests, there may be an indication of higher within-pair variance for SS twins with high school- versus college-educated mothers. This is the result closest to the expectations of the hypothesis.
Even here, however, we observed that a similar pattern was present in these cases for OS twins, which strengthens the alternative possibility that the result among SS twins could be driven by DZ pairs, rather than by MZ pairs as entailed by a Scarr-Rowe interaction.

Turkheimer and Horn (21) showed that the combined result of these changing variances is an increasing divergence of ICCs between MZ and DZ twins as SES increases; such divergence implies higher heritability estimates. Accordingly, we would expect the difference in ICCs between SS and OS twins to widen as SES increases. We present ICCs by maternal education in Fig. 4, and contrary to expectations from Turkheimer and Horn, we observed the opposite pattern. Fig. 4 also shows that differences are much smaller and inconsistently signed between SS and OS nontwin siblings, which contradicts the possibility that our results diverge from the hypothesis due to SES moderation of sex difference per se. We found very similar results for alternative measures of SES (SI Appendix, Figs. S11, S19, and S23).

Fig. 4. Estimates of ICCs for SS and OS pairs of twins and nontwin siblings based on test scores in mathematics and reading assessed in grades 3–5 or 6–8 and split by years of maternal education. Siblings are defined as two individuals having the same mother for whom the distance in months between births is the smallest among births to this mother between 1994 and 2002. The ICCs are based on multilevel mixed-effects linear regression estimated with maximum likelihood, where within-individual across-grades errors are assumed to have an autoregressive structure of order one. Random effects are structured at the twin/sibling pair and individual levels. ICC is computed as the ratio of between-pair variation to the sum of within- and between-pair variations. n = 299,426 children (24,640 twins and 274,786 singletons) and 1,796,532 children-year observations.
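
The ICC definition used in the figures (between-pair variation divided by the sum of within- and between-pair variations) can be sketched in a few lines. This is a simplified illustration that substitutes balanced one-way ANOVA estimators on a single score per child for the maximum likelihood mixed model with autoregressive errors described above. The function names and the example scores are hypothetical.

```python
from statistics import fmean


def pair_variance_components(pairs):
    """Return (between_pair, within_pair) variance for two-member pairs.

    Uses balanced one-way ANOVA estimators, not the maximum likelihood
    mixed model with AR(1) errors described in the text.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    grand_mean = fmean(score for pair in pairs for score in pair)
    # Mean square between pairs: 2 * sum of squared pair-mean deviations / (n - 1).
    ms_between = 2.0 * sum(((a + b) / 2.0 - grand_mean) ** 2 for a, b in pairs) / (n - 1)
    # Mean square within pairs: each pair contributes (a - b)^2 / 2 on one df.
    ms_within = sum((a - b) ** 2 / 2.0 for a, b in pairs) / n
    # Truncate at zero so sampling noise cannot produce a negative variance.
    between = max((ms_between - ms_within) / 2.0, 0.0)
    return between, ms_within


def icc(pairs):
    """ICC = between-pair variance / (between-pair + within-pair variance)."""
    between, within = pair_variance_components(pairs)
    total = between + within
    if total == 0.0:
        raise ValueError("scores have no variance")
    return between / total


# Hypothetical standardized scores for same-sex and opposite-sex pairs.
ss_pairs = [(1.0, 1.2), (0.0, -0.2), (-1.0, -0.8), (0.5, 0.3)]
os_pairs = [(1.0, 0.2), (0.0, -0.9), (-1.0, 0.1), (0.5, 0.6)]
icc_gap = icc(ss_pairs) - icc(os_pairs)
```

Computing `icc_gap` separately within each maternal education group mirrors the subgroup comparison in Fig. 4, where the hypothesis predicts a gap that widens as SES increases.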
A different method of estimating the interaction is to adapt a model presented by Purcell (30) and used in the meta-analysis by Tucker-Drob and Bates (16). This model extends the conventional ACE model, which estimates additive genetic effects (A), shared environment (C), and nonshared environment (E). The extended model includes a separate term for variance accounted for by the SES measure (M), along with terms for moderation by SES (A′, C′, E′). In these models, “A” indicates the estimated narrow-sense heritability for someone with average SES. This method is more conventional but also involves stronger metric assumptions. These results are presented in Table 1.

Table 1. Modified ACE variance components model

The prediction is that the moderator term for the additive genetic component (A′) would be positively signed and statistically significant. Our results over different outcomes uniformly contradict this prediction. We also tested whether our results depend on assumptions about the proportion of MZ twins in the SS twin group; over the range of plausible assumed values we tested (0.4–0.6), the interaction terms were uniformly signed opposite to the prediction, although not always statistically significantly so (SI Appendix, Table S3). We also explored two alternative, more continuous measures of SES based on PCA. Again, the interaction terms were uniformly signed opposite to the prediction and not always significant (SI Appendix, Tables S4–S7). To address the possibility that results may be confounded in one way or another by SES differences by race, we also conducted analyses in which we restricted the sample to children with white mothers (whites are the largest race/ethnic group in the sample).
We further conducted analyses restricting the sample to mothers up to age 30 y, for whom rates of in vitro fertilization (IVF) use are extremely low (31), in case SES differences in the use of IVF treatments could distort results for DZ twins (DZ, but not MZ, twinning rates are much higher when IVF is used). In both cases, as well as in other analyses also reported in the SI Appendix, our substantive conclusion remains the same: no evidence of a Scarr-Rowe interaction.
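
To make the prediction about A′ concrete, the sketch below evaluates variance components under a moderated ACE model in the general form proposed by Purcell, in which each path coefficient is a linear function of the moderator and the component variance is its square. This is an illustration under assumed values, not the fitted model reported in Table 1. The class name, field names, and parameter values are hypothetical, and SES is assumed to be centered at its mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeratedACE:
    """Path coefficients at mean SES plus linear moderation terms."""
    a: float      # additive genetic path at mean SES
    a_mod: float  # change in the genetic path per unit of SES (A')
    c: float      # shared-environment path at mean SES
    c_mod: float  # change in the shared-environment path per unit of SES (C')
    e: float      # nonshared-environment path at mean SES
    e_mod: float  # change in the nonshared-environment path per unit of SES (E')

    def components(self, ses: float):
        """Return (var_A, var_C, var_E) at a given centered SES value."""
        var_a = (self.a + self.a_mod * ses) ** 2
        var_c = (self.c + self.c_mod * ses) ** 2
        var_e = (self.e + self.e_mod * ses) ** 2
        return var_a, var_c, var_e

    def heritability(self, ses: float) -> float:
        """Share of moderator-adjusted variance attributed to A."""
        var_a, var_c, var_e = self.components(ses)
        return var_a / (var_a + var_c + var_e)


# Hypothetical parameters: a positive a_mod is what the hypothesis predicts.
predicted = ModeratedACE(a=0.7, a_mod=0.1, c=0.5, c_mod=0.0, e=0.5, e_mod=0.0)
# A negatively signed a_mod is the direction described in the text above.
reversed_sign = ModeratedACE(a=0.7, a_mod=-0.1, c=0.5, c_mod=0.0, e=0.5, e_mod=0.0)
```

With a positive `a_mod`, `predicted.heritability(ses)` rises as SES rises; with a negative `a_mod`, it falls over the same range, which is the contrast between the hypothesized and the observed sign.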

Discussion Although past research has found stronger evidence for a Scarr-Rowe interaction in the United States than elsewhere, we failed to find evidence of increasing genetic influence on the cognitive similarity of twins as SES increases. Our analysis makes use of large-scale population-level administrative data in the United States where population inferences are not compromised by patterns of successful recruitment into a survey, especially among low-SES families. Our results suggest that the mixed results in this literature cannot be explained by lack of SES diversity in some samples or by differences between the United States and other nations. Trying to understand why we failed to find an interaction when (some) others have found an interaction is difficult. For example, Kovas et al. (32) reported higher grade-school heritabilities for literacy and numeracy versus IQ test scores, raising the possibility that our results could differ if we had IQ test results instead of math and reading achievement tests. However, the differences reported in that paper are present for younger but not older children, whereas our results are consistent across ages 8–14 y. Reviews of the literature have also cited other studies using achievement tests as evidence for the interaction (21). Perhaps our lack of direct zygosity measurement is the culprit, but the study that launched this literature also compared SS and OS twins (1), and our study’s sample size surmounts the concerns about statistical power raised regarding that study (33). Maybe the absence of private schools in our sample is an issue, because children of higher-SES families are more likely to attend private schools. However, decades of research have failed to establish clear evidence of a broadly positive causal effect of private compared with public schools in the United States, especially for children of more affluent families (34, 35). 
At a minimum, our findings indicate that the nature of the gene-environment interaction is less clear cut than may have been supposed from smaller samples. How to effectively describe the interplay of genes and environments remains a profound scientific challenge, and we hope continued improvements in the availability of both administrative and genomic data will yield important progress in the years ahead. One potential frontier for future work would make use of molecular genetic data rather than studies of twin similarity. Although early efforts to identify genetic correlates of cognitive ability faced substantial challenges (36), current efforts are quite promising (37–39). As causal genetic variants for cognition become better established, one question will be whether they are more influential in some environments than others and whether there will be a more systematic genomic pattern of greater cumulative influence in higher-SES environments. However, it may still be some time before sufficient sample sizes will be available to investigate this question fully.

Materials and Methods Data for the study are based on confidential matched birth and school records obtained for all children who were born in Florida between 1994 and 2002 and subsequently educated in a Florida public school. Birth certificates were matched to the Education Data Warehouse by the Florida Department of Health and the Florida Department of Education using four variables: first and last name, date of birth, and social security number. To maximize correct matches, transposition of letters or numbers was allowed in up to two instances as long as the transposition did not match more than one record. Children were included if they (i) were born in Florida, (ii) remained in Florida until school age, and (iii) attended Florida public schools. This study was approved by the institutional review boards of Northwestern University, the University of Florida, and Florida’s education and health agencies and involved a thorough review of the Family Educational Rights and Privacy Act, as well as the Health Insurance Portability and Accountability Act regulations. Twins were ascertained by plurality information on the birth certificate, as well as the same date of birth and maternal characteristics. Nontwin siblings were ascertained by residential address in schooling and were linked back to the relevant students’ birth records to check that students we believed to be siblings were actually siblings (e.g., by comparing maternal characteristics such as date of birth). For families with more than two nontwin siblings in eligible cohorts, we constructed our sample of nontwin-sibling pairs by taking the two who were closest in age. Overall, 80.7% of all children born in Florida, and 79.5% of all twins born in Florida, were matched to school records. This aligns closely with the 80.9% rate of kindergarten-age children born in Florida and attending Florida public schools based on data from the American Community Survey. 
Florida has a larger population (20.3 million in 2015) than several nations in which the Scarr-Rowe interaction has been previously studied (e.g., The Netherlands, Sweden). Relative to other US states, Florida has higher than national average economic inequality, child poverty rates, and percentage of nonwhite residents, making it especially promising for assessing potential socioeconomic heterogeneity in causal effects. The sample’s diversity on key correlates of SES such as race, mother’s education, and mother’s age is shown in SI Appendix, Table S1. The table also shows that characteristics associated with disadvantage are slightly overrepresented in our estimation sample in comparison with all Florida births, which predominantly reflects the fact that families with adverse characteristics are more likely to enroll children in public school and less likely to migrate out of Florida.

The measures of SES used in the paper are derived from birth records: the mother’s educational attainment, the median income of the zip code of the mother’s residence, and two PCA-based composite measures of multiple SES inputs. Maternal education is grouped into three categories: high school dropout (<12 y of education), high school graduate (12–15 y of education), and college graduate (≥16 y of education). Neighborhood income was determined by using the median income value based on the 2000 US Census for each zip code of parental primary residence as indicated in birth records. The primary composite measure includes these two measures along with family structure [mother and father married, parents unmarried but father listed on birth certificate, no father listed (reference category)] and whether the birth was paid for by Medicaid. In the composite, maternal education was measured in years. Because zip code income is measured at a different level (zip code) than the other SES inputs (individual), we also explored a PCA excluding median income.
The primary academic achievement variables used were the reading and mathematics scores from the FCAT, which was administered annually throughout Florida public schools over our study period. Published correlations between tests designed to measure general cognitive ability (i.e., “IQ”, g) and standardized achievement test scores vary but are often ∼0.7 (40, 41), with lower correlations observed when short or limited-domain IQ tests are used or when academic achievement is measured from grade-point average or teacher assessment rather than testing. We standardized results for both FCAT tests by grade, year, and sex, to have a mean of zero and an SD of one in the overall population. This meant that the mean score in our linked sample would be above zero, because children of migrants into Florida have lower achievement than children born in Florida. We averaged scores across multiple grades to reduce measurement error. Stanford achievement tests were also administered to Florida students in school years 2000/2001–2006/2007, and we include results from these in SI Appendix, Figs. S4–S9, S12, S13, S20, S21, S24, and S25. Because of the restriction on the years when the test was administered, we were only able to perform the analysis for grades 3–5.

We used two estimation strategies, which are described in more detail in SI Appendix, Technical Appendix. First, we divided pairs into SES subgroups and computed between-pair and within-pair variances for each group using mixed-effects linear regression estimated with maximum likelihood, where within-individual across-grades errors are assumed to have an autoregressive structure of order one. Random effects are structured at the twin/sibling pair and individual levels. We also reported the ICCs, which are computed as the ratio of between-pair variation to the sum of within- and between-pair variations. Second, we used a model of continuous moderation, as described by Tucker-Drob and Bates (16).
This model of phenotype variation starts with the A (additive genetic), C (shared environment), E (nonshared environment) parameters of the classic twin study model and adds a parameter M to separate the variance accounted for by the potential moderating variable (in our case, the SES measure). Then it adds interaction parameters A′, C′, E′, so that, for example, A′ = A × SES. Identifying these parameters requires assuming the average relatedness of SS twin pairs, which are a combination of MZ (r = 1) and DZ (r = 0.5) twins. In the main analysis, we used relatedness of 0.76 (grades 3–5) and 0.77 (grades 6–8) derived on the basis of our data but also conducted sensitivity estimates over a range of values between 0.700 and 0.800.
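
The relation between the assumed share of MZ pairs among SS twins and their average relatedness follows directly from the values given above (r = 1 for MZ, r = 0.5 for DZ). The helper names below are hypothetical; the arithmetic is a minimal sketch showing that a 0.4–0.6 range of MZ shares corresponds to the 0.700–0.800 range of relatedness values.

```python
MZ_RELATEDNESS = 1.0  # r for MZ pairs, as stated in the text
DZ_RELATEDNESS = 0.5  # r for DZ pairs, as stated in the text


def average_relatedness(mz_share: float) -> float:
    """Average relatedness of SS pairs for a given proportion of MZ pairs."""
    if not 0.0 <= mz_share <= 1.0:
        raise ValueError("mz_share must be between 0 and 1")
    return mz_share * MZ_RELATEDNESS + (1.0 - mz_share) * DZ_RELATEDNESS


def implied_mz_share(relatedness: float) -> float:
    """Invert average_relatedness: the MZ proportion implied by a relatedness value."""
    return (relatedness - DZ_RELATEDNESS) / (MZ_RELATEDNESS - DZ_RELATEDNESS)


# Bounds of the sensitivity range expressed on both scales.
low, high = average_relatedness(0.4), average_relatedness(0.6)
# MZ shares implied by the main-analysis relatedness values of 0.76 and 0.77.
share_grades_3_5 = implied_mz_share(0.76)
share_grades_6_8 = implied_mz_share(0.77)
```

On this mapping, the main-analysis relatedness values imply that slightly more than half of SS pairs are MZ.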

Acknowledgments We thank Timothy Bates, Elliot Tucker-Drob, and Eric Turkheimer for helpful discussions; Livia Baer-Bositis for editorial assistance; and Elliot Tucker-Drob for additional help with MPlus code. We thank the Florida Departments of Education and Health for providing the de-identified, matched data used in this analysis. D.N.F. and J.R. acknowledge support from the National Science Foundation and the Institute for Education Sciences (CALDER Grant), and D.N.F. acknowledges support from the National Institute of Child Health and Human Development and the Bill and Melinda Gates Foundation. The conclusions expressed in this paper are those of the authors and do not represent the positions of the Florida Departments of Education and Health or those of our funders.

Footnotes Author contributions: D.N.F., J.F., K.K., and J.R. designed research; D.N.F., J.F., and K.K. performed research; J.F. and K.K. analyzed data; and D.N.F., J.F., K.K., and J.R. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

See Commentary on page 13318.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1708491114/-/DCSupplemental.