Significance Using administrative register data with information on family relationships and cognitive ability for three decades of Norwegian male birth cohorts, we show that the increase, turning point, and decline of the Flynn effect can be recovered from within-family variation in intelligence scores. This establishes that the large changes in average cohort intelligence reflect environmental factors and not changing composition of parents, which in turn rules out several prominent hypotheses for retrograde Flynn effects.

Abstract Population intelligence quotients increased throughout the 20th century—a phenomenon known as the Flynn effect—although recent years have seen a slowdown or reversal of this trend in several countries. To distinguish between the large set of proposed explanations, we categorize hypothesized causal factors by whether they accommodate the existence of within-family Flynn effects. Using administrative register data and cognitive ability scores from military conscription data covering three decades of Norwegian birth cohorts (1962–1991), we show that the observed Flynn effect, its turning point, and subsequent decline can all be fully recovered from within-family variation. The analysis controls for all factors shared by siblings and finds no evidence for prominent causal hypotheses of the decline implicating genes and environmental factors that vary between, but not within, families.

The Flynn effect refers to a secular increase in population intelligence quotient (IQ) observed throughout the 20th century (1–4). The changes were rapid, with measured intelligence typically increasing around three IQ points per decade. The increase seemingly contradicted the earlier hypothesis that IQs were declining due to an inverse correlation between IQ and fertility, so-called dysgenic fertility (5). In recent years, the Flynn effect has weakened and reversed in several Western countries (6), leading to speculation that the Flynn effect was a transient phenomenon reflecting a boost in IQ from environmental factors that temporarily masked an underlying dysgenic trend (2, 6).

Several causal hypotheses have been set forth to explain trends in measured intelligence across birth cohorts (2, 7). Birth cohort differences in intelligence will reflect differences in either average genotype or environmental exposure, and the hypotheses propose different causal factors that have shifted over time in ways that could plausibly generate the observed variation in IQ scores.

To narrow down the set of hypotheses, we examine the extent to which we can recover observed Flynn effects from within-family variation in large-scale administrative register data covering 30 birth cohorts of Norwegian males. Within-family variation will only recover the full Flynn effect if the underlying causal factors operate within families. Notably, if within-family variation fully recovers both the timing and magnitudes of the increase and decline of cohort ability scores in the data, this effectively disproves hypotheses requiring shifts in the composition of families having children. This set of disproved hypotheses would include dysgenic fertility and compositional change from immigration, the two main explanations proposed for recent negative Flynn effects (6, 7).

In Table 1, we categorize the main hypotheses according to whether or not they allow for within-family Flynn effects. A metareview of empirical studies argues that the positive Flynn effect relates to improved education and nutrition, combined with reduced pathogen stress (2). Turning to the negative Flynn effect, the metareview notes a deceleration of IQ gains in some studies and suggests that these may relate to (i) decreasing returns to environmental inputs (“saturation”) or (ii) the “picking up of effects that cause IQ decreases and may ultimately reverse the Flynn effect,” such as dysgenic fertility (2). Dysgenic fertility is also the favored hypothesis in a recent literature review on reversed Flynn effects, where the authors conclude that dysgenic trends are the “simplest explanation for the negative Flynn effect” (6). A negative intelligence–fertility gradient is hypothesized to have been disguised by a positive environmental Flynn effect, revealing itself in data only “once the ceiling of the Flynn effect was reached.” The review further suggests that this direct genetic effect may be amplified by a social multiplier. Additional hypotheses for both the positive and negative Flynn effects are drawn from a survey of intelligence researchers (7), a subsample of whom claimed specific expertise on the Flynn effect. These researchers largely agreed with the metareview on the environmental factors driving the positive Flynn effect. The researchers were also asked about retrograde effects, with the question “In your opinion, if there is an end or retrograde of the Flynn-effect in industrial nations, what are the most plausible scientific theories to explain this development?” Here, the highest scores were assigned to dysgenic fertility, immigration, and reduced education standards.

Table 1. Overview of hypothesized causes for positive and negative Flynn effects

Past research suggests that within-family Flynn trends exist and correlate with observed patterns (1, 8). The IQ difference between scored siblings was shown to shrink with the age difference in periods of rising cohort IQs (as the Flynn effect counteracts the first-born birth order advantage) and to increase with the age difference in a period with declining cohort IQs (8). Building on this result, we use population-covering administrative data registers from Norway to estimate within-family Flynn effects across 30 birth cohorts and examine whether these estimates recover the full magnitude of, variation in, and reversal of the Flynn effects seen in average cohort scores. The Norwegian data have been extensively used in intelligence research (1, 4, 9–11) and provide a particularly useful dataset for our purposes given the roughly symmetric positive and negative trends across the 1962–1991 cohorts (Fig. 1A). Based on data from birth cohorts born before 1985, prior research has reported this as a slowdown or leveling off of the Norwegian Flynn effect (9), but the additional cohorts included in our data strongly indicate that it is in fact a reversal.

Fig. 1. Average IQ score by birth year (A) and distribution of IQ scores (B). IQ scores are computed from stanine scores (s) using the conversion IQ = 100 + 7.5 × (s − 5). In A, the shaded region depicts 95% confidence intervals around the cohort mean score. n = 736,808.
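The stanine-to-IQ conversion stated in the caption of Fig. 1 can be written as a short function. This is only an illustrative sketch of that formula; the function name `stanine_to_iq` and the range check are our own additions and are not taken from the paper's materials.

```python
def stanine_to_iq(stanine: int) -> float:
    """Convert a stanine score (1-9) to the IQ scale using IQ = 100 + 7.5 * (s - 5)."""
    if not 1 <= stanine <= 9:
        # Range check is an assumption of this sketch: stanines run from 1 to 9.
        raise ValueError("stanine scores range from 1 to 9")
    return 100 + 7.5 * (stanine - 5)


# Full lookup table for the nine possible stanine values.
iq_by_stanine = {s: stanine_to_iq(s) for s in range(1, 10)}
```

Under this conversion the middle stanine maps to an IQ of 100 and each stanine step corresponds to 7.5 IQ points, so the observable IQ values in the data run from 70 to 130.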

The analysis is made possible by the comprehensive coverage of administrative data for the native-born population. This enables us to precisely identify family relationships, birth order, and siblings without ability scores from military conscription testing. Precise controls of birth order are necessary for estimation of within-family trends, as prior research shows that IQ relates inversely to sibling order (12–14). Ignoring birth order would induce omitted-variable bias with the order effect falsely attributed to later birth years, in turn causing negative bias in trend estimates. Information on unscored individuals is required to correct for changes in selection into ability testing over time, which otherwise will bias trend estimates.

Results The research question is whether within-family variation can recover the population Flynn trend apparent across families. This requires an appropriate comparison curve showing the across-family variation in IQ scores. The simplest such curve is the curve of observed means for all firstborn children by birth year, as it obviates the need for statistical controls for birth order. The coefficients from the standard fixed-effects model estimated on data for all scored siblings closely track the across-family variation in IQ scores throughout most of the data period (Fig. 2A). In particular, the within-family estimates confirm the positive Flynn trend during the first half of the observation period, with positive and statistically significant Flynn effects for the birth years 1962–1975. For this period, the average within-family Flynn effect is 0.26 IQ point per year (SI Appendix, Table S3, column 3), similar to the 0.28 estimated annual gain for full-scale IQ from a metaanalysis based on 271 independent samples from 31 countries (2).

Fig. 2. Within-family estimates of Flynn effects. The sample underlying estimates in A consists of all families with at least two scored brothers (n = 355,438), B of families with scored brothers in the first two parities (n = 215,514), and C of siblings born 1962–1991 in all families with sons in the first two parities (n = 236,934). Two-brother samples exclude twins and brothers born the same year. The dashed line depicts the trend for firstborn sons (n = 320,739 in A and B and 353,476 in C). Confidence intervals are computed from SEs clustered within families. The shaded region in C covers percentile values from the posterior distribution of the Bayesian model.

The fixed-effects model correctly identifies the turning point of the Flynn effect and indicates a decline for post-1975 birth cohorts.
For the cohorts born in the latter half of the 1980s, however, the across-family decline in cohort IQ exceeds what the within-family estimates recover. Between the 1975 and 1991 cohorts, the average annual decline estimated using within-family variation is attenuated by almost two-thirds relative to the across-family trend: −0.08 IQ point per year versus −0.23 point per year (SI Appendix, Table S3, columns 1 and 3). One source for this divergence of within-family estimates and across-family Flynn trends in the decline period may be sample selection bias induced by conditioning the data on siblings with a valid IQ score. If selection into scoring is increasing over time, this generates a positive bias in trend estimates as families showing a decline are disproportionately removed from the sample. Conscription test coverage declined substantially for cohorts born after 1980, with coverage rates falling from 93% in 1980 to 83% in 1991 (Fig. 3A). This decline in coverage was selective and partly based on characteristics associated with intelligence: Focusing on families with sons in the first two parities and plotting the share of unscored younger siblings by the observed IQ score of the older brother, lower scoring firstborns were more likely to have unscored younger brothers (Fig. 3B). The problem is exacerbated toward the end of our data window: Among the 1987–1991 birth cohorts, fully 30% of those whose older sibling scored in the bottom IQ bracket have missing IQ scores. As sibling scores are correlated, this implies that low-ability males are less likely to be scored, and that the selection was stronger for the cohorts born in the late 1980s than for those from the 1960s and 1970s.

Fig. 3. IQ score coverage in all families and missing IQ data in two-brother sample. A shows data coverage for all boys present in Norway on their 18th birthday (n = 817,611). B shows noncoverage rates for younger brothers in the two-brother sample; for legibility the figure depicts rates for three 5-y intervals only (n = 65,363; see SI Appendix, Table S4 for the complete series).

To assess the impact of this selection issue, we developed a Bayesian model for sibling pairs that exploits the correlation in sibling ability to estimate and correct for selection into scoring (Materials and Methods). The model provides selection-corrected estimates for both the within-family and population across-family Flynn trends, estimated on data for all pairs of male siblings (scored or unscored) born in different years and in the first two parities of their family (“two-brother sample”). Estimating the standard fixed-effect model on observed scores from the two-brother sample reconfirms that the fixed-effects model is unable to recover the across-family decline in cohort IQ (Fig. 2B). By contrast, the selection-correction model infers a stronger and more persistent within-family decline continuing into the years with increasing scoring selection that largely coincides with the selection-corrected across-family trend (Fig. 2C). The correlation in sibling ability, central to this model, is estimated at 0.47 (95% uncertainty bound: 0.46–0.48; see SI Appendix and SI Appendix, Table S6), identical to the weighted average of 69 studies based on a total of 26,473 American sibling pairs (15). Using the parameter draws from the posterior distribution, we can assess the similarity of the within- and across-family trends over longer periods. For the 1962–1975 Flynn increase period, the model estimates a 0.20 (95% uncertainty bound: 0.11, 0.29) average annual IQ point increase within families and a 0.18 (0.14, 0.21) increase across families (SI Appendix and SI Appendix, Table S3, columns 6 and 7). For the 1975–1991 decrease period, we estimate a 0.33 (0.26, 0.40) annual IQ point decline within families and a 0.34 (0.30, 0.38) decline across families.
Taking the ratio of the within-family and across-family estimates, we find ratios of 1.14 (0.63, 1.69) for the increase period and 0.98 (0.79, 1.20) for the decrease period (Table 2, column 3).

Table 2. Estimated within-family Flynn effect relative to across-family trend during increase (1962–1975) and decrease (1975–1991) periods
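The within-to-across ratios reported in the Results are described as being computed from posterior parameter draws. The sketch below shows one way such a ratio and its uncertainty bound could be summarized, under stated assumptions: the function name `ratio_summary` is hypothetical, the use of the posterior mean as point summary is our assumption, and the draws are synthetic independent normals chosen only for illustration. The actual analysis would use joint draws from the fitted model, so this sketch is not expected to reproduce the values in Table 2.

```python
import numpy as np


def ratio_summary(within_draws, across_draws, level=0.95):
    """Summarize the draw-by-draw ratio of within- to across-family trends.

    Returns (mean ratio, lower bound, upper bound), with bounds taken as
    equal-tailed percentiles of the ratio draws.
    """
    within = np.asarray(within_draws, dtype=float)
    across = np.asarray(across_draws, dtype=float)
    ratios = within / across              # one ratio per posterior draw
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(ratios, [tail, 1.0 - tail])
    return float(np.mean(ratios)), float(lower), float(upper)


# Synthetic draws for illustration only (NOT the paper's posterior):
# annual IQ-point trends centered near the reported point estimates.
rng = np.random.default_rng(0)
within = rng.normal(0.20, 0.045, size=4000)
across = rng.normal(0.18, 0.018, size=4000)
mean_ratio, lo, hi = ratio_summary(within, across)
```

Taking the ratio draw by draw, rather than dividing two point estimates, carries the uncertainty in both trends into the bound on the ratio; a ratio near one indicates that the within-family trend accounts for essentially all of the across-family trend.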

Discussion Viewed together, the results from the standard family fixed-effects model and selection-correction model show that observed Flynn effects, both positive and negative, across three decades of Norwegian birth cohorts can be recovered using only within-family variation in IQ scores. While the fixed-effects model using observed scores fails to recover the full decline in the post-1975 cohorts in both the full sample (Fig. 2A) and the two-brother sample (Fig. 2B), the Bayesian model addressing selection into scoring fully recovers the increase, turning point, and decline apparent across families over time, while indicating that the retrograde Flynn effect is more negative than that seen in observed scores (Fig. 2C). The results show that large positive and negative trends in cohort IQ operate within as well as across families. This implies that the trends are not due to a changing composition of families, and that there is at most a minor role for explanations involving genes (e.g., immigration and dysgenic fertility) and environmental factors largely fixed within families (e.g., parental education, socialization effects of low-ability parents, and family size). While such factors may be present, their influence is negligible compared with other environmental factors. Notably, this goes counter to the conclusion of a recent review on retrograde Flynn effects (6) and the expert opinions reported in a recent survey of intelligence researchers, which found “the anti-Flynn effect being attributed mainly to genetics and immigration” (7). As noted by two of the reviewers, the magnitude of the negative Flynn trend in our data itself speaks against the dysgenic hypothesis for retrograde Flynn effects, as changes in IQ over time are too large to plausibly reflect selection-driven genetic change in the population. This, in turn, means that dysgenic trends may be statistically imperceptible over the 16-y decline period studied.
Polygenic scores that predict education are correlated with IQ and have been shown to correlate negatively with fertility in Icelandic and US data (16, 17). The authors of the Icelandic study extrapolate that their results imply a decline of 0.30 IQ point per decade, an effect sufficiently small to fall within the uncertainty bounds of the difference between across- and within-family trend estimates in the present study. While we cannot statistically rule out dysgenic trends of this magnitude, a more direct assessment of reproductive selection across the IQ distribution finds no indication of dysgenic fertility. The vast majority of fathers to children in the post-1975 cohorts were born between 1950 and 1970, and for these males we see a slight, positive IQ–fertility gradient: The mean IQ when scores are weighted by an individual’s number of children exceeds the unweighted mean (SI Appendix and SI Appendix, Table S5). This was the case both for the 1950–1960 cohorts scored under the old test norm and for the 1962–1970 cohorts scored under the new norm. A recent study finds similar results for these cohorts in data from neighboring Sweden (18). Ability scores are not available for women, but when we examine years of schooling instead of IQ scores we find the same pattern for men and no indication of negative (nor substantial positive) selection for women. Using the ratio of child-weighted to unweighted means as a summary indicator, the ratio is one or higher for each of the gender-cohort-specific comparisons (SI Appendix, Table S5). The ratios based on years of schooling are also remarkably stable across time for both men and women despite the dramatic increase in educational attainment that occurred across these cohorts. These results come with caveats, however: They speak only to dysgenic effects occurring within our sample of children born to two native-born parents, and the results assess the ability–fertility gradient using phenotypic (expressed) traits. 
On this last point, we cannot rule out the theoretical possibility of negative selection on a genetic component that is masked when assessed using environmentally influenced measures. Turning to the remaining hypotheses proposed, we note the difficulty of disentangling cohort and period effects. While our results support the claim that the main drivers of Flynn effects are environmental and vary within families, we are unable to identify the causal structure of the underlying environmental effects: Exposure occurring in any year will affect all cohorts below conscription age, but sensitivity to environmental factors may differ by age, and environmental effects may decay at different rates after exposure. The study design cannot distinguish between such possibilities, which also implies that the Flynn effect between two cohorts may differ with the age at which they are assessed (see the discussion in ref. 19), and our results remain consistent with a number of proposed hypotheses of IQ decline: changes in educational exposure or quality, changing media exposure, worsening nutrition or health, and social spillovers from increased immigration.
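The child-weighted summary indicator used in the Discussion to assess the IQ–fertility gradient can be illustrated with a small sketch. The function name `child_weighted_ratio` and the toy inputs are hypothetical; the sketch only shows the arithmetic of comparing a mean weighted by number of children with the unweighted mean, and is not the authors' code.

```python
def child_weighted_ratio(scores, n_children):
    """Ratio of the child-weighted mean score to the unweighted mean score.

    A ratio above one means higher-scoring individuals have more children
    on average (a positive gradient); below one indicates a negative gradient.
    """
    if len(scores) != len(n_children) or len(scores) == 0:
        raise ValueError("scores and n_children must be non-empty and equal length")
    total_children = sum(n_children)
    if total_children == 0:
        raise ValueError("at least one individual must have children")
    unweighted = sum(scores) / len(scores)
    weighted = sum(s * c for s, c in zip(scores, n_children)) / total_children
    return weighted / unweighted


# Toy illustration (made-up numbers): fertility rising with score.
positive_gradient = child_weighted_ratio([90, 100, 110], [1, 2, 3])
# Same scores, fertility unrelated to score.
no_gradient = child_weighted_ratio([90, 100, 110], [2, 2, 2])
```

In the toy data, the first call yields a ratio above one and the second a ratio of exactly one, mirroring the interpretation in the text that a ratio of one or higher gives no indication of dysgenic fertility in the phenotypic measure.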

Materials and Methods Data. The data cover the full birth cohorts from 1962 through 1991 and include a cognitive ability stanine score from military conscription testing at age 18–19 y for the vast majority of Norwegian-born males. We use a pseudonymous personal identifier to link records across administrative data registers and identify family relationships and siblings born to the same mother and father. To account for family background and family structure, we restrict the analyses to native-born individuals with two native-born parents. Cohorts born before 1962 were subject to a different scoring norm, and cohorts born later than 1991 faced a radically different conscription process with fewer than 50% invited for in-person testing after completing a web-administered survey. As a result, representative data are not available for later birth cohorts. Data for immigrants are excluded as information on full family size and exact birth order is of lesser quality, while selection into scoring is markedly different as immigrants typically do not face mandatory conscription testing but need to self-select into conscription. Finally, we restrict the analyses to those present in Norway on their 18th birthday, leaving us with an overall sample of 817,611 observations, of which 736,808 (90.1%) have a valid ability score; see SI Appendix, Table S1.

Following convention, we calculate the IQ score from the aggregate stanine score given each conscript based on three speeded tests of arithmetic (30 items), word similarities (54 items), and figures (36 items). The average IQ score from these tests rose from 99.5 for the 1962 birth cohort to 102.3 for the 1975 cohort, after which it declined to 99.4 for the 1989 cohort (then rising slightly to 99.7 for the 1991 cohort; Fig. 1A). Apart from the mathematics test changing to multiple-choice format in the beginning of the 1990s, both the test and the scoring norm were constant throughout the period. Fig. 1B confirms that the IQ scores in our data follow the expected bell-shaped distribution.

Statistical Methods. Using a family-fixed-effects specification, we use data on scored brothers to estimate the model

$$IQ_{if} = \sum_{b=1962}^{1991} \tau_b B_{bi} + \sum_{n=2}^{18} \theta_n N_{ni} + \alpha_f + \varepsilon_{if},$$

where the dependent variable is the IQ score of individual $i$ of family $f$, $B_b$ and $N_n$ denote indicator variables for birth year and birth order (the maximum birth order in our data is 18), and $\alpha_f$ is the family fixed effect. The fixed-effects estimator controls for all factors shared by siblings, the birth-order variables capture any deterioration due to within-parent factors like aging parents or favoring the firstborn, and we identify Flynn effects from the remaining variation in IQ and birth year between siblings. As the positive IQ trend in our data ended with the 1975 cohort, we omit 1975 from the set of birth year indicators such that the estimates of $\tau_b$ give the difference from the IQ score of the 1975 birth cohort.

Coefficients corrected for conditioning bias were estimated using a Bayesian maximum likelihood model and data on male sibling pairs from all families where the first two children are male and born in different years. The model assumes that there are two reasons for systematic differences between sibling scores: a birth-order effect and within-family Flynn effects. Both of these are allowed to shift gradually over time. Adjusted for these systematic differences, sibling abilities are assumed to follow a bivariate normal distribution with a fixed covariance. This allows us to parametrically express the bivariate distribution across stanine score bins for any combination of sibling birth years. Some parts of this distribution will be underrepresented in the distribution of fully scored brother pairs, and a birth-year-specific scoring probability vector is identified that best allocates the partially and fully nonscored sibling pairs across this distribution.
The population across-family trend for all firstborns is found by combining the cohort means for firstborns present and missing from the two-brother sample after correcting both for scoring selection. The within-family Flynn effect is modeled as a random walk. The changing birth-order effect is modeled as a Gaussian process with a squared exponential covariance function, stabilizing the estimates by imposing local smoothing. The Bayesian model was implemented in the Stan programming language for probabilistic models (20, 21) and estimated by Markov chain Monte Carlo using the No-U-Turn Sampler. Further details, results, and model code are available in SI Appendix.

Ethics. The project used data in accordance with ethical and legal requirements. This involved approval by the Frisch Centre’s Data Protection Officer, along with formal concessions from the Norwegian Data Protection Authority and owners of the register data used. The data were made available on loan by Statistics Norway for research purposes. They are available from Statistics Norway for other researchers on the same terms provided that the researchers satisfy the required formal criteria.
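As a rough illustration of the family fixed-effects specification described in Statistical Methods, the sketch below estimates birth-year and birth-order coefficients by demeaning all variables within families and running least squares. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the synthetic two-brother data, and every simulation parameter (trend slopes, birth-order effect, variances) are hypothetical, and clustered standard errors are not computed.

```python
import numpy as np


def within_family_trend(iq, family, birth_year, birth_order, ref_year):
    """Family fixed-effects estimates of birth-year and birth-order coefficients.

    Birth-year effects are relative to ref_year; birth-order effects are
    relative to firstborns. The family effect is removed by demeaning.
    """
    iq = np.asarray(iq, dtype=float)
    birth_year = np.asarray(birth_year)
    birth_order = np.asarray(birth_order)
    years = [int(y) for y in np.unique(birth_year) if y != ref_year]
    orders = [int(n) for n in np.unique(birth_order) if n != 1]
    X = np.column_stack(
        [(birth_year == y).astype(float) for y in years]
        + [(birth_order == n).astype(float) for n in orders]
    )
    _, fam_idx = np.unique(np.asarray(family), return_inverse=True)
    counts = np.bincount(fam_idx)

    def demean(v):
        # Subtract the family mean from each observation (within transformation).
        return v - (np.bincount(fam_idx, weights=v) / counts)[fam_idx]

    y_w = demean(iq)
    X_w = np.column_stack([demean(X[:, j]) for j in range(X.shape[1])])
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    tau = dict(zip(years, coef[: len(years)]))
    theta = dict(zip(orders, coef[len(years):]))
    return tau, theta


def simulate_brothers(n_families, noise_sd, seed=0):
    """Synthetic two-brother families; every parameter here is hypothetical."""
    rng = np.random.default_rng(seed)
    first_year = rng.integers(1962, 1987, size=n_families)  # firstborns 1962..1986
    spacing = rng.integers(1, 6, size=n_families)           # 1..5 years apart
    year = np.concatenate([first_year, first_year + spacing])
    order = np.concatenate([np.ones(n_families, int), np.full(n_families, 2)])
    family = np.concatenate([np.arange(n_families)] * 2)
    alpha = rng.normal(0.0, 10.0, size=n_families)          # shared family effect
    # Assumed cohort effect: rises 0.2/year to 1975, then falls 0.3/year.
    tau = np.where(year <= 1975, 0.2 * (year - 1975), -0.3 * (year - 1975))
    noise = rng.normal(0.0, noise_sd, size=2 * n_families)
    iq = 100 + alpha[family] + tau - 2.0 * (order == 2) + noise
    return iq, family, year, order


iq, family, year, order = simulate_brothers(n_families=5000, noise_sd=5.0)
tau_hat, theta_hat = within_family_trend(iq, family, year, order, ref_year=1975)
```

Because the reference year is omitted, each entry of `tau_hat` is a difference from the 1975 cohort, as in the specification in the text. Separating birth-year from birth-order effects in such two-brother data relies on variation in the spacing between siblings; if all pairs were born the same number of years apart, the two sets of indicators would be collinear after demeaning.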

Acknowledgments We thank Anders Martin Fjell, James R. Flynn, Andreas Kotsadam, Anja Myrann, Oddbjørn Raaum, Jon Martin Sundet, and three anonymous referees for constructive comments. This work was supported by the Norwegian Research Council Grant 236992. Data on ability scores have been obtained by consent from the Norwegian Armed Forces, who are not responsible for any of the findings and conclusions reported in the paper. Data made available by Statistics Norway have been essential for this research.

Footnotes Author contributions: B.B. and O.R. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1718793115/-/DCSupplemental.