The general framework of this study is outlined in Fig. 2. We investigated partner correlations (ρ y couple) in longevity (see Partner Correlations for Longevity). To dissect the source of these correlations and, in particular, to establish whether they arise due to indirect assortment, we followed several approaches. First, we considered correlations in longevity between parents of focal partners (ρ y ♀inlaws and ρ y ♂inlaws) (see Parental Correlations of Longevity). That is, ρ y ♂inlaws is the correlation between the two fathers of a husband and wife pair. Then, we considered to what extent potential targets of assortment, such as BMI or socio-economic status, which are correlated across generations, explained any observed parental correlations (see Effect of environmental factors on parental correlations in longevity). Finally, we evaluated correlations between genetic values (GBLUPs) of the focal partners (ρ g couple) to demonstrate assortment directly (see Partner correlations of genetic values of parental longevity).

Fig. 2 Schematic outline of the study. We consider couples and their parents. We compute phenotypic correlations between couples (ρ y couple) for longevity and disease status. Such correlations could be explained by the couple sharing a nuclear environment, e.g., shared exposures in the shared home or shared diet. To exclude the possibility of convergence based on shared nuclear environment, we examined parental correlations, i.e., correlations between the fathers (ρ y ♂inlaws) and mothers (ρ y ♀inlaws) of the partners. Such correlations cannot arise due to the nuclear couple environment, but require non-random mating and across-generation correlations. The across-generation correlations could arise due to heritable genetic effects or culturally transmitted environmental effects. We therefore also examined correlations in genetic values (ρ g couple), which provide evidence for non-random mating with respect to heritable factors.

We hypothesized that indirect assortative mating for longevity could be driven by assortative mating for disease risk factors. We therefore also examined indirect assortment on disease risk, following the same approaches as for longevity (see Parental correlations in disease history).

The majority of analyses were performed using data from the UK Biobank cohort, but where possible results were replicated using the FamiLinx cohort (Kaplanis et al. 2018).

Couples in the UK Biobank cohort

Identification of heterosexual couples in the UK Biobank has been previously reported (Tenesa et al. 2015). Specifically, using household sharing information we identified a set of 105,380 households with exactly two members in the cohort. Of these, 90,297 satisfied all of the following criteria: (a) individuals reported different ages for one or both parents; (b) individuals had an age difference of < 10 years; (c) individuals were of opposite gender; (d) both individuals reported living only with their partner or with their partner and children. We restricted our analysis to a subset of 79,094 couples for which both partners self-reported to be of White-British ethnicity.

Couples in the FamiLinx cohort

The FamiLinx cohort (Kaplanis et al. 2018), consisting of 86,124,644 individuals, is based on publicly accessible genealogy data reaching back to the early fifteenth century and covering individuals born across the world, although individuals of European and North American birth dominate. In our analysis we restricted ourselves to the subset of individuals with full information on year of birth, year of death, and the latitude and longitude of the birth location. We removed individuals with a birth location along the zero meridian, as visual inspection suggested that most of these were coding errors. We also removed individuals with lifespans below 30 years or above 130 years. Following previous analysis (Kaplanis et al. 2018), we removed individuals born before 1600 because of the sparsity and lower reliability of data before that date, and individuals born after 1910 because of the bias towards individuals with reduced lifespan after that date. Finally, also following previous analysis (Kaplanis et al. 2018), we removed individuals who died during the American Civil War (year of death 1861–1865), the First World War (year of death 1914–1918) and the Second World War (year of death 1939–1945) because of the excess number of early deaths in these periods. This resulted in a dataset of 3,445,971 individuals. Considering individuals with common offspring, we identified a set of 239,541 couples.

Definition of birth location

Both the UK Biobank and FamiLinx contain information about the birth locations of individuals, which we used to adjust for any potential geographical differences in longevity. However, in both cohorts the provided information is at a scale too fine to allow for effective stratification based on birth location. We therefore defined a birth location at a coarser scale in both cohorts.

The UK Biobank contains information about the coordinates of the birth location with a resolution of 1 km. Through visual inspection, we identified a subset of individuals with miscoded coordinates corresponding to birth in the Atlantic Ocean and set their birth location to missing. We used a 15 km grid to define birth location. That is, we assigned to the same birth location all individuals whose birth coordinates coincide after being divided by 15 km and rounded to an integer.
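The grid assignment can be illustrated with a minimal sketch. It assumes that birth coordinates are available as planar easting and northing values in metres; the function name grid_cell and its parameters are hypothetical and not taken from the original analysis code.

```python
def grid_cell(easting_m, northing_m, cell_size_m=15_000):
    """Map planar coordinates (assumed to be in metres) to a coarse grid cell.

    Individuals whose coordinates give the same pair of integers after
    division by the cell size and rounding share a birth location.
    Missing coordinates (None) yield a missing birth location.
    """
    if easting_m is None or northing_m is None:
        return None
    # Python's round() resolves exact .5 ties to the nearest even integer.
    return (round(easting_m / cell_size_m), round(northing_m / cell_size_m))


# Two nearby birth places fall into the same 15 km cell; a distant one does not.
cell_a = grid_cell(431_000, 433_000)
cell_b = grid_cell(436_000, 428_000)
cell_c = grid_cell(530_000, 180_000)
```

The same function would cover a 1° latitude and longitude grid by passing coordinates in degrees and setting cell_size_m to 1, although the parameter name would then be a misnomer.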

In the FamiLinx cohort, we defined a 1° latitude and longitude grid to derive birth location.

Genotypes and estimation of genetic values in UK Biobank

To perform genetic analyses we identified a set of quality-controlled, genotypically White-British individuals from the UK Biobank. Using appropriate subsets of these individuals as described for specific analyses, we jointly estimated SNP heritabilities and SNP effects following the mixed model approach using the DISSECT tool (Canela-Xandri et al. 2015). We used the estimated SNP effects to compute genetic values (i.e., GBLUPs). All models included the leading 20 genomic principal components as fixed effects.

The set of individuals available for genetic analyses was identified as follows. We used the data for the individuals genotyped in phase 1 of the UK Biobank genotyping programme. A total of 49,979 individuals were genotyped using the Affymetrix UK BiLEVE Axiom array and 102,750 individuals using the Affymetrix UK Biobank Axiom array. Details regarding genotyping procedure and genotype-calling protocols are provided elsewhere (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155580). We performed quality control using the entire set of genotyped individuals before extracting the White-British cohort used in our analyses. From the overlapping genetic markers between the two arrays, we excluded those which were multi-allelic, whose overall missingness rate exceeded 2% or which exhibited a strong platform-specific missingness bias (Fisher’s exact test, pval < 10^−100). We also excluded individuals if they exhibited excess heterozygosity, as identified by UK Biobank internal quality control (QC) procedures (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155580), if their missingness rate exceeded 5% or if their self-reported sex did not match genetic sex estimated from X chromosome inbreeding coefficients. These criteria resulted in a reduced dataset of 151,532 individuals. To define the genotypically White-British subset, we performed a Principal Components Analysis of all individuals passing genotypic QC using a linkage disequilibrium pruned set of 99,101 autosomal markers (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=149744), which passed our SNP QC protocol. The genotypically White-British individuals were defined as those for whom the projections onto the leading 20 genomic principal components fell within three SDs of the mean and who self-reported their ethnicity as White-British. We furthermore pruned the set of genotypically White-British individuals, removing one individual from pairs with relatedness above 0.0625 (corresponding to second degree cousins), to obtain a dataset of unrelated genotypically White-British individuals. Finally, in our genetic models we only used genetic variants that had passed QC, that did not exhibit departure from Hardy–Weinberg equilibrium (pval < 10^−50) in the unrelated genotypically White-British cohort and which had a minor allele frequency > 5%.

Partner correlations for longevity

We estimated partner correlations of longevity, defined as the age in years at death, using data from the two cohorts: the UK Biobank and FamiLinx. We also computed correlations of longevity adjusted for cohort effects. Specifically, we computed adjusted longevity as the difference between an individual’s lifespan and the mean lifespan of the stratum defined by the individual’s sex, birth year and birth location (see Definition of birth location), excluding all strata with fewer than 10 individuals.
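As an illustration of this adjustment, the following sketch subtracts the stratum mean from each lifespan and drops small strata. The record layout (dictionaries with the keys id, sex, birth_year, birth_location and lifespan) is an assumption made for the example and does not reflect the data format of either cohort.

```python
from collections import defaultdict


def adjusted_longevity(records, min_stratum_size=10):
    """Return {id: lifespan - stratum mean} for every retained individual.

    A stratum is defined by sex, birth year and birth location.
    Strata with fewer than `min_stratum_size` individuals are excluded.
    """
    strata = defaultdict(list)
    for rec in records:
        key = (rec["sex"], rec["birth_year"], rec["birth_location"])
        strata[key].append(rec)

    adjusted = {}
    for members in strata.values():
        if len(members) < min_stratum_size:
            continue  # stratum too small to estimate a reliable mean
        mean_lifespan = sum(m["lifespan"] for m in members) / len(members)
        for m in members:
            adjusted[m["id"]] = m["lifespan"] - mean_lifespan
    return adjusted


# Toy data: one stratum of 10 women (retained) and one of 3 men (excluded).
toy = [
    {"id": i, "sex": "F", "birth_year": 1900, "birth_location": (29, 29),
     "lifespan": 70 + i}
    for i in range(10)
] + [
    {"id": 100 + i, "sex": "M", "birth_year": 1900, "birth_location": (29, 29),
     "lifespan": 60}
    for i in range(3)
]
toy_adjusted = adjusted_longevity(toy)
```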

As the majority of UK Biobank participants are alive, we used the biological mothers and fathers of participants. Specifically, we identified self-reported White-British individuals with both parents deceased (using data fields UKBID 21000, 1797 and 1835) and non-missing birth location (see Definition of birth location). This yielded 252,899 pairs of parents, for which we computed Pearson’s correlations between longevity extracted from data fields UKBID 1807 and 3526. The UK Biobank does not directly contain information regarding the year or location of birth of the parents of participants. We therefore used the participant’s place and year of birth (UKBID 34) as proxy measures of the parent’s place and year of birth. For the subset of parents who were still alive at recruitment of the participant, we can infer the parent’s year of birth from the date of recruitment and the parent’s age. This subset is relatively small, comprising only 22% of fathers and 39% of mothers, and is complementary to the set of parents used in the analysis, who were required to be deceased. Although we therefore cannot use these data in our analysis, they allow us to evaluate the effect of using a proxy measure. The correlation between the year of birth of the offspring and that of their parent is relatively high, with ρ = 0.78.

In the FamiLinx cohort, we used all 239,541 couples identified as described above (see Couples in the FamiLinx cohort). We computed longevity as the difference between year of death and year of birth.

Parental correlations of longevity

We computed Pearson’s correlations of longevity and adjusted longevity for parents of partners. That is, we computed, e.g., the correlation between the longevity of the two fathers of the male and female partners in a couple. We considered the three combinations of parents, i.e., the two fathers or the two mothers of the partners and the father of one partner and the mother of the other partner, separately. Both longevity and adjusted longevity were computed as for the analysis of partner correlations (see Partner correlations for longevity).

Of the 79,094 couples identified in the UK Biobank (see Couples in the UK Biobank cohort), 40,504 had both mothers and 60,978 both fathers deceased, whereas there were 104,922 father–mother pairs. Among the 3,445,971 individuals retained for analysis in the FamiLinx cohort (see Couples in the FamiLinx cohort), we identified 97,223 sets of fathers, 66,077 sets of mothers and 143,896 father–mother pairs.

We computed expected distributions of parental correlations due to geographical and temporal mating structure in the population based on permutations. Specifically, we generated fictitious sets of couples, which matched the observed mating structure for birth years and birth locations, and computed the parental correlations in longevity for these fictitious couples. To generate the fictitious couples we stratified couples based on the birth year and birth locations of both partners and permuted male partners within each stratum. To allow for effective permutations we only included couples in strata of size larger than 10 in the analysis. For each permutation, we computed Pearson’s correlations of parental longevity as a test statistic. Empirical p-values were then computed as the fraction of statistics exceeding the statistic computed without permutation, based on 10,000 permutations.
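The stratified permutation scheme can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: each couple is represented by a dictionary with a hypothetical stratum key (encoding the birth years and birth locations of both partners) and the longevity of the relevant parent on each side, and permuting male partners is implemented by shuffling the male-side parental longevity within each stratum.

```python
import math
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson's correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(couples, n_perm=10_000, seed=0):
    """Observed parental correlation and its empirical p-value."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for couple in couples:
        strata[couple["stratum"]].append(couple)
    # Keep only strata of size larger than 10.
    kept = [members for members in strata.values() if len(members) > 10]

    female = [c["female_parent_longevity"] for s in kept for c in s]
    male_by_stratum = [[c["male_parent_longevity"] for c in s] for s in kept]
    observed = pearson([v for s in male_by_stratum for v in s], female)

    exceeding = 0
    for _ in range(n_perm):
        permuted = []
        for values in male_by_stratum:
            shuffled = values[:]
            rng.shuffle(shuffled)  # permute male partners within the stratum
            permuted.extend(shuffled)
        if pearson(permuted, female) > observed:
            exceeding += 1
    return observed, exceeding / n_perm


# Toy data: two strata of 12 perfectly matched couples and one small stratum
# of 3 couples, which is dropped before the analysis.
toy_couples = [
    {"stratum": s, "male_parent_longevity": 60.0 + i,
     "female_parent_longevity": 60.0 + i}
    for s in ("A", "B") for i in range(12)
] + [
    {"stratum": "C", "male_parent_longevity": 60.0 + i,
     "female_parent_longevity": 90.0 - i}
    for i in range(3)
]
toy_r, toy_p = permutation_p_value(toy_couples, n_perm=200)
```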

Effect of environmental factors on parental correlations in longevity

We evaluated partner correlations for a range of potential assortment factors and evaluated their contribution to any observed correlations in parental longevity.

Specifically, we extracted Townsend Deprivation Index (UKBID 189), height (UKBID 50), waist-to-hip ratio (computed from UKBID 48 and 49), BMI (UKBID 21001) and smoking history in pack years (UKBID 20161) for all individuals in the 79,094 couples identified in the UK Biobank. The Townsend Deprivation Index is an area measure of socio-economic deprivation. We computed Pearson’s correlations between the male and female partners for all pairs of these variables as well as birth year.

We then fitted linear regression models, regressing parental longevity on birth year and birth location, as well as on the Townsend Deprivation Index, height, waist-to-hip ratio, BMI and smoking history in pack years of their children, and the squares of these factors. Birth year and birth location were coded as categorical variables, whereas all other factors and their squares were included as continuous variables. Using the fitted models, we computed residuals and the correlations between couples based on these residuals. Comparing these correlations, we quantified the change in correlations due to the inclusion of individual covariates in the models.
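The residual-based comparison can be sketched with ordinary least squares in NumPy. The function name residualise and the array layout are assumptions made for illustration; the sketch dummy-codes categorical covariates with the first level as reference and adds the square of each continuous covariate, as described above.

```python
import numpy as np


def residualise(y, continuous, categorical):
    """OLS residuals of y on continuous covariates, their squares and
    dummy-coded categorical covariates.

    y           : sequence of length n (e.g. parental longevity)
    continuous  : array-like of shape (n, k)
    categorical : list of length-n label sequences (e.g. birth year)
    """
    y = np.asarray(y, dtype=float)
    cont = np.asarray(continuous, dtype=float)
    columns = [np.ones(len(y)), *cont.T, *(cont ** 2).T]
    for factor in categorical:
        labels = np.asarray(factor)
        for level in np.unique(labels)[1:]:  # first level is the reference
            columns.append((labels == level).astype(float))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Toy check: a response fully determined by the covariates leaves no residual.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
group = np.arange(50) % 3
y_toy = 2.0 + 3.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + 1.0 * group
toy_residuals = residualise(y_toy, x, [group])
```

The correlation between in-laws after adjustment would then be obtained by applying np.corrcoef to the residuals of the two sets of parents, and compared with the correlation obtained from a model that omits the covariate of interest.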

Partner correlations of genetic values of parental longevity

As the majority of individuals in the UK Biobank are still alive, we cannot estimate genetic values for longevity directly. We therefore again used information about the lifespans of parents of participants and estimated genetic values (GBLUPs) for parental longevity as a proxy for genetic values of an individual’s longevity.

Of the UK Biobank individuals retained for genetic analysis (see Genotypes and estimation of genetic values in UK Biobank), subsets of 79,216 and 64,002 had deceased fathers and mothers, respectively. Using these individuals, we estimated SNP heritabilities and genetic variant effects for parental longevity based on common variants, i.e., variants with minor allele frequency above 5%. Of the 79,094 couples identified in the UK Biobank (see Couples in the UK Biobank cohort), a subset of 10,160 couples consisted of individuals retained for genetic analysis. For these couples, using the estimated genetic variant effects, we computed genetic values (Canela-Xandri et al. 2015, 2016) for parental longevity and computed their Pearson’s correlation.
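Conceptually, a genetic value is the sum of an individual's allele counts weighted by the estimated variant effects. The sketch below illustrates only this final step and assumes the effects have already been estimated; it does not reproduce DISSECT, and any centring or scaling of genotypes applied by that tool is omitted. All names are hypothetical.

```python
import numpy as np


def genetic_values(genotypes, snp_effects):
    """Genetic value per individual: allele counts (n x m) times effects (m)."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(snp_effects, dtype=float)


def partner_correlation(values_male, values_female):
    """Pearson's correlation between the genetic values of partners."""
    return float(np.corrcoef(values_male, values_female)[0, 1])


# Toy example with two individuals and three variants.
toy_effects = [0.1, -0.2, 0.3]
toy_values = genetic_values([[0, 1, 2], [2, 1, 0]], toy_effects)
```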

Disease history in the UK Biobank

Participants in the UK Biobank provide information about the family history for 12 diseases for both biological parents (UKBID 20107 and 20110). Considering the 79,094 couples identified in the UK Biobank (see Couples in the UK Biobank cohort), disease history for both biological parents of each partner was reported by 58,043 couples for heart disease, stroke, chronic bronchitis, high blood pressure, diabetes and Alzheimer’s disease, and by 57,644 couples in the case of lung cancer, bowel cancer, Parkinson’s disease and depression. For the latter subset, information regarding disease history for the relevant parent for breast and prostate cancer was available for each partner.

The twelve diseases for which family history was provided do not directly match the diseases reported in the self-reported medical history of participants (UKBID 20002). To identify self-reported controls, we followed the methodology of Muñoz et al. (2016) to match self-reported diseases to those reported for family history.

Parental correlations in disease history

Following the methods for parental correlations for longevity (see Parental correlations of longevity), we computed correlations of disease history between the fathers and mothers of couples in the UK Biobank. We also computed correlations for each disease using only couples where both partners are self-reported controls for the relevant disease.

As disease history or status for an individual is a binary trait, Pearson’s correlations are not a suitable measure of correlation. Instead, we computed polychoric correlations (Drasgow 1986) using the R package polycor (Fox 2010). In addition, we assessed dependence between partners’ family histories using a χ² test and by computing empirical mutual information (Cover and Thomas 2012). For mutual information, we computed an empirical p-value for departure from independence using permutations. That is, we computed empirical mutual information for 1000 datasets in which family history for the male partners had been permuted and compared these values with the empirical mutual information of the observed data.
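The mutual information test can be sketched as follows. Empirical mutual information is computed from observed joint and marginal frequencies (in nats), and the p-value is taken as the fraction of permuted datasets whose mutual information is at least as large as the observed value; that comparison rule is an assumption, as the text does not specify it. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    # p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in joint.items()
    )


def mi_permutation_p_value(male_history, female_history, n_perm=1000, seed=0):
    """Observed mutual information and its permutation-based p-value."""
    rng = random.Random(seed)
    observed = mutual_information(male_history, female_history)
    shuffled = list(male_history)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute family history of the male partners
        if mutual_information(shuffled, female_history) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm


# Toy data: perfectly dependent family histories across 100 couples.
history = [0, 1] * 50
toy_mi, toy_mi_p = mi_permutation_p_value(history, history, n_perm=200)
```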

As for longevity, we evaluated the expected effect of assortment due to place and year of birth using permutations. Permutations were performed as for longevity, using the χ² statistic, rather than Pearson’s correlation, as the test statistic.

We performed an additional permutation analysis to assess the impact of using the offspring’s year of birth as a proxy for the parents’ year of birth. Unlike in the analysis of longevity, where all parents are deceased, a subset of parents with family history is still alive. For these parents we can compute the year of birth. On the subset of parents with available year of birth, we permuted UK Biobank couples within the years of birth of their parents. That is, we permuted the offspring within strata defined by the years of birth of their parents. We did not permute within both birth year and birth location strata due to the smaller sample size.

Partner correlations of genetic values of disease history

We computed correlations for genetic values of parental disease history and self-reported disease status. For own disease status, we restricted the analysis to diseases with prevalence in the sample above 5% and excluded prostate and breast cancers.

For family disease history traits, we fitted models with only genomic principal components and models that also included the participant’s birth year and birth location as categorical and the parents’ age as continuous covariates. The parent’s age was computed as either the age at death (UKBID 1807 and 3526), if the parent was deceased, or age at assessment (UKBID 2946 and 1845) otherwise. Models used to estimate genetic values for self-reported disease also included the participant’s sex, age and Townsend Deprivation Index as fixed effects.

We fitted models using all individuals available for genetic analysis (see Genotypes and estimation of genetic values in UK Biobank), who reported family history. We transformed heritabilities that were estimated on the observed scale, i.e., modelling disease status directly, to the liability scale using the sample-specific prevalence (Lee et al. 2011). Using SNP effects estimated on all individuals, we computed genetic values for the 10,160 couples that comprised individuals retained for genetic analysis (see Genotypes and estimation of genetic values in UK Biobank) and computed their Pearson’s correlations. We combined paternal and maternal estimates using the Olkin-Pratt fixed effect approach (Schulze 2004).
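The transformation to the liability scale can be illustrated with a short sketch. It assumes that the sample prevalence is used in place of the population prevalence, in which case the ascertainment correction of Lee et al. (2011) cancels and the scaling reduces to K(1 − K)/z², where K is the prevalence and z is the standard normal density at the liability threshold. The function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence):
    """Convert an observed-scale heritability to the liability scale.

    Assumes sample prevalence equals population prevalence, so that the
    conversion factor is K * (1 - K) / z**2.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold
    z = std_normal.pdf(threshold)  # normal density at the threshold
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2


# At a prevalence of 50% the conversion factor equals pi / 2.
toy_h2_liability = observed_to_liability(0.10, 0.5)
```

For rarer diseases the conversion factor is larger, so a given observed-scale heritability corresponds to a higher liability-scale heritability.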