UK Biobank

UK Biobank is a large-scale cohort study of 502,655 participants aged 40–69 years. Study participants were recruited from 22 recruitment centres across the United Kingdom between 2006 and 2010 [53,54]. For the purposes of our analyses, we restricted the dataset to a subset of 463,827 individuals of recent European descent with available genotype data. Individuals of non-European descent were removed on the basis of a k-means cluster analysis of the first four genetic principal components [55]. The different subsets of UK Biobank utilised in our analyses are illustrated in Supplementary Fig. 1.

The UK Biobank was approved by the North West Multi-centre Research Ethics Committee. All UK Biobank study participants gave informed consent. This research project was approved under an amendment to application 15825 and complied with all relevant ethical regulations.

Spouse-pair subsample

Because spouse information is not explicitly available in UK Biobank, we used methods similar to those of previous studies [15,16,17] to identify spouse-pairs. Starting with the European subsample described above, we used household-sharing information to extract pairs of individuals who (a) report living with their spouse (6141-0.0), (b) report the same length of time living in the house (699-0.0), (c) report the same number of occupants in the household (709-0.0), (d) report the same number of vehicles (728-0.0), (e) report the same accommodation type and rental status (670-0.0, 680-0.0), (f) have identical home coordinates (rounded to the nearest km) (20074-0.0, 20075-0.0), (g) are registered with the same UK Biobank recruitment centre (54-0.0) and (h) both have available genotype data. If more than two individuals shared identical information across all variables, these individuals were excluded from the analysis. At this stage, we identified 52,471 potential spouse-pairs.
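The matching step amounts to grouping participants on a composite key of shared household fields and keeping only groups of exactly two. The following is a minimal sketch under stated assumptions. The participants are in-memory dicts, and the column names in `MATCH_FIELDS` are hypothetical stand-ins for the UK Biobank fields listed above. Home coordinates are assumed to be already rounded to the nearest km. This is not the authors' code.

```python
from collections import defaultdict

# Hypothetical column names standing in for the UK Biobank fields in (b)-(g):
# 699, 709, 728, 670, 680, 20074, 20075 and 54.
MATCH_FIELDS = (
    "years_in_house", "n_occupants", "n_vehicles", "accommodation_type",
    "rental_status", "home_east_km", "home_north_km", "centre",
)


def find_candidate_pairs(participants):
    """Group participants on shared household information and keep exact pairs.

    Each participant is a dict with an "id", boolean "lives_with_spouse" and
    "genotyped" flags, and one value per MATCH_FIELDS key.
    """
    groups = defaultdict(list)
    for person in participants:
        # Criteria (a) and (h): reports living with a spouse and has genotype data.
        if not (person["lives_with_spouse"] and person["genotyped"]):
            continue
        # Criteria (b)-(g): identical household information forms the grouping key.
        key = tuple(person[field] for field in MATCH_FIELDS)
        groups[key].append(person["id"])
    # Groups of more than two are ambiguous and excluded; singletons have no match.
    return sorted(tuple(sorted(ids)) for ids in groups.values() if len(ids) == 2)


base = dict(lives_with_spouse=True, genotyped=True, years_in_house=12,
            n_occupants=2, n_vehicles=1, accommodation_type="house",
            rental_status="own outright", home_east_km=451, home_north_km=387,
            centre=1)
people = [
    {**base, "id": 101}, {**base, "id": 102},   # one clean pair
    {**base, "id": 201, "n_vehicles": 2},        # three identical records:
    {**base, "id": 202, "n_vehicles": 2},        # excluded as ambiguous
    {**base, "id": 203, "n_vehicles": 2},
    {**base, "id": 301, "genotyped": False},     # fails criterion (h)
]
print(find_candidate_pairs(people))  # [(101, 102)]
```

A single composite key makes the "more than two individuals" rule fall out naturally: any key shared by three or more people is simply dropped.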

We excluded 4866 potential couples who were the same sex (9.3% of the sample), as unconfirmed same-sex pairs may be more likely to be false positives. Although sexual orientation data were collected in UK Biobank, access is restricted for privacy and ethical reasons. To reduce the possibility that identified spouse-pairs are in fact relatives or other familial, non-spouse pairs, we removed three pairs in which both members reported the same ages at death for both parents (1807-0.0, 3526-0.0). We then constructed a genetic relationship matrix (GRM) amongst the derived pairs and removed 53 pairs with estimated relatedness (IBD > 0.1). To construct the GRM, we used a pool of 78,341 markers, derived by LD pruning (50 kb windows, 5 kb steps, r² < 0.1) of 1,440,616 SNPs from the HapMap3 reference panel [56], using the 1000 Genomes CEU genotype data [57] as a reference panel. The final sample included 47,549 spouse-pairs.
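As an illustration of the relatedness filter, the sketch below computes a GCTA-style off-diagonal GRM entry for each candidate pair and drops pairs above 0.1. It assumes the reported "IBD > 0.1" threshold was applied to this kind of relatedness estimate; that is our assumption, not a detail stated in the text. The genotypes and frequencies are toy values, and the function names are hypothetical.

```python
def grm_entry(geno_a, geno_b, freqs):
    """Off-diagonal genetic relationship estimate (GCTA-style) for two people.

    geno_a, geno_b: allele dosages (0, 1 or 2) at the same LD-pruned markers.
    freqs: frequencies (0 < p < 1) of the counted allele at those markers.
    """
    total = 0.0
    for xa, xb, p in zip(geno_a, geno_b, freqs):
        total += (xa - 2 * p) * (xb - 2 * p) / (2 * p * (1 - p))
    return total / len(freqs)


def drop_related_pairs(pairs, genotypes, freqs, threshold=0.1):
    """Keep candidate spouse-pairs whose relatedness estimate is at most threshold."""
    return [(a, b) for a, b in pairs
            if grm_entry(genotypes[a], genotypes[b], freqs) <= threshold]


# Exclusion cascade reported in the text: 52,471 - 4,866 - 3 - 53 = 47,549.
assert 52_471 - 4_866 - 3 - 53 == 47_549

freqs = [0.5, 0.5, 0.5, 0.5]
genotypes = {"A": [2, 0, 2, 0], "B": [2, 0, 2, 0],   # identical genotypes
             "C": [2, 0, 2, 0], "D": [1, 1, 1, 1]}
print(drop_related_pairs([("A", "B"), ("C", "D")], genotypes, freqs))  # [('C', 'D')]
```

With real data the markers would be the LD-pruned HapMap3 set and the frequencies would come from the reference panel. Four markers are used here only to keep the arithmetic visible.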

Non-spouse-pair samples

For secondary analyses requiring data from unrelated individuals, we derived a sample of unrelated individuals of European descent and a more restrictive sample believed to be of white British descent. Starting with the UK Biobank subset of 463,827 individuals of recent European descent, we removed 78,540 related individuals (the methodology has been described previously [55]) to generate a European sample of 385,287 individuals. Using lists provided by UK Biobank, we further restricted this sample to 337,114 individuals identifying as being of white British descent.
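Removing related individuals is a vertex-cover style problem on the graph of related pairs. The sketch below uses a common greedy heuristic: repeatedly drop whoever appears in the most remaining related pairs. This heuristic is our hypothetical illustration and not necessarily the algorithm described in the cited work. The IDs and pairs are toy values.

```python
from collections import Counter


def greedy_unrelated(individuals, related_pairs):
    """Remove individuals until no related pair remains (greedy heuristic).

    Hypothetical illustration: repeatedly drop whoever appears in the most
    remaining related pairs, breaking ties by the smaller ID.
    """
    remaining = {frozenset(pair) for pair in related_pairs}
    removed = set()
    while remaining:
        counts = Counter(person for pair in remaining for person in pair)
        worst = min(counts, key=lambda person: (-counts[person], person))
        removed.add(worst)
        remaining = {pair for pair in remaining if worst not in pair}
    return [person for person in individuals if person not in removed]


# Counts reported in the text: 463,827 - 78,540 = 385,287 unrelated Europeans.
assert 463_827 - 78_540 == 385_287

print(greedy_unrelated([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))  # [1, 3, 5]
```

Removing the most-connected individual first tends to keep the unrelated sample larger than removing one member of each pair at random. Person 2 is removed once rather than breaking both of their pairs separately.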

Height and educational attainment

At baseline, the height (cm) of UK Biobank participants was measured using a Seca 202 device at the assessment centre (ID: 50-0.0). Measured height was used as a positive control for the application of a Mendelian randomization framework in the context of assortative mating.

Educational attainment, characterised as years in full-time education, was defined as in a previous publication [58]. Individuals born outside England, Scotland or Wales were removed because of differences in schooling systems. Participants with a college or university degree were classified with a leaving age of 21 years, and participants who self-reported leaving school when younger than 15 years were classified with a leaving age of 15 years. Educational attainment was included as a covariate, as a possible confounder, in phenotypic analyses of spousal similarity in alcohol behaviour.
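The recoding rules above can be written as a small function. This is a sketch only. The argument names and country strings are hypothetical, and the mapping from the underlying UK Biobank fields to these arguments is not shown.

```python
from typing import Optional

GREAT_BRITAIN = {"England", "Scotland", "Wales"}


def education_leaving_age(country_of_birth: str, has_degree: bool,
                          age_left_school: Optional[int]) -> Optional[int]:
    """Age at leaving full-time education under the rules described above."""
    if country_of_birth not in GREAT_BRITAIN:
        return None  # excluded: different schooling systems
    if has_degree:
        return 21  # college or university degree
    if age_left_school is None:
        return None  # no usable information
    return max(age_left_school, 15)  # leaving before 15 is recoded to 15


print(education_leaving_age("Wales", False, 14))             # 15
print(education_leaving_age("England", True, 16))            # 21
print(education_leaving_age("Northern Ireland", False, 16))  # None
```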

Self-reported alcohol variables

At baseline, study participants completed a questionnaire. Participants were asked to describe their current drinking status (never, previous, current, prefer not to say) (ID: 20117-0.0) and to estimate their current alcohol intake frequency (daily or almost daily, three or four times a week, once or twice a week, one to three times a month, special occasions only, never, prefer not to say) (ID: 1558-0.0). Individuals reporting a current intake frequency of at least once or twice a week were asked to estimate their average weekly intake of a range of alcoholic beverages (red wine, white wine, champagne, beer, cider, spirits, fortified wine). These were recorded in five fields: red wine; champagne and white wine; beer and cider; spirits; and fortified wine (ID: 1568-0.0, 1578-0.0, 1588-0.0, 1598-0.0, 1608-0.0).

From these variables, we derived three measures. The first was ever or never consumed alcohol (current or former against never). The second was a binary measure of drinking frequency for self-reported current drinkers (three or more times a week against less than three times a week). The third was average weekly alcohol intake in units, derived by converting the self-reported intakes of the five drink types to units and summing them, as in a previous study [21]. The questionnaire recorded intake in measures for spirits, glasses for wines and pints for beer/cider, which were estimated to be equivalent to 1, 2 and 2.5 units, respectively. Individuals reporting a current intake frequency of “one to three times a month”, “special occasions only” or “never” (for whom this phenotype was not collected) were assumed to have a weekly alcohol consumption volume of 0. More information on the alcohol variables used in this study is given in Supplementary Table 6.
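The weekly-units derivation can be sketched as a weighted sum over the five drink types, with the zero rule for infrequent drinkers applied first. The dictionary keys are hypothetical names for the five beverage fields. Treating unreported drink types as zero is an assumption of this sketch.

```python
# Units per serving as stated above; keys are hypothetical names for the five
# beverage fields (our assumed correspondence to 1568, 1578, 1588, 1598, 1608).
UNITS_PER_SERVING = {
    "red_wine": 2.0,               # glasses
    "white_wine_champagne": 2.0,   # glasses
    "beer_cider": 2.5,             # pints
    "spirits": 1.0,                # measures
    "fortified_wine": 2.0,         # glasses
}
WEEKLY_QUESTIONS_ASKED = {"Daily or almost daily", "Three or four times a week",
                          "Once or twice a week"}
ASSUMED_ZERO = {"One to three times a month", "Special occasions only", "Never"}


def weekly_units(frequency, servings):
    """Average alcohol units per week, or None if it cannot be derived."""
    if frequency in ASSUMED_ZERO:
        return 0.0
    if frequency not in WEEKLY_QUESTIONS_ASKED:
        return None  # e.g. "Prefer not to say"
    # Drink types with no reported servings count as zero here (an assumption).
    return sum(UNITS_PER_SERVING[drink] * n for drink, n in servings.items())


print(weekly_units("Three or four times a week",
                   {"red_wine": 3, "beer_cider": 2, "spirits": 1}))  # 12.0
print(weekly_units("Special occasions only", {}))                    # 0.0
```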

Genotyping

488,377 UK Biobank study participants were assayed using two similar genotyping arrays: the UK BiLEVE Axiom™ Array by Affymetrix [1] (N = 49,950) and the closely related UK Biobank Axiom™ Array (N = 438,427). Directly genotyped variants were pre-phased using SHAPEIT3 [59] and then imputed using IMPUTE4 with the UK10K [60], Haplotype Reference Consortium [61] and 1000 Genomes Phase 3 [57] reference panels. Post-imputation, data were available on ~96 million genetic variants.

Utilising genetic data to disentangle spousal correlations

In general, genetic variants can be assumed to affect a phenotype through their effects on intermediate observable or unobservable phenotypes. In the context of assortative mating, it is unlikely that individuals would assort directly on genotype; rather, they assort on an observed phenotype influenced by genetic factors. Assume that a phenotype is influenced by genetic factors G and that individuals assort on the phenotype such that the phenotypic correlation between spouses is C. The expected correlations induced by assortment between an index individual’s G and (i) their partner’s phenotype and (ii) their partner’s G can then be shown to be functions of the heritability of the phenotype and of C (Supplementary Methods). This implies that estimates of assortative mating using genetic data are likely to be attenuated relative to the true phenotypic assortment. The exceptions are when genetic factors completely explain variation in the phenotype of interest, or when the estimates are rescaled, as in Mendelian randomization.
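To make the attenuation concrete, consider a simple path model. This is our assumption, and the Supplementary Methods may parameterise it differently. All variables are standardised to unit variance, the genetic factors satisfy corr(G, P) = h so that h² is the heritability of P, and assortment acts only on the phenotype. Under these conditions, G_i is related to partner j only through P_i:

```latex
\begin{aligned}
\operatorname{corr}(G_i, P_j) &= \operatorname{corr}(G_i, P_i)\,\operatorname{corr}(P_i, P_j) = hC,\\
\operatorname{corr}(G_i, G_j) &= \operatorname{corr}(G_i, P_i)\,\operatorname{corr}(P_i, P_j)\,\operatorname{corr}(P_j, G_j) = h^2 C,\\
\frac{\operatorname{cov}(G_i, P_j)}{\operatorname{cov}(G_i, P_i)} &= \frac{hC}{h} = C.
\end{aligned}
```

Because 0 ≤ h ≤ 1, the genetic quantities satisfy |h²C| ≤ |hC| ≤ |C|. The ratio in the last line, the Wald-type rescaling used in Mendelian randomization, recovers C under this model.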

However, applying genetic approaches such as Mendelian randomization and genetic correlation analyses to assortative mating has notable advantages for mechanistic understanding. In conventional Mendelian randomization studies [33,34], genetic variants are used as proxies for a measured exposure to evaluate potential causal relationships between the exposure and an outcome (e.g. LDL cholesterol and coronary heart disease [38]). Genetic proxies may be more reliable than the measured exposure because they are less susceptible to confounding and reverse causation.

In the context of Mendelian randomization across spouses, the premise is largely similar. The exposure is an individual’s phenotype (e.g. alcohol consumption), proxied by a genetic instrument, and the outcome is their partner’s phenotype (e.g. alcohol consumption). A Mendelian randomization approach can evaluate a direct effect of an individual’s alcohol consumption on the alcohol consumption of their partner, as opposed to effects of social homogamy. Such a direct effect could reflect any of three processes: individuals being likely to select a mate with similar behaviour (assortative mating), an individual’s alcohol consumption influencing their partner’s during the relationship (partner interaction effects), or more similar couples staying together for longer (relationship dissolution). Note that because genotype is fixed from birth, a Mendelian randomization estimate will not capture an effect of the partner’s alcohol consumption on the index individual during the relationship. Interpretation can be nuanced: for example, it seems unlikely that an individual’s height could influence the height of their partner, but partner interaction effects are highly plausible for alcohol behaviour.

Similarly, estimating the genotypic concordance between spouses for variants relating to a trait of interest can improve mechanistic understanding. The interpretation of genotypic concordance is comparable to that of Mendelian randomization across spouses, with two important distinctions. First, genotypic concordance will not capture partner interaction effects, as germline DNA is fixed for both spouses prior to assortment. Second, concordance induced by assortment will be further attenuated compared to a Mendelian randomization approach, because the attenuation applies to the genotypes of both spouses and the estimate is not rescaled by the genotype-phenotype association.
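The ordering of these quantities can be checked by simulation. The sketch below assumes a simple model of assortment on a standardised phenotype P with heritability h2. Spousal phenotypes are bivariate normal with correlation c, and each genetic score G is drawn from its own phenotype only. Under this model, corr(G_m, P_f) should be close to sqrt(h2)·c, genotypic concordance close to h2·c, and the Mendelian randomization-style ratio close to c. The model and parameter values are illustrative assumptions, not estimates from the study.

```python
import math
import random


def simulate_couples(n, h2, c, seed=2024):
    """Simulate assortment on a standardised phenotype P with heritability h2.

    Spousal phenotypes are bivariate normal with correlation c. Each genetic
    score G is drawn from G | P ~ N(h2 * P, h2 * (1 - h2)), so var(G) = h2,
    cov(G, P) = h2 and G relates to the partner only through its own P.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(h2 * (1 - h2))
    g_m, p_m, g_f, p_f = [], [], [], []
    for _ in range(n):
        pm = rng.gauss(0, 1)
        pf = c * pm + math.sqrt(1 - c * c) * rng.gauss(0, 1)
        p_m.append(pm)
        p_f.append(pf)
        g_m.append(h2 * pm + noise_sd * rng.gauss(0, 1))
        g_f.append(h2 * pf + noise_sd * rng.gauss(0, 1))
    return g_m, p_m, g_f, p_f


def cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def corr(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


h2, c = 0.5, 0.3
g_m, p_m, g_f, p_f = simulate_couples(50_000, h2, c)
print("corr(P_m, P_f):", round(corr(p_m, p_f), 3))   # theory: c
print("corr(G_m, P_f):", round(corr(g_m, p_f), 3))   # theory: sqrt(h2) * c
print("corr(G_m, G_f):", round(corr(g_m, g_f), 3))   # theory: h2 * c
print("cov(G_m, P_f) / cov(G_m, P_m):",
      round(cov(g_m, p_f) / cov(g_m, p_m), 3))       # theory: c
```

The last line is a Wald-type ratio. Dividing the cross-spouse genetic association by the within-person one removes the attenuation that remains in the genotypic concordance.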

Spousal phenotypic concordance for height

To verify the validity of the derived spouse-pair sample, we evaluated the spousal phenotypic concordance for height. Previous studies have found strong evidence of spousal concordance for height, so comparable results would be consistent with derived spouses being genuine. The spousal phenotypic concordance was estimated using a linear regression of an individual’s height against the height of their partner, adjusting for sex. With one unique phenotype pairing within couples (male spouse height/ female spouse height), each individual in the dataset was included only once as either the reference individual or their partner.

Effect of height on height of partner

We validated the application of a Mendelian randomization approach to assortative mating using height as a positive control; genotypes influencing height have previously been shown to be highly correlated between spouse-pairs [15]. As a measure of genetically influenced height, we started with 382 independent SNPs from a recent genome-wide association study (GWAS) of adult height in Europeans [63], obtained using LD clumping (r² < 0.001) in MR-Base [62].
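LD clumping keeps the most significant SNP in each region and discards nearby SNPs in LD with it. The sketch below is a greedy version of that idea. MR-Base performs this step against a reference panel. The 10,000 kb window is our assumption (to our understanding it is the TwoSampleMR clump_data default), and the SNP IDs and r² values are made up.

```python
def clump(snps, ld_r2, r2_threshold=0.001, window_kb=10_000):
    """Greedy LD clumping.

    snps: list of (snp_id, chromosome, position_bp, p_value).
    ld_r2: callable returning r^2 between two SNP IDs (e.g. from a reference panel).
    A SNP is kept unless it lies within window_kb of an already kept, more
    significant SNP on the same chromosome with r^2 above r2_threshold.
    """
    kept = []
    for snp_id, chrom, pos, p in sorted(snps, key=lambda s: s[3]):
        in_ld = any(
            k_chrom == chrom
            and abs(k_pos - pos) <= window_kb * 1000
            and ld_r2(k_id, snp_id) > r2_threshold
            for k_id, k_chrom, k_pos, _ in kept
        )
        if not in_ld:
            kept.append((snp_id, chrom, pos, p))
    return [snp[0] for snp in kept]


# Toy LD lookup (hypothetical values); unlisted pairs are treated as r^2 = 0.
LD = {frozenset({"rsA", "rsB"}): 0.4}


def toy_r2(a, b):
    return LD.get(frozenset({a, b}), 0.0)


snps = [("rsA", 1, 1_000_000, 1e-20), ("rsB", 1, 1_050_000, 1e-15),
        ("rsC", 1, 9_000_000, 1e-10), ("rsD", 2, 1_000_000, 1e-9)]
print(clump(snps, toy_r2))  # ['rsA', 'rsC', 'rsD']
```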

For the Mendelian randomization analysis, we restricted analyses to spouse-pairs with complete measured height and genotype data. First, we estimated the association between 378 SNPs (four SNPs were unavailable in the QC version of the dataset) and height in the same individual, using the spouse-pair sample with sex included as a covariate. Second, we estimated the association between the 378 SNPs and spousal height. PLINK [64] was used to estimate the SNP-phenotype associations, also including sex as a covariate. We then estimated the effect of a 1 cm increase in an individual’s height on their partner’s height using the TwoSampleMR R package [62] and the internally derived weights described above. The fixed-effects inverse-variance weighted (IVW) method was used as the primary analysis. Cochran’s Q test and the I² statistic were used to test for heterogeneity in the fixed-effects IVW analysis [65]. MR-Egger [47] was used to test for directional pleiotropy. The weighted median [48] and weighted mode [49] estimators were used to assess the consistency of the effect estimate. With two unique pairings between genotype and phenotype in each couple (male spouse genotype/female spouse height and the converse), each individual in the dataset was included twice, once as the reference individual and once as the partner.
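For orientation, here is a minimal sketch of the point estimates behind several of these methods. It covers fixed-effect IVW with Cochran's Q and I², MR-Egger (intercept and slope only) and the simple weighted median with first-order weights. TwoSampleMR provides its own implementations (to our understanding mr_ivw_fe, mr_egger_regression, mr_weighted_median and mr_weighted_mode). This sketch is independent of that package and may differ in detail. It omits the weighted mode and the bootstrap or regression standard errors, and the inputs are toy numbers.

```python
import math


def ivw_fixed(bx, by, se_y):
    """Fixed-effect inverse-variance weighted estimate with Cochran's Q and I^2."""
    w = [1 / s ** 2 for s in se_y]
    denom = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / denom
    se = math.sqrt(1 / denom)
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    df = len(bx) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, q, i2


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (point estimates only)."""
    # Orient each SNP so that its SNP-exposure association is positive.
    x = [abs(b) for b in bx]
    y = [b_y if b_x >= 0 else -b_y for b_x, b_y in zip(bx, by)]
    w = [1 / s ** 2 for s in se_y]
    wsum = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / wsum
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / wsum
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope  # (pleiotropy intercept, causal estimate)


def weighted_median(bx, by, se_y):
    """Simple weighted median of per-SNP ratio estimates (first-order weights)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # 1 / (se_y / |bx|)^2
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    b = [ratios[j] for j in order]
    w = [weights[j] for j in order]
    total, running, s = sum(w), 0.0, []
    for wj in w:
        running += wj
        s.append((running - 0.5 * wj) / total)
    below = [j for j, sj in enumerate(s) if sj < 0.5]
    if not below:
        return b[0]
    j = below[-1]  # the last cumulative weight is always >= 0.5, so j + 1 exists
    return b[j] + (b[j + 1] - b[j]) * (0.5 - s[j]) / (s[j + 1] - s[j])


bx, by, se = [1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0], [1.0] * 4
print(ivw_fixed(bx, by, se))        # (2.5, 0.5, 5.0, 0.4)
print(weighted_median(bx, by, se))  # 2.5
bx2 = [0.1, 0.2, 0.3, 0.4]
by2 = [0.1 + 0.25 * x for x in bx2]  # toy data with a built-in intercept of 0.1
print(mr_egger(bx2, by2, [0.01] * 4))  # approximately (0.1, 0.25)
```

A non-zero Egger intercept in the toy data is the signature of directional pleiotropy that the MR-Egger test looks for. The IVW slope, which is forced through the origin, would be biased in that case.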

Spousal genetic concordance for height

To evaluate spousal genotypic concordance for height, we evaluated the association between height polygenic scores (PGS) across spouse-pairs. Height PGS were constructed in PLINK [64] using the 378 height SNPs discussed above. The cross-spouse association was estimated using linear regression of an individual’s PGS against the PGS of their partner. With one unique genotype pairing within couples (male spouse genotype/female spouse genotype), each individual in the dataset was included only once, as either the reference individual or their partner.
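A polygenic score is a weighted sum of effect-allele dosages, and the cross-spouse association is then a regression across couples, with each couple listed once. The sketch below uses hypothetical weights and dosages. It computes a plain sum, whereas PLINK's scoring output may be scaled differently (e.g. averaged over alleles); a common rescaling of both spouses' scores leaves the regression slope unchanged.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) across the score SNPs."""
    return sum(d * w for d, w in zip(dosages, weights))


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical per-SNP weights (e.g. GWAS effect sizes) and couples, each
# listed once as (male dosages, female dosages).
weights = [0.10, -0.20, 0.05]
couples = [([2, 1, 0], [1, 1, 0]),
           ([1, 0, 2], [2, 0, 1]),
           ([0, 2, 1], [0, 1, 1]),
           ([2, 0, 2], [2, 1, 2])]
male_pgs = [polygenic_score(m, weights) for m, _ in couples]
female_pgs = [polygenic_score(f, weights) for _, f in couples]
print(round(ols_slope(male_pgs, female_pgs), 3))
```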

Phenotypic spousal concordance for self-reported alcohol use

To evaluate phenotypic concordance for alcohol use, we compared self-reported alcohol behaviour between spouses. We estimated the spousal concordance for the two binary measures (ever or never consumed alcohol; drinking three or more times a week) using a logistic regression of the relevant variable for an individual against the relevant variable for their partner, adjusting for sex, age and partner’s age. In addition, we included recruitment centre, height and education (of both spouses) in the model as potential confounders. Similarly, linear regression was used to estimate the spousal concordance for continuous weekly alcohol consumption volume, adjusting for the same covariates. Spouse-pairs were removed from the relevant analyses if they had any missing phenotype data, or if either spouse reported a weekly alcohol consumption volume more than five standard deviations from the mean (calculated in the sample of individuals with non-zero weekly drinking). With one unique phenotype pairing within couples (male alcohol variable/female alcohol variable), each individual in the dataset was included only once, as either the reference individual or their partner.
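The missing-data and outlier rule can be sketched as follows. One detail is our assumption: the mean and SD are computed from the non-zero values among complete pairs, since the text does not specify the exact pool. The data are toy values.

```python
import statistics


def outlier_bounds(values, k=5.0):
    """Mean +/- k SD, computed among individuals with non-zero weekly drinking."""
    nonzero = [v for v in values if v > 0]
    mean, sd = statistics.mean(nonzero), statistics.stdev(nonzero)
    return mean - k * sd, mean + k * sd


def filter_pairs(pairs, k=5.0):
    """Drop pairs with missing data or with either spouse beyond the bounds.

    pairs: list of (units_male, units_female); None marks missing data.
    """
    complete = [(m, f) for m, f in pairs if m is not None and f is not None]
    lo, hi = outlier_bounds([u for pair in complete for u in pair], k)
    return [(m, f) for m, f in complete if lo <= m <= hi and lo <= f <= hi]


pairs = [(10.0, 12.0)] * 30 + [(11.0, 1000.0), (None, 5.0), (0.0, 0.0)]
kept = filter_pairs(pairs)
print(len(kept), (11.0, 1000.0) in kept)  # 31 False
```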

Effect of alcohol use on partner’s alcohol use

We then applied the Mendelian randomization framework to investigate whether an individual’s genotype at rs1229984 in ADH1B affects the self-reported alcohol consumption volume of their partner. Individuals homozygous for the minor allele are rare in European populations (the minor allele frequency is 2.9% in the 1000 Genomes CEU population [57]). We therefore first determined whether an additive or a dominant model (as used in previous studies [38,66]) was more appropriate for the SNP, by comparing the association of rs1229984 genotype with self-reported weekly alcohol consumption in the European and British samples. We found strong evidence that the SNP has an additive effect on alcohol consumption (Supplementary Table 7) and assumed this model in all relevant analyses.
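The two genetic models differ only in how the minor-allele count is coded. Comparing mean outcomes by genotype group shows which coding fits better. The sketch below also shows why minor-allele homozygotes are rare: under Hardy-Weinberg proportions, a 2.9% MAF implies about 0.08% homozygotes. The outcome values are hypothetical toy numbers, not study data.

```python
from collections import defaultdict


def code_genotype(minor_allele_count, model):
    """Code rs1229984 genotype (count of the minor allele) under a genetic model."""
    if model == "additive":
        return minor_allele_count                  # 0, 1, 2
    if model == "dominant":
        return 1 if minor_allele_count > 0 else 0  # carrier vs non-carrier
    raise ValueError(f"unknown model: {model}")


def mean_by_genotype(counts, outcome):
    """Mean outcome within each genotype group, to compare with each model's shape."""
    sums, n = defaultdict(float), defaultdict(int)
    for g, y in zip(counts, outcome):
        sums[g] += y
        n[g] += 1
    return {g: sums[g] / n[g] for g in sorted(sums)}


# Under Hardy-Weinberg proportions, MAF = 2.9% implies about 0.08% minor homozygotes.
print(round(0.029 ** 2 * 100, 3), "% expected minor-allele homozygotes")  # 0.084
print(mean_by_genotype([0, 0, 1, 1, 2], [14.0, 12.0, 9.0, 11.0, 4.0]))
# {0: 13.0, 1: 10.0, 2: 4.0}
```

Under an additive model the heterozygote mean should sit roughly midway between the two homozygote means. Under a dominant model the heterozygote and minor-allele homozygote means should be similar.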

For the Mendelian randomization analysis, we restricted analysis to spouse-pairs in which both members had genotype data and at least one member had self-reported alcohol consumption volume. First, we estimated the association of rs1229984 genotype with alcohol consumption in the same individual, adjusting for sex, age, centre and the first 10 principal components of the reference individual. Second, we estimated the association between rs1229984 and spousal alcohol consumption, adjusting for sex, age (of both spouses), centre and the first 10 principal components of both spouses. PLINK [64] was used to estimate the SNP-phenotype associations. We then estimated the effect of a one-unit-per-week increase in an individual’s weekly alcohol consumption volume on the same variable in their partner. The Wald ratio estimate was obtained using the mr_wald_ratio function in the TwoSampleMR R package [62] with the internally derived weights. Sensitivity analyses were limited because only a single genetic instrument was available. With two unique pairings between genotype and phenotype in each couple (male alcohol variable/female genotype and the converse), each individual in the dataset was included twice, once as the reference individual and once as the partner.
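With a single instrument, the Mendelian randomization estimate reduces to a Wald ratio. The sketch below uses the first-order standard error (outcome SE divided by the absolute SNP-exposure association), which is to our understanding what mr_wald_ratio computes, together with a normal-approximation p-value. The input numbers are purely illustrative and are not estimates from this study.

```python
import math


def wald_ratio(b_exp, b_out, se_out):
    """Wald ratio with a first-order standard error and two-sided p-value.

    b_exp: SNP-exposure association (index individual's own alcohol intake).
    b_out: SNP-outcome association (partner's alcohol intake).
    """
    b = b_out / b_exp
    se = se_out / abs(b_exp)
    p = math.erfc(abs(b / se) / math.sqrt(2))
    return b, se, p


# Illustrative inputs only, not estimates from the study.
b, se, p = wald_ratio(b_exp=-2.0, b_out=-0.5, se_out=0.1)
print(b, se)  # 0.25 0.05
```

The first-order standard error ignores uncertainty in the SNP-exposure association. This is usually reasonable when the instrument is strongly associated with the exposure.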

Spousal genotypic concordance for alcohol use

We then investigated properties of the rs1229984 variant in UK Biobank that may be relevant to assortative mating. Starting with the UK Biobank subset of 463,827 individuals of recent European descent, we removed 78,540 related individuals. These were identified by applying an algorithm to the list of related pairs (third degree or closer) provided by UK Biobank [55]. We then tested for Hardy-Weinberg equilibrium (HWE) in the resulting sample of 385,287 individuals. To evaluate the possibility of population stratification, we investigated the association of both the SNP and self-reported alcohol consumption with genetic principal components and birth coordinates. As a sensitivity analysis, we also restricted the sample to a more homogeneous sample of British individuals, as provided by UK Biobank, and repeated the analyses.
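A basic HWE check compares observed genotype counts with those expected from the estimated allele frequency. The sketch below uses the asymptotic 1-df chi-square test. This may differ from the exact test used by tools such as PLINK, and it assumes a polymorphic SNP (otherwise the expected counts contain zeros). The counts are toy values.

```python
import math


def hwe_chisq(n_hom_ref, n_het, n_hom_alt):
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


print(hwe_chisq(25, 50, 25))  # (0.0, 1.0): exactly in HWE
print(hwe_chisq(50, 0, 50))   # chi2 = 100.0, heterozygote deficit
```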

We then estimated the genotypic concordance between derived spouse-pairs for rs1229984 genotype using linear regression. As a sensitivity analysis, we investigated the possibility that spousal concordance for rs1229984 was driven by fine-scale assortative mating on geography, which is itself associated with genetic variation within the UK [67,68]. For this, we restricted the sample to the 28,653 spouse-pairs born within 100 km of each other. To test the validity of this sensitivity analysis, we explored whether differences between spouses in birth location or in genetic ancestry (as indexed by principal components) were associated with differences in alcohol behaviour or rs1229984 genotype, in both the restricted and full spouse-pair samples. The spouse-pairs were then stratified by the 22 UK Biobank recruitment centres, and logistic regression analyses were re-run to estimate the spousal concordance of ADH1B genotype by centre. Geographical patterns of heterogeneity across the recruitment centres would provide evidence of population stratification. With one unique genotype pairing within couples (male genotype/female genotype), each individual in the dataset was included only once, as either the reference individual or their partner.
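Splitting couples by the distance between their birth places is a straightforward geometric step. The sketch below assumes that birth locations are available as projected (easting, northing) grid coordinates in metres, so that Euclidean distance is appropriate. This is an assumption; latitude/longitude would need a great-circle distance instead. Pairs with a missing birth place are kept separate. The pair IDs and coordinates are hypothetical.

```python
import math


def split_by_birth_distance(pairs, threshold_km=100.0):
    """Split spouse-pairs by distance between birth places.

    pairs: list of (pair_id, birth_a, birth_b), where each birth place is an
    (easting, northing) tuple in metres on a projected grid, or None if missing.
    """
    near, far, missing = [], [], []
    for pair_id, birth_a, birth_b in pairs:
        if birth_a is None or birth_b is None:
            missing.append(pair_id)
            continue
        distance_km = math.hypot(birth_a[0] - birth_b[0],
                                 birth_a[1] - birth_b[1]) / 1000
        (near if distance_km <= threshold_km else far).append(pair_id)
    return near, far, missing


pairs = [("p1", (400_000, 300_000), (430_000, 340_000)),   # 50 km apart
         ("p2", (400_000, 300_000), (400_000, 500_000)),   # 200 km apart
         ("p3", None, (350_000, 250_000))]                 # missing birth place
print(split_by_birth_distance(pairs))  # (['p1'], ['p2'], ['p3'])
```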

As a further sensitivity analysis to explore potential population stratification bias, we compared Mendelian randomization and genotypic concordance estimates from the 28,653 spouse-pairs born within 100 km of each other with those from the 13,770 pairs born more than 100 km apart and from the full sample of 47,549 spouse-pairs. Note that some spouse-pairs lacked complete birth coordinate data, so the two distance-based subsamples do not sum to the full sample.

Relationship length and spousal alcohol use similarities

Relationship length may influence spousal similarity in alcohol behaviour, either because spouses become more similar over time or because pairs with similar alcohol behaviour tend to have longer relationships. To explore these possibilities, we investigated the association between relationship length and spousal similarity in alcohol behaviour and in rs1229984 genotype. In the absence of data on relationship length, we used the mean age of each couple as a proxy. We evaluated associations using a linear regression of mean couple age against the spousal difference in weekly alcohol consumption and in rs1229984 genotype. Analyses were adjusted for the sex of the reference individual.
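A minimal sketch of this kind of model follows. It relies on two assumptions, both labelled as ours and not taken from the text. First, the outcome is the absolute spousal difference in weekly units and the predictor is mean couple age. Second, the coefficients are fitted by ordinary least squares through the normal equations. The couples are hypothetical.

```python
def ols(design, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan).

    design: rows of predictors, each starting with 1.0 for the intercept.
    """
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def relationship_length_model(couples):
    """couples: (age_ref, age_partner, units_ref, units_partner, ref_is_male).

    Outcome: absolute spousal difference in weekly units (an assumption);
    predictors: intercept, mean couple age, sex of the reference individual.
    """
    design = [[1.0, (a_r + a_p) / 2, 1.0 if male else 0.0]
              for a_r, a_p, _, _, male in couples]
    y = [abs(u_r - u_p) for _, _, u_r, u_p, _ in couples]
    return ols(design, y)


# Hypothetical couples; returns [intercept, per-year slope, sex coefficient].
couples = [(48, 52, 10.0, 4.0, True), (58, 62, 3.0, 5.0, False),
           (66, 64, 8.0, 6.0, True), (70, 66, 12.0, 14.0, False),
           (45, 41, 20.0, 2.0, False)]
print([round(b, 3) for b in relationship_length_model(couples)])
```

A negative per-year slope would indicate that older couples, used here as a proxy for longer relationships, have smaller differences in drinking.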