Bivariate causal mixture model

Consider a simple additive model of genetic effects, ignoring gene–environment interactions, epistasis, and dominance effects. Under these assumptions, the contribution of the genotype to the phenotype is modeled as a sum of individual contributions from genetic variants: \(y_k = \mathop {\sum}\nolimits_j {g_{jk}\beta _j}\), where \(y_k\) is the quantitative phenotype or disease liability of the k-th individual, \(g_{jk}\) is the 0, 1, 2-coded number of reference alleles for the j-th variant, and \(\beta_j\) is the additive genetic effect of allele substitution. We say that a genetic variant is causal for a trait if it has a non-zero effect on that trait (\(\beta_j \ne 0\)).

MiXeR builds upon the univariate causal mixture model16, \(\beta _j\sim \pi _0N\left( {0,0} \right) + \pi _1N\left( {0,\sigma _\beta ^2} \right)\), which assumes that only a small fraction (\(\pi_1\)) of variants have an effect on the trait, while the effect of the remaining variants is zero. For mathematical convenience, we chose a Gaussian distribution for the non-null arm of the causal mixture. A drawback of the Gaussian prior is that a large fraction of causal variants will have effect sizes close to zero. We would prefer to count a variant as causal only if it has a sufficiently large effect size, using, for example, a bi-modal prior distribution with probability mass separated from zero; for such a prior, however, it was not feasible to accurately model the effects of the LD structure.

In a joint analysis of two traits, we expect some variants to affect both traits, some variants to affect one trait but not the other, and most variants to have no effect on either trait. We assume that, for a given trait, all causal variants follow the same distribution of effect sizes, regardless of what effect a variant has on the other trait. Within the shared component, we model the correlation of effect sizes to account for genetically correlated traits. Based on these assumptions, MiXeR models the additive genetic effects \((\beta_{1j}, \beta_{2j})\) of variant j on the two traits as a mixture of four bivariate Gaussian components (Fig. 1):

$$\left( {\beta _{1j},\beta _{2j}} \right)\sim \pi _0N(0,0) + \pi _1N\left( {0,{\mathbf{\Sigma }}_{\bf{1}}} \right) + \pi _2N\left( {0,{\mathbf{\Sigma }}_{\bf{2}}} \right) + \pi _{12}N\left( {0,{\mathbf{\Sigma }}_{{\bf{12}}}} \right),$$ (1)

$${\mathbf{\Sigma }}_{\bf{1}} = \left[ {\begin{array}{*{20}{c}} {\sigma _1^2} & 0 \\ 0 & 0 \end{array}} \right],\quad {\mathbf{\Sigma }}_{\bf{2}} = \left[ {\begin{array}{*{20}{c}} 0 & 0 \\ 0 & {\sigma _2^2} \end{array}} \right],\quad {\mathbf{\Sigma }}_{{\bf{12}}} = \left[ {\begin{array}{*{20}{c}} {\sigma _1^2} & {\rho _{12}\sigma _1\sigma _2} \\ {\rho _{12}\sigma _1\sigma _2} & {\sigma _2^2} \end{array}} \right]$$ (2)

where \(\pi_1\) and \(\pi_2\) are the weights of the trait-specific components (variants with an effect on the first trait only, and on the second trait only); \(\pi_{12}\) is the weight of the component affecting both traits; and \(\pi_0\) is the fraction of variants that are non-causal for both traits, with \(\pi _0 + \pi _1 + \pi _2 + \pi _{12} = 1\); \(\sigma _1^2\) and \(\sigma _2^2\) control the expected magnitudes of per-variant effect sizes; and \(\rho_{12}\) is the correlation coefficient of the effect sizes in the shared component. All parameters are assumed to be the same for all genetic variants.
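As a minimal illustration (not part of the MiXeR implementation; function and parameter names are our own), the four-component mixture of Eqs. (1)–(2) can be sampled directly:

```python
import random
import math

def sample_effects(pi1, pi2, pi12, s1, s2, rho12):
    """Draw (beta1, beta2) for one variant from the four-component
    bivariate Gaussian mixture of Eqs. (1)-(2)."""
    u = random.random()
    if u < pi1:                       # causal for trait 1 only
        return (random.gauss(0.0, s1), 0.0)
    if u < pi1 + pi2:                 # causal for trait 2 only
        return (0.0, random.gauss(0.0, s2))
    if u < pi1 + pi2 + pi12:          # shared component, correlation rho12
        x = random.gauss(0.0, 1.0)
        y = rho12 * x + math.sqrt(1.0 - rho12 ** 2) * random.gauss(0.0, 1.0)
        return (s1 * x, s2 * y)
    return (0.0, 0.0)                 # null component (weight pi0)

random.seed(1)
draws = [sample_effects(0.01, 0.01, 0.02, 0.1, 0.1, 0.8) for _ in range(100000)]
frac_null = sum(1 for b1, b2 in draws if b1 == 0.0 and b2 == 0.0) / len(draws)
print(frac_null)  # close to pi0 = 0.96
```

The shared component draws a correlated standard-normal pair and scales it by \(\sigma_1\) and \(\sigma_2\), which reproduces the covariance matrix \({\mathbf{\Sigma}}_{\bf{12}}\) of Eq. (2).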

The effects \(\left( {\hat \beta _{1j},\hat \beta _{2j}} \right)\) estimated by a GWAS are only proxies of the true causal effects \((\beta_{1j}, \beta_{2j})\), distorted by limited sample size (poor statistical power), cryptic relatedness within a GWAS sample, and LD between variants. To disentangle these effects, we derive the likelihood term for the observed GWAS signed test statistics \((z_{1j}, z_{2j})\), incorporating the effects of the LD structure (allelic correlation \(r_{ij}\) between variants i and j); heterozygosity \(H_j = 2p_j(1 - p_j)\), where \(p_j\) is the minor allele frequency of the j-th variant; the number of subjects genotyped per variant (\(N_{1j}\) and \(N_{2j}\)); and variance distortion parameters \(\sigma _{01}^2\), \(\sigma _{02}^2\), and \(\rho_0\). Specifically (see Supplementary Note 1),

$$\left( {z_{1j},z_{2j}} \right) = \left( {\delta _{1j},\delta _{2j}} \right) + N\left( {\left( {0,0} \right),\left[ {\begin{array}{*{20}{c}} {\sigma _{01}^2} & {\rho _0\sigma _{01}\sigma _{02}} \\ {\rho _0\sigma _{01}\sigma _{02}} & {\sigma _{02}^2} \end{array}} \right]} \right),$$ (3)

$$\delta _{ \cdot j} = \sqrt {N_{ \cdot j}} \mathop {\sum }\limits_i \sqrt {H_i} r_{ij}\beta _{ \cdot i}$$

The nine parameters of the model (\(\pi _1,\pi _2,\pi _{12},\sigma _1^2,\sigma _2^2,\rho _{12},\sigma _{01}^2,\sigma _{02}^2,\rho _0\)) are fit by direct optimization of the weighted log-likelihood, with standard errors estimated from the observed Fisher information matrix.

Forcing \(\pi_{12} = 1\) (so that \(\pi_0 = \pi_1 = \pi_2 = 0\)) reduces our model to the infinitesimal assumption that underlies cross-trait LD score regression8. Under this constraint, our model predicts that GWAS signed test statistics follow a bivariate Gaussian distribution with zero mean and variance–covariance matrix

$${\mathbf{\Sigma }}_{\bf{j}} = \ell _j\left[ {\begin{array}{*{20}{c}} {N_{1j}\sigma _1^2} & {\sqrt {N_{1j}N_{2j}} \rho _{12}\sigma _1\sigma _2} \\ {\sqrt {N_{1j}N_{2j}} \rho _{12}\sigma _1\sigma _2} & {N_{2j}\sigma _2^2} \end{array}} \right] + \left[ {\begin{array}{*{20}{c}} {\sigma _{01}^2} & {\rho _0\sigma _{01}\sigma _{02}} \\ {\rho _0\sigma _{01}\sigma _{02}} & {\sigma _{02}^2} \end{array}} \right],$$

i.e., \(\left( {z_{1j},z_{2j}} \right)\sim N\left( {0,{\mathbf{\Sigma }}_{\bf{j}}} \right)\), where \(\ell _j = \mathop {\sum}\nolimits_i {H_ir_{ij}^2}\) is the LD score (adjusted for heterozygosity). This model is consistent with cross-trait LD score regression, with expected chi-square statistics \(E(z_{1j}^2)\), \(E(z_{2j}^2)\), and cross-trait covariance \(E(z_{1j}z_{2j})\) being proportional to the LD score of the j-th SNP, and parameters \(\rho_0\), \(\sigma_{01}\), \(\sigma_{02}\) playing the role of LD score regression intercepts47. The only distinction is that we choose to model effect sizes that are independent of allele frequency, leading to the incorporation of \(H_i\) into our model; this factor is absent from the LD score regression model, which assumes effect sizes inversely proportional to \(H_i\). Thus, MiXeR is a direct extension of cross-trait LD score regression that relaxes the infinitesimal assumption.
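A minimal sketch of the heterozygosity-adjusted LD score \(\ell _j = \sum_i H_ir_{ij}^2\) (illustrative only; the function and toy data are not from the MiXeR code):

```python
def ld_score(j, r2, maf):
    """Heterozygosity-adjusted LD score of SNP j:
    ell_j = sum_i H_i * r_ij^2, with H_i = 2 p_i (1 - p_i)."""
    return sum(2.0 * p * (1.0 - p) * r2[j][i] for i, p in enumerate(maf))

# Toy example: 3 SNPs, squared-correlation matrix, minor allele frequencies.
r2 = [[1.0, 0.4, 0.0],
      [0.4, 1.0, 0.1],
      [0.0, 0.1, 1.0]]
maf = [0.5, 0.25, 0.1]
print(ld_score(0, r2, maf))  # 0.5*1.0 + 0.375*0.4 + 0.18*0.0 = 0.65
```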

Model for bivariate distribution of GWAS z-scores

We derive two models for GWAS z-scores, which we call “fast model” and “full model”. The “fast model” is quicker to run, and we use it to perform an initial search in the space of the model’s parameters. The “full model” is slower but more accurate, and we use it for a final tuning of model estimates.

The “full model” for GWAS z-scores approximates the distribution of \((z_{1j}, z_{2j})\) for a given GWAS SNP as a mixture of K = 20,000 bivariate normal distributions, all having equal weight in the mixture. For each k = 1, …, K, we randomly draw the locations of causal variants (\(\pi_1 N\) causal variants specific to the first trait, \(\pi_2 N\) specific to the second trait, and \(\pi_{12} N\) shared causal variants, where N denotes the total number of variants in the reference panel), and calculate the variance–covariance matrix \({\mathbf{\Sigma }}_{{\bf{kj}}}^\prime\) from equation (3), using estimated LD r2 correlations between the assumed causal variants and the GWAS SNP. Then

$$\left( {z_{1j},z_{2j}} \right) = \left( {\delta _{1j},\delta _{2j}} \right) + N\left( {\left( {0,0} \right),\left[ {\begin{array}{*{20}{c}} {\sigma _{01}^2} & {\rho _0\sigma _{01}\sigma _{02}} \\ {\rho _0\sigma _{01}\sigma _{02}} & {\sigma _{02}^2} \end{array}} \right]} \right),$$ (4)

$$\left( {\delta _{1j}/\sqrt {N_{1j}} ,\delta _{2j}/\sqrt {N_{2j}} } \right) \sim \frac{1}{K}\mathop {\sum}\limits_{k = 1..K} {N\left( {\left( {0,0} \right),{\mathbf{\Sigma }}_{{\bf{kj}}}^\prime } \right)}$$

The “fast model” is derived from the method of moments (see Supplementary Note 1):

$$\begin{array}{*{20}{l}} {\left( {\delta _{1j}/\sqrt {N_{1j}} ,\delta _{2j}/\sqrt {N_{2j}} } \right)} \hfill & \sim \hfill & {\left[ {(1 - \pi _{1j}^\prime )N\left( {0,0} \right) + \pi _{1j}^\prime N\left( {0,{\mathbf{\Sigma }}_{{\bf{1j}}}^\prime } \right)} \right] \oplus } \hfill \\ {} \hfill & {} \hfill & {\left[ {\left( {1 - \pi _{2j}^\prime } \right)N\left( {0,0} \right) + \pi _{2j}^\prime N\left( {0,{\mathbf{\Sigma }}_{{\bf{2j}}}^\prime } \right)} \right] \oplus } \hfill \\ {} \hfill & {} \hfill & {\left[ {\left( {1 - \pi _{12,j}^\prime } \right)N\left( {0,0} \right) + \pi _{12,j}^\prime N\left( {0,{\mathbf{\Sigma }}_{{\bf{12,j}}}^\prime } \right)} \right],} \hfill \end{array}$$ (5)

where ⊕ denotes convolution of probability density functions (so that the right-hand side evaluates to a mixture of eight components); \(\pi _{cj}^\prime = \ell _j\pi _c/{\mathrm{\eta }}_{cj}\) is the adjusted weight of a mixture component (c ∈ {1, 2, 12}); \({\mathbf{\Sigma }}_{{\bf{cj}}}^\prime = {\mathrm{\eta }}_{cj}{\mathbf{\Sigma }}_{\bf{c}}\) is the adjusted variance–covariance matrix; \(\ell _j = \mathop {\sum }\nolimits_i H_ir_{ij}^2\) is the LD score, adjusted for heterozygosity48; and \({\mathrm{\eta }}_{cj} = \pi _c\ell _j + \left( {1 - \pi _c} \right)\frac{{\mathop {\sum }\nolimits_i H_i^2r_{ij}^4}}{{\mathop {\sum }\nolimits_i H_ir_{ij}^2}}\) can be interpreted as a shape parameter that affects the fourth and higher moments of the distribution. This model matches the second moments \(E\left[ {Z_{1j}^2} \right]\), \(E[Z_{1j}Z_{2j}]\), \(E\left[ {Z_{2j}^2} \right]\) and fourth moments \(E\left[ {Z_{1j}^4} \right]\), \(E\left[ {Z_{1j}^2Z_{2j}^2} \right]\), \(E\left[ {Z_{2j}^4} \right]\) of the z-score distribution, and forms a theoretical basis for the mixture model of sparse and ubiquitous effects49,50. Of interest is that the “fast model” involves the fourth power of allelic correlation, \(r_{ij}^4\), which is directly related to the kurtosis (heavy-tailedness) of the z-score distribution.
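A minimal sketch of the “fast model” adjustment for one mixture component, following the definitions above (names are illustrative, not from the MiXeR code):

```python
def fast_model_params(pi_c, maf, r2_j):
    """Adjusted weight pi'_cj and shape parameter eta_cj for SNP j,
    given component weight pi_c, reference MAFs, and the row of
    squared allelic correlations r_ij^2 against SNP j."""
    H = [2.0 * p * (1.0 - p) for p in maf]
    ell = sum(h * r2 for h, r2 in zip(H, r2_j))            # sum_i H_i r_ij^2
    ratio = sum(h * h * r2 * r2 for h, r2 in zip(H, r2_j)) / ell
    eta = pi_c * ell + (1.0 - pi_c) * ratio                # shape parameter eta_cj
    return pi_c * ell / eta, eta                           # (pi'_cj, eta_cj)

# Toy example with 3 reference SNPs.
pi_adj, eta = fast_model_params(0.01, [0.5, 0.25, 0.1], [1.0, 0.4, 0.1])
print(pi_adj, eta)
```

Note that for a fully polygenic component (pi_c = 1) the shape parameter collapses to the LD score itself, recovering the infinitesimal model.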

LD structure estimation

To estimate the LD structure, we use 489 individuals from the 1000 Genomes Project51 (phase 3 data), obtained from the LD score regression website8,22,52; 14 individuals were excluded due to relatedness53. For simulations, LD scores were estimated from the actual genotypes used to produce the synthetic GWAS summary statistics. LD r2 coefficients were calculated using PLINK54 with an LD r2 cutoff of 0.05 and a fixed window size of 50,000 SNPs, corresponding on average to a window of 16 centimorgans. We deliberately chose a larger LD window than the LDSR-recommended window of 1 centimorgan, because the latter appears to truncate a noticeable part of the LD structure. At the same time, we did not observe an effect of using an unbiased estimator55 of r2, and thus fell back to the standard Pearson correlation coefficient. We employ small integer compression56 for efficient storage of the LD matrix.

Fit procedure

We fit the model by direct optimization of the weighted log-likelihood

$$F\left( \theta \right) = \mathop {\sum}\nolimits_j {w_j\,{\mathrm{log}}\left( {pdf(z_j|\theta )} \right)},$$ (6)

where \(\theta = \left( {\pi _1,\pi _2,\pi _{12},\sigma _1^2,\sigma _2^2,\rho _{12},\sigma _{01}^2,\sigma _{02}^2,\rho _0} \right)\) is the vector of all parameters being optimized, and the weights \(w_j\) are chosen by random pruning (64 iterations at LD r2 threshold 0.1). Optimization is done by the Nelder–Mead simplex method57 as implemented in MATLAB’s fminsearch. First, we fit the univariate parameters separately for each trait, i.e., \(\pi _1^u\), \(\sigma _1^2\), \(\sigma _{01}^2\) for the first trait, and similarly for the second trait. The univariate fit employs a sequence of optimizations to ensure robust convergence: first, we use the “fast model” under the constraint \(\pi _1^u = 1\) to find \(\sigma _{1,inf}^2\) and to initialize \(\sigma _{01}^2\); second, we use the constraint \(\pi _1^u\sigma _1^2 = \sigma _{1,inf}^2\) to find initial values of \(\pi _1^u\) and \(\sigma _1^2\), again with the “fast model”. Finally, we use the “full model” and unconstrained optimization to jointly fit the \(\pi _1^u\), \(\sigma _1^2\), \(\sigma _{01}^2\) parameters. The same procedure is repeated for the second trait, to find \(\pi _2^u\), \(\sigma _2^2\), \(\sigma _{02}^2\). To improve convergence, we parametrize the univariate log-likelihood as a function of \({\mathrm{log}}\left( {\pi _1^u\sigma _1^2} \right)\) and \({\mathrm{log}}\left( {\pi _1^u/\sigma _1^2} \right)\), which represent almost independent dimensions of the energy landscape. In the bivariate optimization, we use the “fast model” and the constraint \(\pi_{12} = 1\) to estimate \(r_g\) and \(\rho_0\). Then, we proceed with the “full model” optimization of the parameters specific to the bivariate model (\(\pi_{12}\), \(\rho_{12}\)), constraining all other parameters to their univariate estimates, and also constraining \(r_g\) and \(\rho_0\) to the estimates from the infinitesimal model. The additional analysis (Supplementary Tables 7, 9) uses right-censoring58 of z-scores exceeding \(z_t = 5.45\), by using the cumulative distribution function59 in the log-likelihood:

$$F\left( \theta \right) = \mathop {\sum}\nolimits_{j:|z_j| \le z_t} {w_j\,{\mathrm{log}}\left( {pdf(z_j|\theta )} \right)} + \mathop {\sum}\nolimits_{j:|z_j| > z_t} {w_j\,{\mathrm{log}}\left( {cdf(z_{max}|\theta )} \right)}$$ (7)

Standard error estimation

We estimate standard errors of all parameters from the observed Fisher information, based on the “fast model”. It is known from likelihood optimization theory that the observed Fisher information may not be suitable for a parameter near its boundary, which applies to the mixture weights \(\pi_1\), \(\pi_2\), \(\pi_{12}\) and the correlation of effect sizes \(\rho_{12}\). To mitigate this problem, we apply transformations (MATLAB’s logit() for \(\pi_1\), \(\pi_2\), \(\pi_{12}\); exp() for \(\sigma _1^2,\sigma _2^2,\sigma _{01}^2,\sigma _{02}^2\); and erf() for \(\rho_0\), \(\rho_{12}\)) and estimate the variance–covariance matrix of errors in the transformed parameter space. We validated that our estimates based on the observed Fisher information are in good agreement with block jack-knife estimates. To estimate standard errors for a function of the parameters, such as \(r_g\) or h2, we incorporate the linear correlation among parameter errors in the transformed space: we sample N = 1000 realizations of the parameter vector, calculate the function (e.g., \(r_g\) or h2) on each of them, and report the standard deviation. In cases where the joint Hessian was not positive definite, we estimate marginal errors of the fitted parameters.

Akaike and Bayesian information criteria

We apply the standard formulas AIC = 2k − 2F and \({\mathrm{BIC}} = k\,{\mathrm{ln}}\,n - 2F\), where F is the log-likelihood from Eq. (6) or (7), k is the number of parameters (k = 2 for the infinitesimal model, and k = 3 for the causal mixture model), and \(n = \mathop {\sum}\nolimits_j {w_j}\) is the effective number of SNPs (the sum of weights across all SNPs used to fit the model).
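For concreteness, the two criteria can be computed as follows (a minimal sketch with hypothetical log-likelihood values, not results from the paper):

```python
import math

def aic_bic(log_lik, k, n_eff):
    """AIC = 2k - 2F and BIC = k ln(n) - 2F, where F is the log-likelihood,
    k the number of parameters, and n the effective number of SNPs
    (sum of random-pruning weights)."""
    return 2 * k - 2 * log_lik, k * math.log(n_eff) - 2 * log_lik

# Hypothetical comparison: infinitesimal model (k=2) vs causal mixture (k=3).
aic_inf, bic_inf = aic_bic(-151000.0, 2, 1.0e6)
aic_mix, bic_mix = aic_bic(-150900.0, 3, 1.0e6)
print(aic_mix < aic_inf, bic_mix < bic_inf)  # mixture preferred by both criteria
```

Lower values indicate the preferred model; BIC penalizes the extra mixture parameter more strongly because the penalty grows with \({\mathrm{ln}}\,n\).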

Large LD blocks

The log-likelihood cost function and the Q–Q plots apply a weighting scheme to SNPs to avoid overcounting evidence from large LD blocks. As an alternative to weighting by inverse LD score, we chose to infer the weights by random pruning. This technique is a stochastic procedure that averages the log-likelihood function across repeatedly selected subsets of variants, such that for each pair of variants i, j in a subset J the squared allelic correlation \(r_{ij}^2\) falls below a certain threshold. Given T iterations of random pruning, the log-likelihood function can be calculated as follows:

$$F\left( \theta \right) = \frac{1}{T}\mathop {\sum}\nolimits_{t = 1}^T {\mathop {\sum}\nolimits_{j \in J_t} {{\mathrm{log}}\left( {pdf(z_j|\theta )} \right)} }$$ (8)

which is equivalent to the weighted log-likelihood \(F\left( \theta \right) = \mathop {\sum}\nolimits_j {w_j\,{\mathrm{log}}\left( {pdf\left( {z_j|\theta } \right)} \right)}\) with weights \(w_j = |\{t:j \in J_t\}|/T\), where |S| denotes the cardinality of a set S. Random pruning with the stringent threshold r2 = 0.1 justifies independent modeling of the residuals in Eq. (3) across SNPs, which would otherwise be correlated.
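A minimal sketch of random-pruning weights (illustrative; the greedy selection order and all names are our own assumptions, not the MiXeR implementation):

```python
import random

def random_pruning_weights(r2, threshold, T, seed=0):
    """Estimate per-SNP weights w_j = |{t : j in J_t}| / T by repeatedly
    selecting random subsets in which all pairwise r^2 stay below threshold."""
    rng = random.Random(seed)
    n = len(r2)
    counts = [0] * n
    for _ in range(T):
        order = list(range(n))
        rng.shuffle(order)               # random visiting order
        kept = []
        for j in order:                  # greedily keep SNPs not in LD with kept set
            if all(r2[j][i] < threshold for i in kept):
                kept.append(j)
        for j in kept:
            counts[j] += 1
    return [c / T for c in counts]

# Toy example: SNPs 0 and 1 are in strong LD; SNP 2 is independent.
r2 = [[1.0, 0.8, 0.0],
      [0.8, 1.0, 0.0],
      [0.0, 0.0, 1.0]]
w = random_pruning_weights(r2, threshold=0.1, T=64)
print(w)  # SNP 2 always kept; SNPs 0 and 1 each kept about half the time
```

The independent SNP receives full weight, while the two correlated SNPs share one unit of weight between them, which is the intended down-weighting of large LD blocks.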

Heritability estimates

In an additive model, SNP heritability is defined as a sum across all causal variants: \(\sigma _\beta ^2\mathop {\sum}\nolimits_{j:\beta _j \ne 0} {2p_j\left( {1 - p_j} \right)}\), which we approximate using the average heterozygosity of all variants in the reference: \(\pi _1H_{{\mathrm{total}}}\sigma _\beta ^2\), where \(H_{{\mathrm{total}}} = \mathop {\sum}\nolimits_j {2p_j\left( {1 - p_j} \right)}\). To estimate the proportion of causal variants that explain a certain fraction of heritability (Supplementary Fig. 11), we randomly sample N = 10,000 causal variants from the reference, draw their effects \(\beta_j\) from a normal distribution, sort them according to \(\beta _j^2p_j\left( {1 - p_j} \right)\), and report the fraction of variants that cumulatively account for 90% of heritability.
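The sorting procedure above can be sketched as follows (illustrative only; the MAFs are synthetic rather than drawn from the reference panel):

```python
import random

def fraction_for_heritability(mafs, target=0.9, seed=0):
    """Sample causal effects, sort variants by per-variant heritability
    contribution beta_j^2 * p_j * (1 - p_j), and return the fraction of
    causal variants that cumulatively account for `target` of h2."""
    rng = random.Random(seed)
    contrib = sorted((rng.gauss(0.0, 1.0) ** 2 * p * (1.0 - p) for p in mafs),
                     reverse=True)
    total = sum(contrib)
    acc, k = 0.0, 0
    for c in contrib:
        acc += c
        k += 1
        if acc >= target * total:
            break
    return k / len(contrib)

rng = random.Random(42)
mafs = [rng.uniform(0.05, 0.5) for _ in range(10000)]
frac = fraction_for_heritability(mafs)
print(frac)  # well below 1: a minority of causal variants carry 90% of h2
```

Because squared Gaussian effects are heavily right-skewed, the largest contributions dominate, so the reported fraction is well below the 90% heritability target.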

Genetic correlation

The parameter \(\rho_{12}\) in MiXeR defines the correlation of effect sizes within the shared polygenic component. Genome-wide genetic correlation, calculated across all SNPs, is related to \(\rho_{12}\) by the following formula, which involves the polygenicities \(\pi _1^u = \pi _1 + \pi _{12}\) and \(\pi _2^u = \pi _2 + \pi _{12}\) of the traits, and the polygenic overlap \(\pi_{12}\):

$$r_g = \rho _{12}\pi _{12}/\sqrt {\pi _1^u\pi _2^u}$$ (9)

For traits with a K-fold difference in polygenicity (\(\pi _1^u = K\pi _2^u\)), the formula predicts an upper bound on genome-wide genetic correlation, \(r_g \le \rho _{12}/\sqrt K\), where equality holds if the causal variants of the less polygenic trait form a subset of the causal variants of the more polygenic trait.
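Eq. (9) and the bound can be checked numerically (a minimal sketch; the function name and parameter values are illustrative):

```python
import math

def genetic_correlation(pi1, pi2, pi12, rho12):
    """Genome-wide genetic correlation from MiXeR parameters, Eq. (9):
    r_g = rho12 * pi12 / sqrt(pi1_u * pi2_u)."""
    pi1_u = pi1 + pi12
    pi2_u = pi2 + pi12
    return rho12 * pi12 / math.sqrt(pi1_u * pi2_u)

# Full overlap: causal variants of the less polygenic trait (pi2_u = 1e-4)
# are a subset of those of the more polygenic trait (pi1_u = 1e-3, K = 10).
rg = genetic_correlation(pi1=9e-4, pi2=0.0, pi12=1e-4, rho12=0.8)
print(rg)  # equals the bound rho12 / sqrt(K) = 0.8 / sqrt(10)
```

Shrinking the overlap \(\pi_{12}\) while keeping the marginal polygenicities fixed lowers \(r_g\) below the bound, which is why a modest genome-wide \(r_g\) is compatible with a strong within-overlap correlation \(\rho_{12}\).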

Quantile–quantile plots

Univariate and stratified Q–Q plots for the model were constructed from the density pdf j (z) defined by Eq. (3), given the fitted parameters of the model and the LD structure of the j-th SNP, calculated across a fine grid of z-scores ranging from 0 to 38 in steps of 0.05. We average pdf j (z) across 1% of randomly sampled SNPs, and numerically integrate the resulting probability density function to convert it into a cumulative distribution function. Error bars on the data Q–Q plots represent the 95% binomial confidence interval \(q \pm 1.96\sqrt {q\left( {1 - q} \right)/n_{{\mathrm{total}}}}\), where q is the probability of observing a p-value as extreme as, or more extreme than, the chosen p-value, and \(n_{total}\) is the effective number of SNPs after controlling for LD structure, calculated in our case as the sum of random pruning weights across all SNPs.

GWAS power curves

The causal mixture model can project the future yield of GWAS discoveries by estimating the proportion S(N) of narrow-sense heritability captured by genome-wide significant SNPs at a given sample size N. S(N) is defined as follows:

$$S\left( N \right) = \frac{{\mathop {\sum }\nolimits_j {\int}_{z:|z| \ge z_t} {C(z,N,j)dz} }}{{\mathop {\sum }\nolimits_j {\int}_z {C\left( {z,N,j} \right)dz} }},$$ (10)

where \(z_t = 5.45\) is the z-score corresponding to the standard genome-wide significance threshold 5 ∙ 10−8, and C(z, N, j) ≡ P(z, N, j) ⋅ E(δ2|z, N, j) denotes the posterior expectation E(δ2|z, N, j) of the non-centrality parameter δ2 for GWAS SNP j at a given z-score, multiplied by the prior probability P(z, N, j) of observing that z-score. The probability density function P(z, N, j) is given by Eq. (4), and E(δ2|z, N, j) can be calculated from Bayes’ rule. Thus, \(C\left( {z,N,j} \right) = {\int} {\delta ^2P\left( {z|\delta } \right)P\left( {\delta ,j} \right)d\delta }\), where \(z|\delta \sim N\left( {\delta ,\sigma _0^2} \right)\). Analytical expressions for C(z, N, j) and \({\int}_{z:|z| \ge z_t} {C(z,N,j)dz}\) are given in Supplementary Note 1.

SNPs in the analysis

To enable a direct comparison of our model with LD score regression, we use the same set of SNPs in our log-likelihood optimization: approximately 1.1 million variants, a subset of the 1000 Genomes and HapMap3 panels60, with MAF above 0.05, ambiguous SNPs excluded, imputation INFO above 0.9, and the MHC and other long-range LD regions excluded. Calculation of the LD structure, LD scores \(\ell _j\), and shape parameters \(\eta_j\) is based on 9,997,231 SNPs from the 1000 Genomes phase 3 data, downloaded from the LD score regression website. In simulations, we generate GWAS and estimate LD structure on a subset of 11,015,833 SNPs from 1000 Genomes phase 3, with MAF above 0.002, call rate above 90%, and duplicated RS numbers excluded; the fit procedure was constrained to ~130 K GWAS SNPs, keeping only HapMap3 SNPs and pruning SNPs at an LD r2 threshold of 0.1.

LD score regression estimates

For dichotomous phenotypes, we used an effective sample size \(N_{eff} = 4/(1/N_{case} + 1/N_{cont})\) to account for imbalanced numbers of cases and controls, both in MiXeR and in LD score regression. In addition, we ran LDSR using the MiXeR MAF model (using the --per-allele flag in LD score estimation), and show the results alongside the original LDSR estimates (Supplementary Tables 7, 9). For case/control phenotypes, heritability is reported on the observed scale.
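A minimal sketch of the effective sample size for imbalanced case/control GWAS (illustrative numbers):

```python
def n_eff(n_case, n_control):
    """Effective sample size for an imbalanced case/control GWAS:
    N_eff = 4 / (1 / N_case + 1 / N_control)."""
    return 4.0 / (1.0 / n_case + 1.0 / n_control)

# A balanced design recovers the total sample size...
print(n_eff(50000, 50000))  # ~100000
# ...while imbalance shrinks it toward 4x the smaller group.
print(n_eff(10000, 90000))  # ~36000
```

This is the standard harmonic-mean correction: with strong imbalance, \(N_{eff}\) approaches four times the smaller of the two groups.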

Simulations

In our simulations, we use a panel of N = 100,000 samples and 11,015,833 SNPs, generated by HapGen261 using 1000 Genomes51 data to approximate the LD structure of European ancestry. To avoid relatedness across individuals, we ran HapGen2 on small disjoint chunks of about 2900 SNPs at a time, 3920 chunks in total. The chunk boundaries act as additional recombination hotspots, causing certain changes in the distribution of the LD scores (Supplementary Fig. 20). However, the total amount of allelic correlation in the HapGen2 panel remains substantial; for example, the median LD score in the HapGen2 panel was 66.4, versus 63.5 in the 1000 Genomes panel, which makes the HapGen2 panel appropriate for our simulations.

We also validated that the HapGen2 panel shows no signatures of cryptic relatedness or sample stratification. The “plink --pca” analysis of the genotype matrix shows no signs of sample stratification, as seen in the scatter plot of the first and second principal components (Supplementary Fig. 20). The “plink --genome” test found no related individuals (the PI_HAT measure was below 0.1 for all pairs of individuals). We use a subset of 115,267 SNPs in this analysis, selected according to the steps described in the PCA module of the Ricopili GWAS pipeline.

For each simulation run, we use PLINK to obtain GWAS summary statistics, including the Wald test z-score and p-value, for two synthesized quantitative phenotypes, with complete sample overlap between the GWAS samples. The quantitative phenotype \(y_k\) of the k-th sample is calculated via a simple additive genetic model, \(y_k = \mathop {\sum}\nolimits_j {g_{kj}\beta _j + {\it{\epsilon }}_k}\), where \(g_{kj}\) is the number of reference alleles for the j-th SNP in the k-th sample, \(\beta_j\) is the causal effect size, and \({\it{\epsilon }}\) is the residual vector drawn from a normal distribution with zero mean and variance chosen so as to set the heritability h2 = var(Gβ)/var(y) to a predefined level.
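The phenotype simulation, with the residual variance scaled to hit a target heritability, can be sketched as follows (illustrative only; toy genotypes instead of HapGen2 output):

```python
import random
import statistics

def simulate_phenotype(genotypes, betas, h2, seed=0):
    """y_k = sum_j g_kj * beta_j + eps_k, with var(eps) chosen so that
    h2 = var(G beta) / var(y) matches the requested level."""
    rng = random.Random(seed)
    g_effect = [sum(g * b for g, b in zip(row, betas)) for row in genotypes]
    var_g = statistics.pvariance(g_effect)
    var_e = var_g * (1.0 - h2) / h2      # solves h2 = var_g / (var_g + var_e)
    sd_e = var_e ** 0.5
    return [x + rng.gauss(0.0, sd_e) for x in g_effect]

# Toy data: 2000 samples, 50 causal SNPs with 0/1/2-coded genotypes.
rng = random.Random(1)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(50)] for _ in range(2000)]
betas = [rng.gauss(0.0, 0.1) for _ in range(50)]
y = simulate_phenotype(genotypes, betas, h2=0.4)
```

The realized heritability fluctuates around the target because the residuals are a finite sample from the scaled normal distribution.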

For the simulations shown in Fig. 2, we draw effect sizes \((\beta_{1j}, \beta_{2j})\) from the four-component mixture model (Eq. (1)), varying the polygenicity of each phenotype (\(\pi _1^u = \pi _1 + \pi _{12}\) and \(\pi _2^u = \pi _2 + \pi _{12}\)) and the polygenic overlap (\(\pi_{12}\)). We chose the total polygenicity of both traits to be 3 × 10−3 or 3 × 10−4 and include an additional scenario of uneven polygenicity (\(\pi _1^u =\) 3 × 10−3, \(\pi _2^u =\) 3 × 10−4). For each combination of \(\pi _1^u\), \(\pi _2^u\), and h2, we set the polygenic overlap to a fraction of the total polygenicity, \(\pi _{12} = f\pi _1^u\), choosing the fraction f from six equally spaced values (0.0 to 1.0 in steps of 0.2). The correlation of effect sizes \(\rho_{12}\) was set to 0.0 or 0.5. Heritability was set to 0.1, 0.4, or 0.7, which lets us keep the GWAS sample size constant (N = 100,000), because the distribution of GWAS z-scores depends on N and h2 only through their product h2 × N (thus, simulations with N = 700,000 and h2 = 0.1 would be equivalent to our scenario with N = 100,000 and h2 = 0.7). Finally, for each combination of heritability, polygenicity, polygenic overlap, and correlation of effect sizes, we repeat the simulations ten times.

For the simulations with differential enrichment, we simulate three levels of polygenicity (3 K, 30 K, and 300 K causal variants) and three levels of heritability (0.1, 0.4, and 0.7), and for each combination we generate 20 pairs of genetically independent traits (sharing only a common pattern of enrichment). To simulate the enrichment, we keep the variance of effect sizes constant across all SNPs, but modulate the probability of a variant being causal proportionally to the LDSR regression coefficient. We use the --per-allele flag in LD score estimation to run simulations with the MiXeR MAF model.

For simulations with MAF-dependent architectures, we simulate effect sizes as follows:

$$\beta _j \sim \pi N\left( {0,H_j^S\sigma _\beta ^2} \right) + \left( {1 - \pi } \right)N\left( {0,0} \right)$$ (11)

Parameter S = 0 corresponds to the MiXeR MAF model, and S = −1 corresponds to the LDSR MAF model. The same model is implemented in the BayesS software19; we therefore choose the parameter S from −0.25, −0.50, and −0.75, which corresponds to the range of BayesS estimates observed on real GWAS data.
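Eq. (11) can be sampled as follows (a minimal sketch; the function name and parameter values are illustrative):

```python
import random

def sample_beta_maf(p_causal, maf, s, sigma_beta, rng):
    """Draw beta_j from Eq. (11): causal with probability p_causal, with
    variance H_j^S * sigma_beta^2, where H_j = 2 p_j (1 - p_j).
    S = 0 gives the MiXeR MAF model; S = -1 the LDSR MAF model."""
    if rng.random() >= p_causal:
        return 0.0
    H = 2.0 * maf * (1.0 - maf)
    return rng.gauss(0.0, (H ** s) ** 0.5 * sigma_beta)

rng = random.Random(0)
# Under S = -0.5, rarer variants (smaller H) get larger typical effects.
betas_rare = [sample_beta_maf(1.0, 0.01, -0.5, 0.1, rng) for _ in range(5000)]
betas_common = [sample_beta_maf(1.0, 0.5, -0.5, 0.1, rng) for _ in range(5000)]
```

Negative values of S inflate the effect-size variance of low-heterozygosity variants, which is the MAF-dependence the BayesS estimates describe.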