Aster Harder

v2.0.³⁰ For HUNT data, we used a logistic mixed model with the saddlepoint approximation, as implemented in SAIGE v0.20,³¹ which accounts for genetic relatedness. All models were adjusted for sex and for at least the four leading principal components of the genetic population structure (Supplementary Table 18). Age was used as a covariate when available. A detailed description is provided in the Supplementary Note.

For the chromosome X meta-analysis, male genotypes were coded as (0,2) in all cohorts, and the GWAS were conducted with an X chromosome inactivation model that treats hemizygous males as equivalent to homozygous females.³²

We performed an inverse-variance weighted fixed-effect meta-analysis on the five study collections by using GWAMA.³³ After the meta-analysis, we excluded variants with an effective sample size $N_{\mathrm{eff}} < 5{,}000$ to remove results with very low precision compared to the majority of variants, which left 10,843,197 variants that passed the QC thresholds. We estimated the effective sample size for variant i as $N_{\mathrm{eff},i} = 1 / \left(f_i (1 - f_i)\, se_i^2\right)$, where $f_i$ is the effect allele frequency for variant i and $se_i$ is the standard error estimated by the GWAS software. This quantity approximates the value $2 N t (1 - t) I$, where N is the total sample size (cases + controls), t is the proportion of cases and I is the imputation info (derivation in Supplementary Note).

Risk loci

There were 8,117 genome-wide significant (GWS) variants with a meta-analysis P-value < 5 × 10⁻⁸. For the 8,067 of them that were available in UK Biobank, an LD matrix was obtained from UK Biobank using a random sample of 10,000 individuals included in the UKBB GWAS. We defined the index variants as the LD-independent GWS variants at an LD threshold of r² < 0.1 in the following way. First, the GWS variant with the lowest P-value was chosen, and all GWS variants that were in LD with the chosen variant (r² > 0.1) were then excluded.
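The greedy selection of index variants described in this passage can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `select_index_variants`, the representation of the LD matrix as a nested list of pairwise r² values, and the toy numbers are all assumptions made for the example.

```python
def select_index_variants(pvalues, r2, threshold=0.1):
    """Greedy selection of LD-independent variants (hypothetical helper).

    pvalues   -- list of P-values, one per GWS variant
    r2        -- square nested list, r2[i][j] = LD (r^2) between variants i and j
    threshold -- variants with r^2 above this value count as "in LD"
    Returns the positions of the chosen index variants, in order of selection.
    """
    remaining = set(range(len(pvalues)))
    index_variants = []
    while remaining:
        # Choose the remaining variant with the lowest P-value
        # (sorted() makes tie-breaking deterministic).
        lead = min(sorted(remaining), key=lambda i: pvalues[i])
        index_variants.append(lead)
        # Drop the chosen variant and every remaining variant in LD with it.
        remaining = {
            j for j in remaining
            if j != lead and r2[lead][j] <= threshold
        }
    return index_variants


# Toy data (made up): variants 0 and 1 are in LD, variant 2 is independent.
pvals = [1e-12, 1e-10, 1e-9]
ld = [
    [1.00, 0.80, 0.02],
    [0.80, 1.00, 0.05],
    [0.02, 0.05, 1.00],
]
index = select_index_variants(pvals, ld)
```

In this toy input, variant 0 is selected first and removes variant 1, so variants 0 and 2 remain as index variants. A dense nested list is used only for readability; with thousands of variants a sparse or banded LD representation would likely be preferable.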
Next, out of the remaining GWS variants, the variant with the lowest P-value was chosen and the GWS variants in LD with that variant were excluded. This procedure was repeated until there were no GWS variants left. Out of the 8,067 variants with LD information, 170 were LD-independent (at r² < 0.1). For 18 of the 50 GWS variants that were not found in UK Biobank, LD information was available from the 23andMe data, and all 18 variants were in LD (r² > 0.1) with some index variant. Two of the 18 variants (rs111404218 and rs12149936) had a lower P-value than the original index variant they were in LD with, and hence they replaced the original index variants. For the remaining 32 GWS variants, LD remained unknown. Thus, at this stage, the GWS associations were represented by 202 index variants (168 original index variants + 2 replacements + 32 variants with unknown LD).

Next, to define the risk loci and their lead variants, an LD block around each index variant was formed by the interval spanning all GWS variants that were in high LD (r² > 0.6) with the index variant. Sizes of these regions ranged from 1 bp (only the variant itself, e.g., the variants with unknown LD) to 1,089 kb. Sets of regions that were less than 250 kb away from each other were merged (distance measured from the end of the first region to the beginning of the second region). This definition resulted in 126 loci. All other GWS variants were included in their nearest locus based
