GENOME-WIDE ANALYSIS OF 102,084 MIGRAINE CASES IDENTIFIES 123 RISK LOCI AND SUBTYPE-SPECIFIC RISK ALLELES

on their position and the locus boundaries were updated, and finally loci within 250 kb from each other were merged. This resulted in our final list of 123 risk loci. Each risk locus was represented by its lead variant, defined as the variant with the lowest P-value, and was named after the protein-coding gene nearest to the lead variant, or after the nearest non-coding gene if there was no protein-coding gene within 250 kb. The term “Near” was added to the locus name if the lead variant did not overlap with a gene transcript. We note that the nearest gene to the lead variant need not be a causal gene. None of the 32 variants without LD information became the lead variant of a risk locus, because each had a nearby variant with a smaller P-value.

We annotated these loci and mapped them to genes by physical position using the Ensembl Variant Effect Predictor (VEP, GRCh37).³⁴ We used two different thresholds for annotating the nearest genes: a distance of 20 kb and a distance of 250 kb to the nearest transcript of a gene. The filtered results, including all variants within a gene or a regulatory element, are in Supplementary Table 7B.

Stepwise conditional analysis

We performed a stepwise conditional analysis (CA) on each risk locus by using FINEMAP v1.4.³⁵ FINEMAP uses GWAS summary statistics together with an LD reference panel and does not require individual-level data. When the reference LD does not accurately match the GWAS data, full fine-mapping is prone to false positives.³⁶ A simpler stepwise CA is more robust to inaccuracy in the reference LD because it has a much smaller search space than full fine-mapping and is therefore less likely to encounter the most problematic variant combinations, where the LD is very inaccurate. Since we did not have the full in-sample LD from our GWAS data, we carried out only the CA and not the full fine-mapping.
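
To make the idea of conditioning on a lead variant with summary statistics concrete, the following is a minimal sketch of a single conditioning step. It is not FINEMAP's implementation. It uses the standard approximation that, when two variants are measured in the same samples and have LD correlation r, the z-score of a variant after conditioning on the lead variant is (z - r * z_lead) / sqrt(1 - r^2). The function names, the variant labels and the `max_r2` cutoff are hypothetical and chosen only for illustration.

```python
import math


def conditional_z(z_j, z_lead, r):
    """Approximate z-score of variant j after conditioning on the lead variant.

    Assumes both z-scores come from the same samples and that r is the
    LD correlation between the two variants (with |r| < 1).
    """
    return (z_j - r * z_lead) / math.sqrt(1.0 - r * r)


def two_sided_p(z):
    """Two-sided P-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def secondary_signals(z, ld_with_lead, lead, p_threshold=5e-8, max_r2=0.99):
    """Return variants that stay below p_threshold after conditioning on `lead`.

    z            : dict mapping variant id -> marginal z-score
    ld_with_lead : dict mapping variant id -> LD correlation r with the lead
    max_r2       : illustrative cutoff; variants almost collinear with the
                   lead are skipped because the denominator becomes unstable
    """
    hits = {}
    for variant, z_j in z.items():
        if variant == lead:
            continue
        r = ld_with_lead[variant]
        if r * r >= max_r2:
            continue
        p_cond = two_sided_p(conditional_z(z_j, z[lead], r))
        if p_cond < p_threshold:
            hits[variant] = p_cond
    return hits


# Toy locus: "tag" is explained by the lead variant, "indep" is not.
z_scores = {"lead": 10.0, "tag": 8.0, "indep": 6.0}
ld = {"tag": 0.8, "indep": 0.0}
signals = secondary_signals(z_scores, ld, lead="lead")
```

The same formula shows why inaccurate reference LD is a concern: if the true correlation of "tag" with the lead were 0.8 but the reference panel gave 0.95, the conditional z-score would move from about 0 to roughly -4.8, an apparent signal created purely by the LD mismatch.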
For the CA, we included only SNPs, not indels, and we used the same reference LD from the UK Biobank data that we used to define the risk loci. Because our summary statistics came from a meta-analysis in which the sample size per variant varies greatly, we restricted the CA to variants with a similar effective sample size (N_eff), using a threshold of ±10% of the N_eff of the lead SNP of the risk locus. This filter excluded approximately 17% of all GWS variants and was necessary because, without it, the CA produced spurious conditional P-values, such as P < 10⁻²⁵⁰, for some loci. Because indels were excluded, the lead variant was not included in the CA for the two loci whose lead variant was an indel. For these regions, we checked that the new lead variant from the CA output was in LD (r² > 0.3) with the original lead variant. For one locus (rs111404218), where the lead variant does not have LD information in the UK Biobank data, no GWS variants were left in the CA after filtering by N_eff. We used the standard GWS threshold (P < 5 × 10⁻⁸) to define the secondary variants that were conditionally independent of the lead variant. The CA results are in Supplementary Tables 6A,B.

eQTL mapping to genes and tissues

We used two data sources to map the risk variants to genes via eQTL associations. From the GTEx v8 database (https://gtexportal.org), we downloaded the data of 49 tissues. We first mapped