Cindy Boer

20 | Chapter 1.1

▲ Figure 5: Single Nucleotide Variants, Haplotypes, Imputation and Tagging SNVs. A) 5 short strands of DNA showing the same genetic location in 5 different individuals. Most of the DNA sequence is identical, except for 12 single nucleotide variations (SNVs). Each SNV has several possible alleles (A/T/G/C). B) From the DNA sequences of the 5 individuals, 3 haplotypes can be distinguished. A haplotype is made up of a particular combination of alleles that are inherited together (in linkage). Only the SNVs are shown, and 4 of them are marked. C) Tagging SNVs. By genotyping just these 4 tagging SNVs, the genotypes of the other 8 SNVs in linkage can be determined (i.e. imputed), which identifies these 3 haplotypes. Thus, if an individual has the genotype G-A-T-G at these 4 SNVs, their haplotype would be determined as number 2. Note that haplotypes can occur at different frequencies in different populations; haplotype 3 in this example occurs at a lower frequency than haplotypes 1 and 2. More reference sequences are needed to detect other possible haplotypes, and thereby to determine the best tagging SNVs and improve the accuracy and quality of imputation.

11 global populations[44], and this number has rapidly increased over the last years with the Haplotype Reference Consortium (HRC, n=64,976)[31] and Trans-Omics for Precision Medicine (TOPMed) imputation (n=4,800,000)[45], resulting in increasingly accurate imputations and more genetic variants for GWAS to use. About 1 million haplotypes (in the Caucasian genome) are examined at the same time in a GWAS, so strict multiple-testing correction is needed to prevent false positive results. For this reason, an SNV is considered genome-wide significant in a GWAS only if it reaches the threshold of p-value ≤ 5×10⁻⁸.
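The genome-wide threshold can be read as a nominal significance level of 0.05 divided by roughly 1 million independent tests. The sketch below illustrates that arithmetic, assuming a Bonferroni-style correction (the text does not name the correction method); the function names `genome_wide_threshold` and `is_genome_wide_significant` are hypothetical and chosen only for this illustration.

```python
def genome_wide_threshold(alpha: float = 0.05, n_tests: int = 1_000_000) -> float:
    """Divide the nominal alpha by the number of independent tests.

    Assumption: a Bonferroni-style correction, with n_tests standing in for
    the roughly 1 million haplotypes (LD blocks) examined in a GWAS.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float,
                               alpha: float = 0.05,
                               n_tests: int = 1_000_000) -> bool:
    """Return True if p_value is at or below the corrected threshold."""
    return p_value <= genome_wide_threshold(alpha, n_tests)


if __name__ == "__main__":
    # Prints the corrected threshold in scientific notation.
    print(f"threshold = {genome_wide_threshold():.1e}")
    # Two illustrative (made-up) p-values, one below and one above the threshold.
    for p in (1e-9, 1e-6):
        print(p, is_genome_wide_significant(p))
```

With the default values, the corrected threshold is approximately 5×10⁻⁸, so a p-value of 1×10⁻⁶, although far below 0.05, would not count as genome-wide significant.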
Since this genome-wide significance threshold is still nominally a p-value ≤ 0.05 (but now corrected for the 1 million LD blocks across the human genome), such genome-wide discoveries have to be replicated in an independent study to distinguish true findings from coincidence. From the many GWASs performed since the first GWAS in 2005, we know that the effect sizes of the majority of associated SNVs (GWAS hits) are relatively small. Thus, due to the small effect of the
