64 | Chapter 3

Enrichment of cRSS motif at NBS1 binding sites at Igk, but not in the rest of the genome

Simple Enrichment Analysis (SEA) of the canonical RSS in the sequences directly at the RAG-specific NBS1 peaks showed no significant enrichment for the nonamer 5’-ACAAAAACC-3’ or for the heptamer 5’-CACAGTG-3’ (false discovery rate expressed as q-values > 0.05). Similar results were obtained when analyzing the sequences directly under the NBS1 peaks in the subset of Igk peaks (Figure 4A), suggesting that NBS1 binding sites do not strictly co-localize with RSSs. We then extended the region around each NBS1 peak by an additional 500bp or 1000bp on each end. In the subset of 500bp-extended NBS1 peaks at the Igk locus, a significant enrichment for both canonical RSS motifs was observed (3.75-fold enrichment of 5’-CACAGTG-3’, q=0.0006, and 3.5-fold enrichment of 5’-ACAAAAACC-3’, q=0.0023) (Figure 4B). The enrichment of the nonamer motif 5’-ACAAAAACC-3’ increased further in the 1000bp-extended subset of NBS1 peaks at the Igk locus (4.67-fold enrichment, q=0.0005), whereas the enrichment of the heptamer motif 5’-CACAGTG-3’ was lower in the 1000bp-extended dataset than in the 500bp-extended sequences (1.78-fold vs. 3.75-fold; q-values < 0.05) (Figure 4C). The nonamer, but not the heptamer, was also found to be moderately enriched in the 1000bp-extended sequences outside of the Igk locus (enrichment ratio 1.69, q-value < 0.01) (SUPPL Table 1B).

Zinc-finger binding sites are the most dominant motifs associated with RAG-mediated DNA breaks

Next, we employed MEME motif analysis to identify motifs associated with the RAG-dependent NBS1 peaks, both in the sequences directly under the peaks and in the 500bp- and 1000bp-extended sequences. Two unique motifs were identified in the sequences directly under the NBS1 peaks, 15 motifs in the 500bp-extended NBS1 peak sequences, and 41 motifs in the 1000bp-extended dataset (SUPPL Table 1C). The p- and q-value cut-offs were both set to 0.05.
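As an illustration of the peak extension and motif scanning used in the analyses above, the following is a minimal Python sketch. It is not the SEA or MEME implementation: the function names, the 0-based half-open coordinate convention, and the simplified enrichment ratio (fraction of primary sequences containing the motif on either strand, divided by the same fraction in a control set) are assumptions made here for illustration only, and the ratio does not reproduce the statistics or q-values reported by SEA.

```python
# Hypothetical helpers for illustration; not part of SEA/MEME.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extend_peak(start, end, flank, chrom_length=None):
    """Extend a 0-based half-open interval [start, end) by `flank` bp per side.

    The start is clamped at 0; the end is clamped at `chrom_length` if given.
    """
    new_start = max(0, start - flank)
    new_end = end + flank
    if chrom_length is not None:
        new_end = min(chrom_length, new_end)
    return new_start, new_end


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_motif(seq, motif):
    """True if `motif` occurs exactly in `seq` on either strand."""
    seq = seq.upper()
    motif = motif.upper()
    return motif in seq or reverse_complement(motif) in seq


def enrichment_ratio(primary, control, motif):
    """Simplified ratio: fraction of primary vs. control sequences with the motif.

    Assumption: this is only a rough stand-in for an enrichment statistic;
    it applies no pseudocounts and no significance test.
    """
    frac_primary = sum(has_motif(s, motif) for s in primary) / len(primary)
    frac_control = sum(has_motif(s, motif) for s in control) / len(control)
    if frac_control == 0:
        return float("inf")
    return frac_primary / frac_control


if __name__ == "__main__":
    heptamer = "CACAGTG"
    # Extend a made-up peak by 500 bp on each side.
    print(extend_peak(1000, 1200, 500))
    # Toy sequences, not real data.
    primary = ["AACACAGTGAA", "TTCACTGTGTT", "GGGGGGGG", "ACGTACGT"]
    control = ["CACAGTGA", "AAAA", "CCCC", "GGGG"]
    print(enrichment_ratio(primary, control, heptamer))
```

In practice the extended intervals would be used to extract sequences from the reference genome before running the motif tools; the toy sequences here only demonstrate that matches on the reverse strand are counted as well.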
Not all of the identified motifs are known motifs already associated with specific protein binding sites. In the sequences directly under the NBS1 peaks (0bp offset), 1 known motif was identified (CTGAGTTCVAGGCCA) (Figure 5A); in the sequences with 500bp offset, 3 known motifs were identified (CTGCCTCTGCCTCCC, CCCACCCACYCC, CCCTCCCCCC) (Figure 5B); and in the sequences with 1000bp offset, 4 different known motifs were identified (GAGTTCSAGGMCAGC, CCCCACCCMC, CCCTCCCTCCCC, TATATATATATATA) (Figure 5C). Although the other associated motifs are not known protein binding sites, we observed several simple repeat motifs amongst them, such as CTCTCTCTCTCTCTC and ACACACACACACACA (SUPPL Table 1C). Except for the TATA-box binding motif (TATATATATATATA), the known motifs are zinc-finger type motifs, which are shared by multiple transcription factors or DNA binding proteins, amongst which proteins that play a role in