Katarina Ochodnicka

GENOMIC INSTABILITY IN DEVELOPING B-CELLS AND B-CELL LEUKEMIA: EXPLORING THE ROLE OF THE RAG1/2 COMPLEX Katarína Ochodnická


Genomic instability in developing B-cells and B-cell leukemia: exploring the role of the RAG1/2 complex
Academic thesis, University of Amsterdam, Amsterdam, the Netherlands

Author: Katarína Ochodnická
Printing: Ridderprint, ridderprint.nl
Cover design: Marcel Jansen, ziehoe.nl
Layout and design: Bart Roelofs, persoonlijkproefschrift.nl
ISBN: 978-94-6506-063-7

Printing of this thesis was financially supported by:
Department of Pathology, University Medical Centers, location AMC, Amsterdam, the Netherlands
Nordic Pharma BV

Copyright ©2024, Katarína Ochodnická, Amsterdam, the Netherlands
All rights reserved. No part of this thesis may be reproduced or transmitted in any form or by any means, without express written permission from the author.

Genomic instability in developing B-cells and B-cell leukemia: exploring the role of the RAG1/2 complex

ACADEMIC THESIS
to obtain the degree of doctor at the University of Amsterdam, by authority of the Rector Magnificus, prof. dr. ir. P.P.C.C. Verbeek, before a committee appointed by the Doctorate Board, to be defended in public in the Aula of the University on Wednesday 23 October 2024, at 11:00, by Katarína Ochodnická, born in Bratislava

Doctorate committee
Supervisor: prof. dr. C.J.M. van Noesel, AMC-UvA
Co-supervisor: dr. J.E.J. Guikema, AMC-UvA
Other members:
prof. dr. S.T. Pals, AMC-UvA
prof. dr. E.F. Eldering, AMC-UvA
prof. dr. M.C. van Zelm, Erasmus Universiteit Rotterdam
dr. M. van der Burg, Universiteit Leiden
dr. H.B. Jacobs, NKI
prof. dr. C.E. van der Schoot, AMC-UvA
Faculty of Medicine

to my daughters

Table of contents
Chapter 1 General introduction and outline of this thesis
Chapter 2 Role of RAG1 and RAG2 in B-cell development, signaling and (off) target DNA damage
Chapter 3 RAG1/2 induces double-stranded DNA breaks at non-Ig loci in the proximity of single sequence repeats in developing B cells
Chapter 4 The DNA damage response regulates RAG1/2 expression in pre-B cells through ATM-FOXO1 signaling
Chapter 5 NF-κB and AKT signaling prevent DNA damage in transformed pre-B cells by suppressing RAG1/2 expression and activity
Chapter 6 DNA damage-induced p53 downregulates expression of RAG1 through a negative feedback loop involving miR-34a and FOXP1
Chapter 7 General discussion
Appendix
English summary
Nederlandse samenvatting
Zhrnutie v slovenčine
Authors’ contributions & Research funding
PhD portfolio & Publications
About the author
Acknowledgements

CHAPTER 1 General introduction and outline of the thesis

The immune system is a collection of molecules, cells, and tissues that protects an individual from infectious microbes and eliminates foreign substances. It consists of two main branches: the innate immune system and the adaptive immune system. The innate immune system provides immediate, nonspecific defense mechanisms, including physical barriers like the skin, as well as immune cells that engulf and destroy pathogens. It always responds in the same way to repeated infections and is not able to escalate its response with each successive exposure to a particular microbe or antigen. The innate immune system is phylogenetically the oldest system of host defense. The adaptive immune system, on the other hand, develops a more specific and targeted response over time; it can recognize and “remember” specific pathogens, providing a more tailored and long-lasting defense. The most characteristic feature of adaptive immunity is its unique specificity for distinct molecules. Lymphocytes and antigen-presenting cells (APC) are the main components of adaptive immunity. There are distinct subpopulations of lymphocytes that differ in their function and in how they recognize antigens. B lymphocytes (B cells) recognize extracellular antigens and are endowed with the unique property of producing antibodies. Antibodies are the effectors of humoral immunity; they are serum proteins that initiate processes leading to the neutralization of antigens. T lymphocytes (T cells), on the other hand, mediate cellular immunity. T cells are able to directly kill infected cells, and they also direct immune responses by helping B cells to eliminate pathogens. Communication between immune cells is crucial for an effective immune response, and various signaling molecules, such as cytokines, help coordinate these interactions.
The immune system’s ability to distinguish self from non-self prevents it from attacking the body’s own cells; such an attack is known as autoimmunity. B cells are specialized white blood cells that are responsible for producing antibodies, which are proteins that recognize and neutralize specific pathogens like bacteria and viruses. The total of antibodies with different specificities is called the B-cell repertoire, a vast and unique collection of distinct receptor molecules on the surface of B cells. The B-cell repertoire is generated through a process known as V(D)J recombination, which occurs in the bone marrow during the early stages of B-cell development; a comparable process generates the antigen receptors of developing T cells. This genetic rearrangement process creates an enormous variety of potential antigen receptor specificities, allowing B and T cells to recognize a wide range of antigens. Upon maturation in the bone marrow, B cells migrate to the bloodstream and lymphoid tissues, where they can be activated, e.g. in response to foreign invaders. The diversity and adaptability of the B-cell repertoire contribute significantly to the immune system’s ability to defend the body against a myriad of pathogens. This intricate system ensures that the immune response is tailored to the specific challenges posed by different infectious agents, providing a key defense mechanism for maintaining overall health and well-being.

At the same time, this system poses a serious threat to genome stability if not tightly regulated. The V(D)J gene recombination process requires the activity of the recombination-activating gene-1 (RAG1) and recombination-activating gene-2 (RAG2) proteins. RAG1 and RAG2 form a complex that cuts DNA at specific sites within the immunoglobulin (Ig) and T-cell receptor (Tcr) loci, thereby initiating gene recombination. DNA cleavage in genes outside of the Ig regions could give rise to potentially oncogenic genetic lesions such as chromosomal translocations or deletions. As a matter of fact, there is compelling evidence that some of the genomic lesions identified in subsets of B-cell acute lymphoblastic leukemia (B-ALL) result from aberrant RAG1/2 recombination activity. Although the physiological mechanisms of RAG1/2 regulation have been studied extensively over the last couple of decades, the mechanisms of pathological regulation of RAG1/2 are less well described. This thesis aims to provide a deeper understanding of the mechanisms that regulate the expression and the activity of RAG1 and RAG2 in response to exogenous DNA damage, but also to DNA damage originating from RAG-dependent V(D)J recombination activity itself. Understanding the intricacies of RAG1/2 regulation is important for our understanding of how the integrity of the genome is maintained in developing B cells that have to deal with DNA damage during Ig receptor formation. The misregulation of RAG1/2, or the illegitimate targeting of RAG1/2, might also pose a threat to genome integrity, as off-target DNA cleavage (outside the Ig loci) could potentially give rise to genomic lesions that may result in oncogenic mutations. In Chapter 2, the current understanding of B-cell development and the regulation of RAG1 and RAG2 expression and activity is extensively summarized.
The responses of pre-B cells to DNA damage, and specifically the regulatory role of double-stranded DNA breaks (DSBs), are described in detail. This chapter also reviews the evidence for a possible role of RAG1/2 in the etiology of B-cell malignancies, and the targeting properties of RAG1/2 in this context. Aberrant RAG1/2 targeting during B-cell development has been implicated in the development of lymphoid malignancies. In Chapter 3 we developed a method that allows the identification of RAG1/2-induced DSBs on a genome-wide scale in developing pre-B cells. We found that many of the RAG1/2-induced DNA breaks can be detected outside of the Ig loci, providing in vivo evidence for the off-target activity of RAG1/2. Moreover, analysis of the common denominators with regard to the genomic locations and sequence motifs of the RAG1/2-induced DNA breaks revealed an enrichment of simple sequence repeats (SSR) and GC-rich regions in the proximity of the RAG1/2-induced DNA breaks. To gain a further understanding of the implications of programmed DNA damage in developing lymphocytes, it is essential to unravel the intricacies of the regulation of RAG1/2 activity during B-cell development, especially in the context of the DNA damage originating from V(D)J recombination itself. Inappropriate regulation of RAG1/2 expression may result in persistent RAG1/2 activity, thereby potentially increasing the occurrence of genomic lesions resulting in oncogenic events.

In Chapters 4, 5, and 6 we describe the interplay between DNA breaks and RAG1/2 activity and regulation. In Chapter 4 we showed that extrinsic DNA damage leads to the downregulation of RAG1/2 expression in pre-B cells. We identified the ataxia telangiectasia mutated (ATM) kinase as a key player in the DNA damage-dependent regulation of RAG1/2 transcription, which primarily impinges on the transcription factor forkhead box O1 (FOXO1). In Chapter 5 we showed that the nuclear factor kappa-B (NF-κB) and phosphoinositide 3-kinase (PI3K)/AKT signaling pathways act in concert to suppress inappropriate RAG1/2 expression and activity in human and mouse pre-B cells and in B-cell acute lymphoblastic leukemia (B-ALL) patients. Finally, in Chapter 6 we identified an additional mechanism that regulates RAG1/2 expression in response to DNA damage, involving the p53-dependent expression of microRNA-34a (miR-34a), which acts by inhibiting forkhead box protein P1 (FOXP1), a transcription factor that drives RAG1/2 expression. In summary, the studies presented in this thesis show that RAG1/2 is capable of introducing DNA breaks outside of the Ig loci, located in the proximity of SSR sequences. In addition, our studies show that DNA breaks in developing pre-B cells also have an important regulatory function by instigating feedback loops that limit the expression and activity of RAG1/2, thereby contributing to safeguarding genome stability and integrity in developing B cells. These findings are further summarized and discussed in Chapter 7, outlining their potential implications and perspectives.

CHAPTER 2 Role of RAG1 and RAG2 in B-cell development, signaling and (off) target DNA damage
Katarina Ochodnicka-Mackovicova, Jeroen E.J. Guikema

Abstract

This review explores the intricate regulatory mechanisms governing RAG1 and RAG2 expression and activity during the development of B cells, with a particular focus on their implications in the development of lymphoid malignancies such as B-cell acute lymphoblastic leukemia (B-ALL). We summarize current knowledge on B-cell development, the mechanism of V(D)J recombination, and the role of the RAG1 and RAG2 (RAG1/2) complex. We discuss the role of the DNA damage response (DDR) during V(D)J recombination, shedding light on its involvement in the regulation of the activity and expression of RAG1/2. Understanding the interplay between the various regulatory processes provides insights into the potential links between aberrant V(D)J recombination, dysregulation of RAG1/2, and the onset of lymphoid malignancies, contributing to our understanding of the molecular events underlying these diseases.

Adaptive immune system

The adaptive immune system evolved into a system that allows specific and targeted responses, capable of recognizing particular pathogens after repeated encounters, ensuring a long-lasting defense. The hallmark feature of the adaptive immune system is its unique specificity for distinct molecules, primarily facilitated by lymphocytes and antigen-presenting cells (APCs). Among lymphocytes, B cells specialize in recognizing extracellular antigens and producing antibodies, which are essential for humoral immunity. The antibodies initiate processes leading to antigen neutralization. T cells, on the other hand, play a vital role in cellular immunity by recognizing antigens from intracellular microbes and eliminating infected cells. Unlike B cells, T cells do not produce antibodies. Antibodies have a polypeptide structure consisting of four protein chains: two identical heavy (H) chains and two identical light (L) chains, connected by disulfide bonds and non-covalent interactions. These chains come together to form a Y-shaped structure known as the basic antibody monomer. Each chain consists of a constant region and a variable region. The variable regions of the heavy and light chains together form the antigen-binding sites and are responsible for pathogen recognition. To achieve an efficient adaptive immune response, the immune system is equipped with the capability to generate an enormously large array of antibodies of different specificities, recognizing the different pathogens. In a cell, single proteins are typically coded by single genes. The number of antibodies that are required to fight off hundreds of thousands of different pathogens is so high that one’s genome would have to accommodate hundreds of thousands of genes to make the various antigen receptors available for the immune response.
However, today it is clear that the genome of vertebrates contains just around 20,000 genes, which is far too few to encode the large repertoire of antibodies1. In B and T cells, an intricate system has evolved that allows the flexible generation of a virtually unlimited number of different antigen receptors: immunoglobulins (Ig), also referred to as B-cell receptors (BCR), in B cells, and T-cell receptors (TCR) in T cells. In this system, the recombination of gene segments is responsible for the generation of genetic variation. This is achieved by the ordered recombination of so-called variable (V), diversity (D), and joining (J) gene segments, which together code for the antigen-binding portion of the antigen receptor. These genes are non-functional in their germline configuration but become functional following the process of somatic gene recombination, also called V(D)J recombination, in which one of the V-genes, one of the D-genes (only for the Ig heavy chain) and one of the J-genes are ligated together and the intervening parts are excised from the genome. V(D)J recombination is initiated by the recombination-activating gene 1 (RAG1) and recombination-activating gene 2 (RAG2) proteins, which introduce double-stranded DNA breaks into the V(D)J genes: first, the heavy chain is recombined, followed by the recombination of the light chain. This newly assembled sequence encodes

the hypervariable (antigen-binding) part of the B-cell antigen receptor or antibody. The BCR diversity in developing B cells is further increased by terminal deoxynucleotidyl transferase (TdT), which catalyzes the addition of nucleotides to the 3’ end of DNA during the formation of the V(D)J junction. V(D)J recombination is a site-specific recombination process that takes place only at BCR/T-cell receptor (TCR) gene segments and occurs only in developing lymphocytes in a lineage-specific manner2. The process of gene recombination, coupled with the induction of junctional diversity, exhibits remarkable complexity, facilitating the creation of an extensive array of antigen receptors boasting millions of specificities, all while requiring modest gene coding capacity. However, the induction of DNA breaks necessary for recombination and the generation of billions of antigen receptors over an organism’s lifespan poses a substantial threat to genomic integrity and requires stringent fidelity mechanisms. During their development, T cells also undergo RAG1/2-mediated gene recombination to express highly variable surface T-cell receptors (TCRs), composed of alpha (α) and beta (β) chains, present in the majority of T cells, or of gamma (γ) and delta (δ) chains, present only in a minor population of T cells3,4. Though the order of TCR loci recombination differs from the order of BCR recombination, the molecular basis of the gene recombination is the same in B and T cells. Considering the scope of this thesis, only B-cell-related gene recombination processes, and their regulation, will be discussed. T-cell development and TCR recombination have been reviewed extensively elsewhere5–7. Aberrant V(D)J recombination has been demonstrated to be an underlying cause of several lymphoid malignancies, and therefore, the basic regulatory mechanisms safeguarding genome integrity during V(D)J recombination are of great interest8,9.
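The combinatorial logic described above (one V, one J and, for the heavy chain only, one D segment joined per rearrangement) can be illustrated with a short calculation. The following Python sketch is only an illustration: the segment counts are hypothetical round numbers chosen for easy arithmetic, not measured values for any species, and the names `Locus` and `paired_receptors` are introduced here for the example. Junctional diversity, such as the nucleotide additions by TdT, is deliberately left out, so the result reflects segment choice alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Numbers of functional gene segments at one Ig locus (illustrative only)."""
    v: int
    j: int
    d: int = 0  # D segments are present only at the heavy-chain locus

    def combinations(self) -> int:
        # One V, one J and (heavy chain only) one D segment are joined.
        return self.v * self.j * (self.d if self.d else 1)


def paired_receptors(heavy: Locus, light_loci: list[Locus]) -> int:
    """Count heavy/light chain pairings from segment choice alone."""
    # A B cell uses either a kappa or a lambda light chain, so the
    # light-chain options are summed before pairing with the heavy chain.
    light_total = sum(locus.combinations() for locus in light_loci)
    return heavy.combinations() * light_total


# Hypothetical segment counts, NOT measured values.
heavy = Locus(v=40, d=20, j=5)
kappa = Locus(v=40, j=5)
lam = Locus(v=30, j=4)

print(paired_receptors(heavy, [kappa, lam]))  # 4000 * (200 + 120) = 1280000
```

Even with these modest assumed numbers, fewer than 150 gene segments yield more than a million distinct heavy/light combinations, which illustrates how segment recombination reconciles a large repertoire with a limited coding capacity.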
To understand the pathogenesis of lymphoid malignancies, a deep knowledge of the inner workings of our immune system is needed. The following sections summarize the current knowledge of physiological B-cell development and gene recombination.

B-cell development, gene recombination, and B-cell activation

B-cell development

All lymphocytes develop from common lymphoid progenitors (CLPs) that are derived from hematopoietic stem cells (HSCs). The commitment of CLPs to either the B- or the T-lineage is coordinated by a set of transcription factors10. Commitment to the B-lineage is initially mediated by the early B-cell factor (EBF) and E2A transcription factors, and subsequently by Paired Box 5 (Pax-5)11. Commitment to the T-cell lineage is mainly orchestrated by the Neurogenic locus notch homolog protein 1 (Notch-1) and GATA-3 transcription factors12. Pro-B cells are the earliest committed B-cell precursors. The early stage of B-cell development is characterized by the sequential recombination of the Ig loci (Figure 1). At the end of this stage, RAG1 and RAG2 proteins are expressed for the first time and the first

recombination of Ig genes takes place at the immunoglobulin heavy chain locus (Igh). Typically, DH-to-JH recombination takes place first, followed by recombination of the upstream VH region to the already rearranged DJ segment. The successful recombination of Igh marks the transition to the pre-B cell stage10. The early pre-B cells express a pre-B cell antigen receptor (pre-BCR), consisting of the μ-Igh chain and a surrogate light chain (SLC). The SLC is a heterodimer consisting of two proteins: λ5 and VpreB. The pre-BCR can oligomerize and trigger downstream signaling in the absence of a specific antigen, so-called tonic pre-BCR signaling, which is crucial for the survival of the pre-B cells at this stage13,14. Subsequently, pre-B cells may undergo one or two cell divisions and then proceed with the recombination of gene segments encoding the immunoglobulin light chain (IgL). At this stage RAG1 and RAG2 are re-expressed and the immunoglobulin kappa chain (Igk) is recombined first. If the pre-B cells fail to productively recombine the first and the second allele of Igk, and have exhausted all the recombination possibilities, they proceed with recombination of the immunoglobulin lambda chain (Igl)15. The human Igk is located on chromosome 2 (the mouse Igk is located on chromosome 6), and the human Igl on chromosome 22 (the mouse Igl is located on chromosome 16). In contrast to the Igh locus, the light chain loci consist of VL and JL segments and the constant region CL, but lack D gene segments. Following a successful recombination, an IgM BCR is expressed on the cell surface. Once the recombination has been performed successfully, RAG1/2 expression is in general suppressed throughout the mature stage of the B cells, though there are instances when RAG1/2 can be re-expressed in immature B cells and mediate secondary rearrangements in the Ig loci, so-called receptor editing16.
In the process of receptor editing, a functionally unresponsive or self-reactive BCR may acquire a new specificity, allowing the cell to escape apoptosis. Secondary rearrangements may also take place in mature peripheral B cells, the so-called receptor revision, which further diversifies the repertoire17. Each B cell expresses a BCR of only one particular specificity. This is achieved by allowing only one functionally recombined allele to be expressed. This phenomenon, called allelic exclusion, guarantees the specificity of immune responses and prevents autoimmune reactions. More details on the mechanism of allelic exclusion are discussed elsewhere in this review. After successful recombination, IgM-expressing immature B cells complete their maturation outside of the bone marrow: they enter the circulation or migrate to the lymph nodes or spleen.

Figure 1. Schematic depiction of B-cell development in the bone marrow. B cells develop from the common lymphoid progenitor (CLP). At this stage, both the heavy and the light chains are present in their germline configurations. Subsequently, at the pro-B cell stage, RAG1/2 is expressed (here depicted in red) and gene recombination at the Ig heavy chain locus is initiated. First, a D to J segment recombination takes place, followed by V to DJ recombination. Successful recombination at the early pre-B cell stage leads to the expression of the pre-BCR, composed of the recombined heavy chain (μ-Igh) and the surrogate light chain, which consists of λ5 and VpreB. At the late pre-B cell stage, the RAG1/2 recombinase is expressed for the second time and the recombination of Ig light chain genes is initiated. Following the successful Ig light chain recombination, the B-cell receptor (BCR/IgM) is expressed at the immature B-cell stage. Created with BioRender.com

RAG1 and RAG2 proteins are essential for V(D)J recombination and normal B-cell development. This has been clearly illustrated in mice, where the biallelic disruption of the Rag1 and Rag2 genes resulted in a complete developmental block of B and T cells at their progenitor stage due to the inability to initiate V(D)J recombination18,19. Biallelic RAG1 or RAG2 null mutations in humans result in severe combined immunodeficiency (SCID), which presents with absent B and T lymphocytes20. Recent advances in genetic engineering allowed successful replacement of the RAG1 or RAG2 null mutations using CRISPR-Cas9 editing of the hematopoietic stem cells in mice and patients, which led to successful production of mature B and T cells with diverse antigen receptor repertoires21,22. Omenn syndrome (OS) presents as an additional clinical phenotype linked to hypomorphic mutations in RAG1 and RAG2, where only residual enzymatic RAG activity is observed.
This condition is characterized by the presence of oligoclonal lymphocytes and infiltrating activated T cells, which cause damage to target tissues. Atypical SCID (AS) and delayed onset combined immunodeficiency with granulomas or autoimmunity (CID-G/AI) are some of the other clinical phenotypes associated with RAG1/2 mutations23.

Receptor editing and allelic exclusion

Considering the vast diversity of antibody specificities, B cells must maintain specificity against pathogens while averting autoreactivity. Consequently, B cells undergo multiple screening checkpoints during their development to assess their autoreactivity levels. The initial screening occurs following the differentiation of pro-B cells into pre-B cells. The antigen-independent tonic pre-BCR signaling promotes the progression of pre-B cells with a non-autoreactive pre-BCR to the next developmental stage. Lack of tonic pre-BCR signaling leads to apoptosis. However, several studies addressing the mechanism of central tolerance have uncovered that autoreactive B cells may undergo another round of Ig light chain gene recombination as an attempt to “edit” the autoreactive pre-BCR or BCR and thus escape apoptosis. This ongoing gene recombination, known as receptor editing, may change receptor specificity and thereby contributes to repertoire diversification24,25. At the same time, receptor editing, and the ongoing exposure of the pre-B cells to the RAG1/2 recombinase, may be a potential source of genomic instability. The concept of B-cell monospecificity has long been pivotal in explaining the targeted production of antibodies against specific pathogens. Referred to as the ‘one B cell, one antibody’ rule, this paradigm finds substantial support in numerous experimental studies. At the genetic level, the monospecificity of B cells arises from so-called allelic exclusion, a mechanism ensuring that each B cell produces antibodies with a single specificity. B cells with multiple different specificities could lead to autoimmune reactions or ineffective immune responses26,27. It has been demonstrated that successful recombination on one allele initiates signaling that suppresses bi-allelic recombination by terminating RAG1 and RAG2 expression as well as by reducing the accessibility of the Ig locus.
Several mechanisms have been proposed to explain how simultaneous rearrangement of both alleles is prevented. The asynchronous recombination models rely on mechanisms regulating the accessibility of the Ig alleles. These models propose two main mechanisms: the probabilistic model and the instructive model. In the probabilistic model, chromatin (in)accessibility leads to asynchronous recombination, as the inaccessible Ig chromatin/allele leads to slow and inefficient recombination, limiting the frequency of recombination events to one allele per cell. The instructive model is based on the asynchronous replication timing of the two alleles. The early-replicating allele is recombined first, while the late-replicating allele is recombined only if the recombination attempt on the first allele failed27,28. The stochastic model suggests that while Ig recombination is highly efficient, it typically yields only one functional Ig allele per cell. This concept implies that Ig allelic exclusion occurs due to the low probability of rearranging an allele in the correct reading frame, resulting in a functional Ig chain, compared to the higher likelihood of generating a non-functional allele. Consequently, no coordination between the two Ig alleles is deemed necessary, rendering asynchrony of allelic recombination inconsequential in this model29. On the other hand, feedback inhibition models propose that the recombination process of Ig genes is inhibited by their

products or intermediates. In the classical feedback inhibition model, successful Ig gene rearrangements activate signals through pre-BCRs or B-cell receptors (BCRs), leading to the suppression of further allelic recombination30,31. This model explains the observed ratios of peripheral B cells with Igh and Igl loci in different configurations32. A more recent study proposes a mechanistic explanation based on the fact that only the mRNA transcripts from the productively rearranged Igh allele are stable, whereas the non-productive Igh mRNA transcripts carry multiple stop codons and are degraded by nonsense-mediated mRNA decay. Therefore, it was hypothesized that the stable mRNA transcript of the productively rearranged Igh is responsible for allelic exclusion33. However, a more detailed investigation of this mechanism provided compelling evidence that the Igh mRNA does not actually play a role in allelic exclusion, and that it is the Igh protein complex that enforces allelic exclusion and drives B-cell development34. In addition, the RAG2 C-terminus and ataxia-telangiectasia mutated (ATM) were shown to prevent bi-allelic gene recombination35. In fact, the double-stranded DNA breaks (DSBs) arising from RAG1/2 activity, which activate the DNA damage sensor ATM, have been proposed to trigger a negative feedback loop limiting RAG expression. Inhibition or deletion of ATM resulted in bi-allelic recombination and an increase in genomic instability in developing B cells36.

From mature B-cell to plasma cell

Immature B cells exit the bone marrow and travel to the spleen, where they complete their early development by differentiating into naïve, follicular, or marginal zone (MZ) B cells. In secondary lymphoid organs, mature B cells encounter antigens presented by antigen-presenting cells (APCs), such as dendritic cells. Upon recognition of specific antigens by their BCRs, B cells become activated and undergo clonal expansion.
Activated B cells that receive appropriate signals from helper T cells differentiate into plasma cells. Plasma cells are specialized for antibody production and secrete large quantities of antibodies. A subset of activated B cells differentiates into memory B cells, which persist long-term and provide a rapid and robust secondary immune response upon re-exposure to the same antigen37. The first antibodies produced in a humoral immune response are always of the IgM isotype. However, mature B cells may undergo so-called class-switch recombination (CSR), which changes the antibody isotype. There are five antibody isotypes: IgG (subclasses IgG1, IgG2, IgG3, and IgG4), IgA (subclasses IgA1 and IgA2), IgM, IgD, and IgE. The exons that encode these antibody classes are named γ (γ1, γ2, γ3, and γ4), α (α1 and α2), μ, δ, and ε. Different isotypes have adapted to function in different body compartments; for instance, IgA is predominantly present in secretions such as saliva and digestive tract or nasal secretions, while IgG is primarily present in the blood serum38. In addition to CSR, the affinity and other biological properties of the immunoglobulins can be further modified without changing their specificity in a process termed somatic hypermutation (SHM). Both CSR and SHM are

2 Role of RAG1 and RAG2 in B-cell development | 23 initiated by activation-induced cytidine deaminase (AID), an enzyme that is required for the execution of both mechanisms39–41. The exact molecular backgrounds of the CSR and SHM mechanisms are provided elsewhere42–44. AID has been shown to bind non-Ig loci and introduce collateral DNA damage throughout the genome of mature B cells 44–46. Thus, throughout their lifespan, B cells encounter numerous challenges to maintain genome integrity, highlighting the critical need for meticulous processes to safeguard genomic fidelity. RAG1, RAG2 and the molecular mechanisms of V(D)J recombination The evolution of RAG The human RAG1 and RAG2 genes are situated in close proximity on chromosome 11p (the mouse Rag1 and Rag2 are located on chromosome 2p), each containing one large exon, separated by only 8kb47. Such locus configuration is rather unusual and the lack of introns within the coding region is reminiscent of transposable elements. Indeed, the evolution studies of RAG1 and RAG2 structure and biochemical function indicate that RAGs originate from RAG-like (RAGL) transposable elements and evolved through so-called “molecular domestication” to their current form48. Transposons are considered fragments of “selfish DNA,” acting as molecular parasites capable of replicating within the host’s genome without offering any discernible benefit. These mobile genetic elements spread within genomes by excising or copying themselves from one location and inserting into another49,50. The first transposition event that led to the formation of the split antigen receptor gene occurred probably in early jawed vertebrate ancestors by invading the Ig-domain gene receptor exon by the RAGL_A family transposon. After this event, the RAGL_A transposon became a functional host protein, and its DNA-cleaving activity became essential for the expression of the split gene. 
Selective pressure drove an evolutionary adaptation of RAGL_A such that it maintained its excision activity, allowing gene assembly, while its integrase activity was suppressed to limit its liability to genome integrity48,51,52. Such an evolutionary change, termed “molecular transposon domestication”, might have occurred when the early jawed vertebrates acquired the arginine 848 mutation in RAG1L (R848). In most invertebrates, methionine is found at this position (M848). Interestingly, the methionine to arginine mutation does not seem to increase the efficiency of DNA cleavage, but it strongly suppresses the transposition activity and improves the odds of maintaining genome integrity during gene recombination53,54.

RAG1 and RAG2 proteins

The human RAG1 contains 1043 amino acids (murine RAG1 contains 1040 amino acids) and is composed of 7 distinct structural domains. It contains a catalytic site that is responsible

for DNA cleavage55. The human and murine RAG2 is composed of only 527 residues and has no direct contribution to DNA cutting, but enhances the RAG1 DNA cleavage activity several hundred-fold. RAG2 is folded into a six-bladed β-propeller or Kelch-repeat structure56. The diverse domains in both RAGs can be categorized as either catalytically essential “core” domains or “non-core” domains, which are dispensable for the enzymatic activity but have a regulatory function57. For instance, non-core RAG2, consisting of approximately one-third of the full-length protein, mediates its nuclear localization and its association with specific histone-modified chromatin, and facilitates long-range recombination reactions58–60. The non-core RAG2 also contains a zinc-binding domain called the plant homeodomain (PHD), which coordinates two zinc ions and specifically binds histone H3 trimethylated on lysine 4 (H3K4me3). Mutations in this region lead to severe defects in V(D)J recombination activity and failure of RAG2-PHD to bind H3K4me3, and have been linked to Omenn syndrome61. Non-core RAG1 increases the efficiency and fidelity of V(D)J recombination62,63. An important feature in the N-terminal region of non-core RAG1 is the zinc-binding dimerization domain (ZDD). The ZDD contains two sub-domains, the C2H2 zinc finger and the Really Interesting New Gene (RING) finger, which act as an E3 ubiquitin ligase mediating (auto)ubiquitination of histone H3. Point mutations in this domain result in decreased V(D)J recombination activity64,65. Structural studies in mice and zebrafish revealed that RAG1 and RAG2 interact and together form a Y-shaped heterodimer of 230 kDa. The RAG complex is composed of two core subunits, forming a heterotetramer that holds the two paired DNA strands, where RAG1 catalyzes the DNA nicking step. 
RAG2 is in particular responsible for stabilizing the protein-protein interaction between the two halves of the RAG tetramer, and for conferring specificity to the interaction of RAG1 with the DNA substrate66,67.

Molecular mechanism of V(D)J recombination

V(D)J recombination is initiated by the RAG1/2 endonuclease heterotetramer56, which introduces a single-stranded DNA nick in the RSSs flanking the two participating coding sequences (Figure 2A and 2B). The RSSs contain a conserved palindromic heptamer (CACAGTG) and an AT-rich nonamer (ACAAAAACC), separated by 12 or 23 nucleotides of a less-conserved spacer sequence. The nucleotide sequence of the RSS may vary; however, gene recombination can (normally) occur only between genes flanked by RSS segments, and it is only efficient when it takes place between RSSs with different spacer lengths. This is also known as the “12/23 rule”. In the recombination of the T-cell receptor beta and delta loci, additional spatial restrictions apply, the so-called “beyond 12/23 restriction”68,69. The RSSs are limiting sequence structures required for V(D)J recombination; recombination on artificial substrates can be successfully achieved in vitro and in vivo when the substrates are flanked by canonical RSSs70,71. The orientation of the RSS also determines whether the joining of the coding segments proceeds by inversion or by deletion of the intervening sequence72.
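The RSS architecture and the 12/23 rule described above can be illustrated with a minimal Python sketch. It assumes exact matches to the consensus heptamer and nonamer quoted in the text, although real RSSs vary in sequence, and it scans only the given strand. The function names `find_canonical_rss` and `obeys_12_23_rule` are hypothetical and serve illustration only.

```python
import re

# Consensus motifs as quoted in the text; natural RSSs deviate from these.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"


def find_canonical_rss(seq):
    """Return sorted (start, spacer_length) pairs for exact
    heptamer-spacer-nonamer matches with a 12- or 23-nt spacer.
    Only the given strand is scanned (simplifying assumption)."""
    seq = seq.upper()
    hits = []
    for spacer in (12, 23):
        # A lookahead is used so that overlapping candidates are not skipped.
        pattern = re.compile(f"(?={HEPTAMER}[ACGT]{{{spacer}}}{NONAMER})")
        hits.extend((m.start(), spacer) for m in pattern.finditer(seq))
    return sorted(hits)


def obeys_12_23_rule(spacer_a, spacer_b):
    """Efficient recombination pairs one 12-RSS with one 23-RSS."""
    return {spacer_a, spacer_b} == {12, 23}


if __name__ == "__main__":
    # Artificial substrate carrying one 12-RSS and one 23-RSS.
    substrate = ("TT" + HEPTAMER + "A" * 12 + NONAMER
                 + "GG" + HEPTAMER + "C" * 23 + NONAMER)
    rss_hits = find_canonical_rss(substrate)
    print(rss_hits)
    print(obeys_12_23_rule(rss_hits[0][1], rss_hits[1][1]))
```

A realistic search for RSSs would score candidates against position weight matrices rather than require exact consensus matches; the sketch only makes the spacer logic concrete.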

Figure 2. (A) Schematic representation of the DNA cleavage process during V(D)J recombination. RAG1 and RAG2, assisted by high-mobility group proteins (HMGB), bind the V(D)J segments (here represented by the yellow/purple box) at recombination signal sequences (RSS) (represented by the yellow/purple triangles). Next, a synapse is formed and nicked by RAG1/2 at the RSS. The cleaved DNA is then repaired in the process of non-homologous end joining (NHEJ), leading to the formation of a coding joint (the recombined loci) and of a signal joint, also called the excised signal circle “ESC” (the intervening loci). (B) A more detailed schematic representation of the molecular mechanism of V(D)J recombination. The RAG1/2 recombinase introduces a nick on one DNA strand. Subsequently, the free -OH group created by the nick attacks the other DNA strand by transesterification, leading to the formation of a break in the other DNA strand, which ultimately creates a double-stranded DNA break (DSB). A hairpin is formed at the broken end, and Ku70 and Ku80 bind and stabilize the loose DNA end; they also attract other proteins involved in the DNA repair process. To resolve the DSB, the hairpin is first opened by Artemis, which in the presence of DNA-dependent protein kinase catalytic subunit (DNA-PKcs) acts as an endonuclease. Upon hairpin opening, the shorter DNA strand is extended by the addition of palindromic nucleotides (P-nucleotides) at the coding end, and the terminal deoxynucleotidyl transferase (TdT) further diversifies the junction by catalyzing the addition of non-templated nucleotides (N-nucleotides). The ends are filled by the Polymerase X family of polymerases (Pol μ and Pol λ) and finally, the broken DNA is sealed by Ligase IV/XRCC4. Created with BioRender.com

More detailed studies of the initiation of V(D)J recombination revealed that DNA melting of the conserved RSS heptamer precedes the nicking and suggested that the conservation of the heptamer is determined by its ability to unwind at CAC/GTG sequences73. Subsequently, trans-esterification takes place, during which the nicked 3’OH of the coding strand invades the opposite DNA strand, creating a DSB, which contains a closed hairpin at the coding end and a blunt 5’RSS end (signal end). RAG1 and RAG2 are supported by high-mobility group proteins belonging to the HMG-box family (HMGB1 and HMGB2), which facilitate the association of two signal ends. The HMG proteins interact with the nonamer-binding domain of RAG1 even in the absence of DNA, amplifying its natural DNA-binding activity74. Following the DNA cutting, Ku70 (also known as X-ray repair cross-complementing group 6 or XRCC6) and Ku80 (also known as X-ray repair cross-complementing group 5 or XRCC5) heterodimerize, bind the broken DNA ends, and attract DNA-dependent protein kinase catalytic subunit (DNA-PKcs), which controls the interaction of the broken DNA ends. The Ku70/Ku80 heterodimer also attracts other factors containing Ku-binding motifs75,76. Even though the two proteins form a heterodimer, striking differences in their mutant phenotypes have been observed: while the ku80-/- mice were reported to show early aging signs without an increased incidence of cancer, the ku70-/- mice exhibited defective B-cell maturation and a high incidence of thymic lymphomas77. Silencing of Ku70 in the Jurkat T-cell leukemia cell line resulted in accumulation of DSBs, cell cycle arrest, and increased apoptosis as a consequence of the impairment of the DNA repair pathways by the Ku70 deficiency78. Next, of the four broken DNA ends, two are covalently sealed, forming a hairpin (termed “coding end”), and the other two are blunt DNA ends (termed “signal ends”). 
No external energy is required for this reaction; the necessary energy is derived from the DNA breakage. The binding of the RAG1/2 complex to the DNA does not form a covalent complex, and while the nicking occurs within minutes, the hairpinning might take several hours79. Subsequently, the hairpins must first be opened in order to proceed with the joining and repair of the broken DNA. A characteristic feature of V(D)J recombination is the asymmetric processing of the signal and coding ends. The blunt-ended signal ends can be ligated directly with almost no processing, while the coding ends must first undergo further processing, such as small deletions or small insertions80,81. In the process of V(D)J recombination, Artemis is a nuclease with an indispensable role in resolving and repairing the DSBs. Artemis exhibits an inherent 5’-3’ exonuclease activity, while upon association with DNA-PKcs, Artemis acts as a 5’-3’ endonuclease. The hairpin is opened asymmetrically on the coding end, creating a shorter and a longer strand. The shorter strand is extended by the addition of nucleotides complementary to the longer strand, giving rise to the insertion of palindromic nucleotides (P-nucleotides) at the coding end. These are never observed at the signal end. In addition, the single-strand extensions increase the chances of loss of nucleotides from the coding end, thereby further increasing the diversity of the

antigen-binding sites82. In addition, the terminal deoxynucleotidyl transferase (TdT) further augments the diversity of the junctional sequences by catalyzing the addition of non-templated nucleotides (N-nucleotides) to the coding ends. This process of junctional diversification can potentially expand the diversity from around 10⁶ up to 10¹¹ possible combinations83. Typically, the ends are filled in by the Pol X family of polymerases, namely Pol μ and Pol λ. Defects in these polymerases in mice resulted in shorter Igh and Igl D to J and V to DJ junctions. Next, the Ligase IV/XRCC4 complex ligates the processed ends. This DNA repair pathway is known as classical non-homologous end joining (c-NHEJ), in which Ku70, Ku80, Artemis, XRCC4, and DNA Ligase IV are considered to be indispensable and which is conserved throughout evolution in different cell types84. Finally, the DNA between the two recombining segments is removed and covalently sealed at the signal ends, forming a so-called excised signal circle (ESC). It has been estimated that the production of every functional antigen receptor gene can generate up to 10 ESCs, depending on the level of non-productive recombinations. ESCs are non-replicative elements that are likely to be lost or diluted in the subsequent cell divisions85,86.

Aberrant V(D)J recombination and aberrant RAG mistargeting

Aberrant V(D)J recombination

Next to the typical outcome of V(D)J recombination, the formation of the coding and the signal joints, alternative outcomes have been reported. One example is the formation of so-called “hybrid joints” (HJ), in which the coding end of one exon is joined to the signal end of another cleaved exon. Such events can occur in a small percentage of murine and human B cells. HJs do not contribute to the repertoire diversity and their role in oncogenic transformation remains inconclusive87. 
Increased formation of hybrid joints was observed in mice harboring an inactivating mutation in Nijmegen breakage syndrome 1 protein (NBS1). NBS1 is part of the MRE11-RAD50-NBS1 complex (MRN), which plays a crucial role in DNA repair. The NBS1 mutation was shown to promote DNA repair through the alternative NHEJ pathway, known for its error-prone nature, often leading to small insertions or deletions88. In mice overexpressing coreRAG1 (cRAG1) and coreRAG2 (cRAG2) proteins, extremely high frequencies of HJ formation were observed as compared to cells derived from mice overexpressing full-length RAG1 and RAG2, suggesting that the non-core RAG regions suppress HJ formation under physiological conditions. In addition, the leukemia that developed in mice expressing cRAG1 and cRAG2 was more aggressive and showed more genomic instability, as judged by the percentage of γH2AX-positive cells, γH2AX being a marker for DSBs89. However, the formation of HJs in an endogenous wildtype/unmanipulated context seems to be rather rare. In other cases, the original pairs of coding and signal ends, cleaved by RAGs, are rejoined, leading to the formation of “open-shut joints”. These are rather difficult to detect if there are

no further modifications of the joint90. An in vitro cell-free RAG activity assay showed that the truncated but catalytically active forms of RAG easily catalyzed the formation of hybrid joints and open-shut joints, suggesting that the full-length forms repress their formation. The in vitro transposition activity of full-length RAG2 was dramatically reduced as opposed to the full-length RAG1 transposition activity91. Several studies have shown that the ESCs do not seem to be as inert as initially thought, and in fact, in various in vitro assays they display the capacity to re-integrate elsewhere in the genome. However, it is supposed that the in vivo threat to the integrity of the genome caused by RAG-mediated transposition is rather limited. Only a few in vivo RAG-mediated transposition events have been described to date, and the evidence linking leukemia or lymphoma cases to RAG-mediated transposition events remains inconclusive86,92–95. The RAG/ESC complex could, so far only theoretically, catalyse the formation of DSBs, as it still contains two RSSs (Figure 2A). It has been observed that upon cleavage the ESC is opened at the RSS, but RAG1/2, which remains bound to the ESC, is able to introduce DSBs at the next RSS, a process termed “cut-and-run”. Recently, genome-wide next-generation sequencing of leukemic B cells revealed similarities between the “cut-and-run” breakpoints and the breakpoints observed in B-cell leukemia that harbors the ETV6/RUNX1 chromosomal translocation96,97. These studies argue that the lack of evidence is caused by the limitations of the previously used techniques and suggest that advances in whole-genome next-generation sequencing techniques could provide a much better resolution of the genomic events, and thus in the future deliver the missing evidence of RAG-mediated ESC re-integration in lymphoid malignancies98,99. 
RAG mistargeting

Sequences that resemble RSSs, termed cryptic RSSs (cRSSs), were shown to be very abundant in the vertebrate genome, with a remarkable estimate of one cRSS per 600 bp100. A study employing genome-wide chromatin immunoprecipitation (ChIP) of RAG1 and RAG2 in mouse pre-B cells, followed by next-generation sequencing (ChIP-seq), clearly demonstrated that RAG1 and RAG2 binding in the genome is not limited to the Ig loci, as around 3400 RAG1 and around 18300 RAG2 binding sites were identified throughout the genome of mouse pre-B cells. However, the presence of an RSS alone seems to be a poor indicator of RAG1-binding sites; the heptamer and cRSS sequences were even found to be depleted from the RAG1-binding sites. Moreover, in this study no translocations were detected for selected genes with a high cRSS content in their proximity, leading to the conclusion that RAG1/2 binding outside of the Ig/Tcr loci only rarely results in translocation events71. On the other hand, RAG2 binding throughout the lymphocyte genome coincided significantly with regions containing high levels of histone H3 trimethylated at lysine 4 (H3K4me3)101,102, which was not entirely surprising considering that the PHD finger of RAG2 was shown to specifically recognize and bind H3K4me3 in mammalian cells. Mutations

that abrogate RAG2’s ability to bind H3K4me3 severely impaired V(D)J recombination in vivo61. Furthermore, the junction analysis of mice bearing RAG2 with a truncated C-terminus showed the formation of genomic lesions in the proximity of cRSSs, including several oncogenes103, and similar lesions have also been observed in human pre-B cells derived from B-ALL patients104. Next to the lesions observed in the proximity of individual cRSSs, genomic events can also take place between pairs of cRSSs. This may result in a chromosome translocation, as seen in several cases of human pre-T cells derived from patients suffering from T-cell acute lymphoblastic leukemia (T-ALL), bearing recombination between TCR gene segments and the Ikaros locus (Ikzf1)105, neurogenic locus notch homolog protein 1 (Notch1)106, phosphatase and tensin homolog (PTEN)107 or stem cell leukemia (SCL)/SCL interrupting locus (SIL)108,109. Besides being a sequence-specific recombinase, RAG1/2 has also been shown to recognize specific DNA structures and cleave DNA even without the presence of RSS motifs, thus acting as a structure-specific nuclease, cleaving, for instance, heterologous loops of G-quadruplexes110,111. For example, RAG1/2 shows 3’-flap endonuclease activity that is able to remove single-strand (ss) extensions from branched DNA112. In follicular lymphoma, non-B DNA structures were also identified around the BCL2 major breakpoint region (Mbr) of the t(14;18) translocation, where the BCL2 gene is juxtaposed to the Igh locus. The RAG complex was shown to bind and nick at non-B DNA structures in the proximity of the BCL2 Mbr, resulting in double-strand breaks in vitro, thus demonstrating the transposase activity of RAG1/2113. Simple repeat sequences have also been shown to assume non-B DNA structures and become a RAG1/2 target. 
For instance, CA-repeats were shown to function as a type of cryptic RSS (cRSS), as RAG1/2 was able to cleave such structures114. In addition, GC-rich regions are also able to adopt structures such as hairpins, cruciforms or triple-stranded DNA, and the GC-rich motif 5’-GCCGCCGGGCG-3’ was identified as a RAG1/2 transposition hotspot115. In our study (manuscript under consideration) we identified RAG1/2-dependent DSBs on a genome-wide scale by using NBS1 ChIP-seq. Though the RAG1/2 DSBs were clearly enriched at Ig light chain regions, where they associated with RSS motifs as expected, the majority of the RAG-dependent DNA breaks were found outside of the Ig loci and showed no appreciable association with cRSSs. Interestingly, simple repeat sequences such as GA and CA repeats, but also GC-rich motifs, were found to be enriched within 500-1000 bp of the RAG1/2-dependent DSBs, further underscoring the propensity of repeat regions to become RAG1/2 targets. Next to aberrant RAG1/2 targeting in developing lymphocytes, the aberrant or persistent expression of RAG1/2 also represents a significant threat to genome integrity, various aspects of which are outlined in the next chapters.
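As an illustration of how simple repeats such as CA or GA runs could be located in the vicinity of a break site, the following minimal Python sketch may help. It is not the analysis pipeline used in the study. The function names, the minimum of six repeat units, and the default 1000-bp window are assumptions chosen for illustration.

```python
import re


def find_dinucleotide_repeats(seq, unit, min_units=6):
    """Return (start, end) spans of uninterrupted runs of `unit`
    repeated at least `min_units` times (threshold is an assumption)."""
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_units},}}")
    return [m.span() for m in pattern.finditer(seq.upper())]


def repeats_near(seq, position, unit, window=1000, min_units=6):
    """Return repeat spans, in coordinates of `seq`, that lie within
    `window` bp on either side of a hypothetical break `position`."""
    lo = max(0, position - window)
    hi = min(len(seq), position + window)
    local = find_dinucleotide_repeats(seq[lo:hi], unit, min_units)
    return [(start + lo, end + lo) for start, end in local]


if __name__ == "__main__":
    # Toy sequence: a (CA)8 tract flanked by homopolymer filler.
    toy = "G" * 50 + "CA" * 8 + "T" * 50
    print(find_dinucleotide_repeats(toy, "CA"))
    print(repeats_near(toy, position=60, unit="CA", window=20))
```

An enrichment claim would additionally require comparing such counts against matched control regions, which the sketch does not attempt.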
