
Management of indeterminate thyroid nodules: changing the paradigm Elizabeth Janna de Koster

Cover design and thesis layout: Lisanne de Koster
ISBN 978-94-6506-849-7
Printed by Ridderprint, www.ridderprint.nl
Copyright © Lisanne de Koster
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without prior permission in writing from the author. The copyright of the articles has been transferred to the respective journals.

Management of indeterminate thyroid nodules: changing the paradigm
Thesis to obtain the degree of Doctor at Leiden University, on the authority of Rector Magnificus prof. dr. ir. H. Bijl, according to the decision of the Doctorate Board, to be defended on Thursday 6 March 2025 at 13.00 hours by Elizabeth Janna de Koster, born in Goes in 1988

Supervisors
prof. dr. L.F. de Geus-Oei
prof. dr. W.J.G. Oyen (Radboudumc; Rijnstate; Humanitas University, Milan)
Co-supervisor
dr. D. Vriens (Radboudumc)
Members of the doctoral committee
prof. dr. N.M. Appelman-Dijkstra
prof. dr. M.R. Vriens (UMCU)
prof. dr. E.F.I. Comans (HaaglandenMC; AUMC)
dr. J.W.A. Oosterhuis (HaaglandenMC)
dr. K. van der Tuin (UMCG)

Table of contents

Part I. Prologue
Chapter 1 General introduction and outline of this thesis
Chapter 2 Diagnostic utility of molecular and imaging biomarkers in cytological indeterminate thyroid nodules
Chapter 2 supplementary data: Systematic review & meta-analysis
Chapter 3 Non-invasive imaging biomarkers of thyroid nodules with indeterminate cytology

Part II. Efficacy of [18F]FDG-PET/CT in indeterminate thyroid nodules
Chapter 4 [18F]FDG-PET/CT to prevent futile surgery in indeterminate thyroid nodules: a blinded, randomised controlled multicentre trial
Chapter 5 Quantitative classification and radiomics of [18F]FDG-PET/CT in indeterminate thyroid nodules
Chapter 6 [18F]FDG-PET/CT in indeterminate thyroid nodules: cost-utility analysis alongside a randomised controlled trial
Chapter 7 Health-related quality of life following [18F]FDG-PET/CT for cytological indeterminate thyroid nodules
Chapter 8 [18F]FDG uptake and expression of immunohistochemical markers related to glycolysis, hypoxia, and proliferation in indeterminate thyroid nodules
Chapter 9 Preoperative stratification of cytologically indeterminate thyroid nodules by [18F]FDG-PET: can Orpheus bring back Eurydice?
Chapter 10 What is the role of functional imaging and isotopic treatment?

Part III. Efficacy of molecular diagnostics in indeterminate thyroid nodules
Chapter 11 A clinically applicable molecular classification of Hürthle cell thyroid nodules
Chapter 12 Molecular diagnostics and [18F]FDG-PET/CT in indeterminate thyroid nodules: complementing techniques or waste of valuable resources?

Part IV. Epilogue
Chapter 13 General discussion

Part V. Appendices
Summary
Samenvatting
Glossary
References
Curriculum vitae
List of publications
Acknowledgements

part I Prologue

chapter 1 General introduction and outline of this thesis

Introduction

Incidence and risk of malignancy of thyroid nodules
The growing use of progressively sensitive imaging techniques has resulted in an increased detection of thyroid nodules. Whereas the prevalence of palpable thyroid nodules is merely 1% in males and 5% in females in iodine-sufficient countries, the lifetime prevalence of thyroid nodules detected during ultrasound examination or in autopsy studies ranges from 34% to 66%. Less than 10% of these nodules are malignant; most are benign, asymptomatic, and do not require treatment [1-5]. Yet, the increased detection of thyroid nodules has resulted in a rise in thyroid surgeries and a higher incidence of differentiated thyroid carcinoma [6-10]. In the United States, the incidence of thyroid carcinoma tripled from 4.5 to 14.4 per 100,000 between 1974 and 2013 [11]. In the Netherlands, the incidence of thyroid carcinoma more than doubled from 1.65 to 3.34 per 100,000 (Revised European Standardized Rate) in males and from 3.51 to 6.73 per 100,000 in females between 1990 and 2022 [9]. The clinical relevance of this raised thyroid cancer incidence is questionable, as the additional cases were often indolent papillary thyroid microcarcinomas (i.e., subclinical disease) [5-7, 10-12]. Moreover, despite the increased diagnosis and treatment, the mortality rates for papillary thyroid carcinoma have not decreased [10-12]. These epidemiological observations feed ongoing discussions about the overdiagnosis and overtreatment of thyroid nodules and endorse the need for cost-effective, risk-adapted and de-escalating management strategies [7, 12-15].

The Bethesda System for Reporting Thyroid Cytopathology
To aid the concise and unambiguous reporting of thyroid cytology using uniform terminology, the Bethesda System for Reporting Thyroid Cytopathology was developed and first published in 2009 [16].
Since then, the Bethesda System has been updated several times to meet recent developments in thyroid pathology, including the latest guidelines for the management of thyroid nodules, the latest WHO classification of endocrine tumours, the reclassification of the ‘malignant’ encapsulated follicular variant of papillary thyroid carcinoma to the benign diagnosis of non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP), and corrected estimates of the risk of malignancy per diagnostic category following the standardized introduction of the Bethesda System [17-24]. The most recent update of the Bethesda System was published in 2023 [23]. The Bethesda System incorporates six diagnostic categories with an increasing risk of malignancy (Table 1). In most literature as well as in the current thesis, Bethesda categories III (atypia of undetermined significance) and IV (follicular neoplasm) are defined as indeterminate cytology. These categories consist of follicular-patterned lesions that are difficult to diagnose on fine needle

Table 1. The Bethesda System for Reporting Thyroid Cytopathology [16, 18, 27]
For each category, the ROM is listed per edition. For 2017 and 2023, the ROM is given separately for NIFTP ≠ CA and NIFTP = CA; the 2023 values are means, with the range in parentheses where reported.

Category I
  2009: Nondiagnostic or unsatisfactory. ROM 1-4%
  2017: Nondiagnostic or unsatisfactory. ROM if NIFTP ≠ CA 5-10%; ROM if NIFTP = CA 5-10%
  2023: Nondiagnostic. ROM if NIFTP ≠ CA 13% (5-20%); ROM if NIFTP = CA 12%
  Usual management: Repeat FNAC with ultrasound guidance

Category II
  2009: Benign. ROM 0-3%
  2017: Benign. ROM if NIFTP ≠ CA 0-3%; ROM if NIFTP = CA 0-3%
  2023: Benign. ROM if NIFTP ≠ CA 4% (2-7%)³; ROM if NIFTP = CA 2%
  Usual management: Clinical and ultrasound¹ follow-up

Category III
  2009: Atypia of Undetermined Significance or Follicular Lesion of Undetermined Significance (AUS/FLUS). ROM ~5-15%
  2017: Atypia of Undetermined Significance or Follicular Lesion of Undetermined Significance (AUS/FLUS)⁴. ROM if NIFTP ≠ CA 6-18%; ROM if NIFTP = CA ~10-30%
  2023: Atypia of undetermined significance (AUS)⁴. ROM if NIFTP ≠ CA 22% (13-30%); ROM if NIFTP = CA 16%
  Usual management: Repeat FNAC, MD¹, diagnostic lobectomy¹, or surveillance¹

Category IV
  2009: Follicular neoplasm or suspicious for a follicular neoplasm, specify if Hürthle cell type. ROM 15-30%
  2017: Follicular neoplasm or suspicious for a follicular neoplasm, specify if Hürthle cell type. ROM if NIFTP ≠ CA 10-40%; ROM if NIFTP = CA 25-40%
  2023: Follicular neoplasm, specify if oncocytic type. ROM if NIFTP ≠ CA 30% (23-34%); ROM if NIFTP = CA 23%
  Usual management: MD¹, diagnostic lobectomy

Category V
  2009: Suspicious for malignancy. ROM 60-75%
  2017: Suspicious for malignancy. ROM if NIFTP ≠ CA 45-60%; ROM if NIFTP = CA 50-75%
  2023: Suspicious for malignancy. ROM if NIFTP ≠ CA 74% (67-83%); ROM if NIFTP = CA 65%
  Usual management: MD², lobectomy or (near-)total thyroidectomy

Category VI
  2009: Malignant. ROM 97-99%
  2017: Malignant. ROM if NIFTP ≠ CA 94-96%; ROM if NIFTP = CA 97-99%
  2023: Malignant. ROM if NIFTP ≠ CA 97% (97-100%); ROM if NIFTP = CA 94%
  Usual management: Lobectomy¹ or (near-)total thyroidectomy

CA, carcinoma; FNAC, fine needle aspiration cytology; MD, molecular diagnostics; NIFTP, non-invasive follicular thyroid neoplasm with papillary-like nuclear features; ROM, risk of malignancy.
¹ Introduced with the 2017 Bethesda System [18].
² Introduced with the 2023 Bethesda System [27].
³ This ROM estimate is based on follow-up of surgically resected nodules. As most thyroid nodules with benign cytology do not undergo surgical excision, this ROM is likely skewed by selection bias. Based on long-term follow-up studies, the best ROM estimate for benign cytology is ~1% to 2% [23].
⁴ This category can be further subclassified according to the type of atypia. The 2017 Bethesda System recognized cytologic atypia, architectural atypia, cytologic and architectural atypia, Hürthle cell AUS/FLUS, and atypia not otherwise specified [18]. The 2023 Bethesda System recognizes nuclear atypia (previously cytologic atypia) and non-nuclear atypia. The ROM appears to be higher for AUS cytology with nuclear atypia [27].

aspiration cytology (FNAC) alone, as cytology offers limited insight into the tissue structure, including the assessment of the capsular and/or vascular invasion that distinguishes follicular thyroid carcinoma (FTC) from follicular adenoma (FA) [25]. Indeterminate cytology makes up approximately a quarter of all FNAC results [16, 18, 26]. To subsequently obtain a definitive diagnosis, diagnostic thyroid lobectomy would be required, resulting in a histopathological diagnosis of thyroid carcinoma in approximately 25% of indeterminate nodules [16, 18, 26, 27]. In other words, approximately 75% of these patients would undergo diagnostic thyroid surgery for a benign nodule: futile surgery from an oncological perspective, with associated costs, morbidity, and unwarranted risks of surgical complications [28, 29].

Improving the diagnostic workup
Additional diagnostics should be considered to aid the malignancy risk stratification and, hopefully, to obtain a more definitive diagnosis before proceeding to diagnostic surgery. That way, unbeneficial diagnostic surgery for benign nodules can be avoided when malignancy can accurately be ruled out. When malignancy is confirmed or highly suspected, a (sub-)total thyroidectomy can be considered at once, depending on other clinical and pathological characteristics, instead of two-step surgery that starts with a diagnostic hemithyroidectomy [14]. The 2015 American Thyroid Association guidelines proposed that an ideal rule-out test for thyroid carcinoma should have a negative predictive value similar to that of a benign cytological diagnosis (~96.3%), and that an ideal rule-in test should have a positive predictive value at least similar to that of a malignant cytological diagnosis (~98.6%) [17].
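As a rough illustration of these benchmarks, the sketch below converts sensitivity and specificity into predictive values at a given pre-test risk of malignancy using Bayes' theorem. The function name and the test characteristics (95% sensitivity, 50% specificity) are assumptions chosen for illustration only and do not describe any specific test; the 25% prevalence corresponds to the approximate malignancy rate of indeterminate nodules mentioned above.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical rule-out test (illustrative values, not trial results)
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.50, prevalence=0.25)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumed values, the negative predictive value is about 96.8%, just above the ~96.3% rule-out benchmark, whereas the positive predictive value of about 38.8% falls far short of the rule-in benchmark. A test can thus be a useful rule-out test without being a useful rule-in test.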
Diagnostic accuracy versus clinical utility
In times where shared decision-making and cost-effectiveness are increasingly important in daily clinical practice, both physicians and patients desire more from a diagnostic test than merely well-validated diagnostic accuracy and high rule-in and/or rule-out capacity. They are additionally interested in actual changes in outcomes that matter to patients. Therefore, instead of simply focusing on the highest sensitivity and/or specificity, a diagnostic test is better appreciated by end points such as desired minimal rates of accurately prevented unbeneficial surgeries or accurately managed carcinomas. The extent to which the use of a diagnostic test improves health outcomes relative to the current best alternative is defined as clinical utility [30]. Similar to the four well-known phases in clinical drug research, several comparable hierarchical systems have previously been proposed to evaluate diagnostic tests. The best known is the six-step Fryback-Thornbury hierarchy, which was originally presented in 1991 and designed for the evaluation of imaging techniques (Figure 1) [31, 32]. The evaluation of diagnostic accuracy occupies a central position in this and other systems, as the accurate identification of patients with the index disease is indispensable for clinical utility [30, 31]. Utility of the diagnostic test is evaluated in the later stages, starting with the more subjective ‘diagnostic thinking efficacy’ (level 3), that is, how physicians appreciate the information of the diagnostic test, and ‘therapeutic efficacy’ (level 4), that is, whether physicians think that the diagnostic test changed their decision-making and planned patient management [30, 32]. These two phases are criticized for their subjectivity and lack of validity, as intended behaviour may not reflect actual behaviour [30]. In the Fryback-Thornbury hierarchy, clinical utility is considered level 5, ‘patient outcome efficacy’. This may include the rate of accurately prevented unbeneficial diagnostic thyroid surgeries for benign thyroid nodules as well as changes in health-related quality of life (HRQoL) [30-32]. Clinical utility is best evaluated using a randomized study design [30]. The final level 6 is ‘societal efficacy’ and includes the assessment of the use of resources and medical benefits on a societal level as opposed to the patients’ individual risks and benefits. This includes cost-effectiveness analyses, in which utility is most often defined as the number of quality-adjusted life years (QALYs) gained [30-32].

Figure 1. The Fryback-Thornbury hierarchy.

The diagnostic randomized controlled trial
Diagnostic randomized controlled trials (RCTs) are defined as randomized comparisons of two diagnostic interventions (i.e., experimental versus standard) that measure the impact of the experimental diagnostic intervention on health outcomes as compared to the standard diagnostic intervention [33, 34]. Whereas cohort studies provide the relative diagnostic accuracy of an additional test as compared to the reference standard, diagnostic RCTs may additionally inform on the clinically important consequences of that diagnostic accuracy [33].

Diagnostic RCTs are also called ‘test-treatment trials’: patients are randomized to the experimental or standard diagnostic strategy, but health outcomes are only measured after the patients have also undergone the subsequent treatment, which is typically predefined in the trial protocol. As such, entire test-treatment strategies are evaluated rather than merely diagnostic strategies, and the effects of the diagnostic strategy depend not just on the test itself but also on the effectiveness of subsequent management [30, 35]. Various designs are possible for diagnostic RCTs. Trials may assess a single test (i.e., comparing the experimental diagnostic intervention to standard patient management, ‘no test’) or compare multiple (experimental) diagnostics to each other. In addition, the timing of the randomization may differ, that is, either prior to the diagnostic intervention or prior to the treatment based on the result of the diagnostic. These designs all have pros and cons with regard to methodological complexity, feasibility, efficacy, and validity (i.e., the degree to which the effect of the test itself is evaluated as compared to evaluation of the consequent treatment), depending on, among other things, the desired outcome measures and the type, capacity, and costs of the diagnostic intervention(s) [34]. The most suitable trial design may therefore vary for each research aim.

The EfFECTS trial
This thesis is structured around the Efficacy of [18F]FDG-PET in Evaluation of Cytological indeterminate Thyroid nodules prior to Surgery (EfFECTS) trial. The EfFECTS trial was a triple-blinded, randomised controlled multicentre trial that investigated the implementation of positron emission tomography/computed tomography using 2-[18F]fluoro-2-deoxy-D-glucose ([18F]FDG-PET/CT) as a rule-out test in the diagnostic workup of cytologically indeterminate (Bethesda III and IV) thyroid nodules, aiming to reduce the number of futile diagnostic surgeries for benign nodules. [18F]FDG-PET/CT visualizes metabolic activity in tissues and can be used in the diagnosis, staging and therapeutic response monitoring of many malignancies. It utilizes the basic principle that the metabolism of (malignant) neoplasms and inflammation is upregulated as compared to that of normal tissues, with up to 200 times higher glycolytic rates and preferential lactic acid fermentation, even in abundance of oxygen (the Warburg effect) [36]. The EfFECTS trial was founded on previous work by our group: a 2006 prospective study that demonstrated that [18F]FDG-PET/CT accurately ruled out malignancy with 100% sensitivity in 44 patients with a thyroid nodule with inconclusive cytology; a 2011 meta-analysis of six earlier nonrandomized studies that demonstrated 95% sensitivity for [18F]FDG-PET/CT in indeterminate thyroid nodules, increasing to 100% for nodules above 15 mm in diameter; and a 2014 cost-effectiveness analysis based on a Markov decision model with a 5-year horizon, which showed that [18F]FDG-PET/CT-driven management may cost-effectively reduce the fraction of futile surgeries from ~75% to ~40%, with an expected reduction in direct healthcare costs while preserving HRQoL [29, 37, 38]. Following these and other studies that confirmed the safety (i.e., high negative predictive value) of [18F]FDG-PET/CT in indeterminate thyroid nodules, international guidelines acknowledged its potential but stopped short of recommending its routine use, because randomized controlled trials to validate the impact of [18F]FDG-PET/CT on improved patient outcomes were lacking [17, 39, 40]. Thus, the EfFECTS trial was designed.

Aim
Confirming the position of [18F]FDG-PET/CT in the diagnostic workup of cytologically indeterminate thyroid nodules is central to this thesis. To achieve this, all efficacy levels that are involved in the implementation of a new imaging strategy are investigated. The ultimate aim of this thesis is to improve the diagnostic workup of indeterminate thyroid nodules and reduce unbeneficial patient management, including unbeneficial diagnostic tests as well as futile diagnostic surgical procedures, to benefit the individual patient as well as our health care system on a societal level.

Outline
This thesis consists of three parts. Part I provides an introduction to additional diagnostics for cytologically indeterminate thyroid nodules. Chapter 2 provides a comprehensive, systematic overview of the literature on additional diagnostic tests for indeterminate thyroid nodules. This review discusses the complete range of available molecular and imaging biomarkers, from conventional tests such as ultrasound and immunocytochemistry, to state-of-the-art techniques including [18F]FDG-PET/CT and molecular diagnostics. Besides discussing the ability of each test to distinguish between malignant and benign indeterminate nodules in a pre-operative setting, we also zoom in on clinical validation and utility, cost-effectiveness and availability of these techniques, where appropriate. Chapter 3 zooms in further on imaging biomarkers and reviews non-invasive diagnostic imaging techniques for indeterminate thyroid nodules, from conventional to artificial-intelligence-based imaging.
Part II investigates the efficacy of [18F]FDG-PET/CT in indeterminate thyroid nodules. In Chapter 4, the main results of the EfFECTS trial are presented. In a Dutch multicentre setting, the safety and impact of an [18F]FDG-PET/CT-driven diagnostic workup for Bethesda III/IV thyroid nodules are assessed in a randomized comparison to diagnostic surgery without additional preoperative diagnostics. The primary objective of this study is to accurately reduce the rate of unbeneficial patient management, i.e., to avoid futile diagnostic surgery for benign nodules and to avoid wrongful active surveillance for malignant and borderline nodules that do require surgical resection. Secondary objectives include the influence of [18F]FDG-PET/CT-driven management on the surgical complication rate, general HRQoL using the EuroQol 5-dimension 5-level (EQ-5D-5L) questionnaire, societal costs, the diagnostic and therapeutic consequences of incidental PET/CT findings, and the implementability of [18F]FDG-PET/CT. Whereas a visual analysis of the [18F]FDG-PET/CT images is applied in Chapter 4, Chapter 5 discusses whether quantitative [18F]FDG-PET/CT assessment improves the preoperative differentiation of indeterminate thyroid nodules. These assessments include receiver operating characteristic curve analysis and threshold analysis using the standardized uptake value (SUV) and SUV ratios, as well as radiomic analysis of [18F]FDG-positive nodules, with a specific focus on separate evaluation of nodules with non-oncocytic and oncocytic cytology. Chapter 6 describes an extensive cost-utility analysis of an [18F]FDG-PET/CT-driven workup as compared to diagnostic surgery for indeterminate thyroid nodules. The observed 1-year data from the EfFECTS trial are first used to calculate the 1-year societal costs and QALYs. Subsequently, these data are extrapolated using a 12-health-state Markov model to estimate lifelong cost-effectiveness. Univariate sensitivity analyses are performed to evaluate the impact of individual model probabilities, costs, and utilities. In Chapter 7, the HRQoL of an [18F]FDG-PET/CT-driven diagnostic workup is studied in greater detail, using the EQ-5D-5L, the RAND 36-item Health Survey v2.0, and the Thyroid Patient-Reported Outcome (ThyPRO) questionnaire. To best distinguish the impact of the [18F]FDG-PET/CT scan, its consequent surgical or non-surgical management, and a benign or malignant histopathological diagnosis, patients were categorised into three groups for this study: patients who underwent diagnostic surgery and had (1) benign or (2) malignant histopathology, and (3) patients who had an [18F]FDG-negative nodule and underwent active surveillance.
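The extrapolation step described for Chapter 6 can be illustrated with a toy cohort model. The sketch below is not the 12-health-state model used in this thesis: it assumes only two hypothetical health states, and the function name, transition probabilities, utilities and discount rate are all invented for illustration. It shows the general mechanism of a Markov cohort model, in which a cohort is redistributed over health states once per yearly cycle and the expected payoff (utility or cost) of each cycle is discounted and summed.

```python
def discounted_cohort_sum(transition, payoffs, start, cycles, discount_rate=0.0):
    """Sum the discounted expected payoff (e.g. utility or cost) of a cohort
    that moves between health states once per yearly cycle.

    transition[i][j] is the yearly probability of moving from state i to j.
    Payoffs are counted at the start of each cycle (no half-cycle correction).
    """
    distribution = list(start)
    n_states = len(distribution)
    total = 0.0
    for year in range(cycles):
        expected = sum(p * v for p, v in zip(distribution, payoffs))
        total += expected / (1.0 + discount_rate) ** year
        # Redistribute the cohort over the states for the next cycle
        distribution = [
            sum(distribution[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return total


# Hypothetical two-state example: "alive" (utility 1.0) and "dead" (utility 0.0)
transition = [[0.5, 0.5],
              [0.0, 1.0]]
qalys = discounted_cohort_sum(transition, payoffs=[1.0, 0.0],
                              start=[1.0, 0.0], cycles=3)
print(qalys)  # 1.75
```

Running the same function with per-state costs instead of utilities yields the expected discounted costs, so that two strategies can be compared on both costs and QALYs.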
As the last original chapter of this part of the thesis, Chapter 8 explores the association between [18F]FDG uptake and the quantitative expression of several immunohistochemical markers related to glucose metabolism, hypoxia, and cell proliferation. This study aims to expand the understanding of the metabolic changes in benign and malignant thyroid nodules of indeterminate cytology, and to better understand why some benign nodules are [18F]FDG-positive while others are not. Chapter 9 discusses how the results of the EfFECTS trial should be interpreted in light of previous literature, and considers the trial’s limitations. It also examines how the varying results from this and other trials using [18F]FDG-PET/CT in indeterminate nodules may be explained by varying inclusion criteria and definitions, progressive insights in thyroid cyto- and histopathology, improved cytological differentiation, and technical advances in PET imaging including the transition from PET to PET/CT. In Chapter 10, we respond to the latest version of the French thyroid guidelines, the SFA-AFCE-SFMN 2022 consensus on the management of thyroid nodules, which appeared not long after the main results of the EfFECTS trial were published. In these guidelines, [18F]FDG-PET/CT is not recommended in indeterminate thyroid nodules. Part III explores the efficacy of molecular diagnostics in thyroid nodules. Chapter 11 describes the copy number alteration patterns and loss of heterozygosity (CNA-LOH) that can be distinguished in benign and malignant oncocytic thyroid nodules using a custom 1,500 single-nucleotide polymorphism next-generation sequencing panel that is feasible for clinical practice, and provides considerations for their structured interpretation. In Chapter 12, the diagnostic accuracy of molecular diagnostics is compared to [18F]FDG-PET/CT in indeterminate thyroid nodules for the first time, using the EfFECTS trial cohort. This chapter also discusses the therapeutic efficacy of the combined use of both techniques, presents how to prevent the waste of valuable resources by choosing the right (order of) diagnostic test(s) for the non-oncocytic or oncocytic indeterminate cytology subgroups, and assesses whether molecular alterations drive the differences in [18F]FDG uptake that are observed among benign nodules. In Part IV, the epilogue, this thesis concludes with a general discussion on the studies presented in this work and future prospects.

chapter 2 Diagnostic utility of molecular and imaging biomarkers in cytological indeterminate thyroid nodules
Elizabeth J. de Koster, Lioe-Fee de Geus-Oei, Olaf M. Dekkers, Ilse van Engen-van Grunsven, Jaap Hamming, Eleonora P.M. Corssmit, Hans Morreau, Abbey Schepers, Jan Smit, Wim J.G. Oyen, Dennis Vriens
Endocrine Reviews. 2018;39(2):154-191. https://doi.org/10.1210/er.2017-00133.

Abstract
Indeterminate thyroid cytology (Bethesda III and IV) corresponds to follicular-patterned benign and malignant lesions, which are particularly difficult to differentiate on cytology alone. As approximately 25% of these nodules harbour malignancy, diagnostic hemithyroidectomy is still customary. However, advanced preoperative diagnostics are rapidly evolving. This review provides an overview of additional molecular and imaging diagnostics for indeterminate thyroid nodules in a pre-operative clinical setting, including considerations regarding cost-effectiveness, availability, and feasibility of combining techniques. Addressed diagnostics include gene mutation analysis, microRNA, immunocytochemistry, ultrasonography, elastosonography, CT, [99mTc]Tc-MIBI scintigraphy, [18F]FDG-PET and diffusion-weighted MRI. The best rule-out tests for malignancy were the Afirma® GEC and [18F]FDG-PET. The most accurate rule-in test was sole BRAF mutation analysis. No diagnostic had both near-perfect sensitivity and specificity, and estimated cost-effectiveness. Molecular techniques are rapidly advancing. However, given the currently available techniques, a multimodality stepwise approach likely offers the most accurate diagnosis, sequentially applying one sensitive rule-out test and one specific rule-in test. Geographical variations in cytology (e.g., Hürthle cell neoplasms) and tumour genetics strongly influence local test performance and clinical utility. Multidisciplinary collaboration and implementation studies can aid the local decision for one or more eligible diagnostics.

Introduction
Indeterminate thyroid cytology is an eyesore to physicians. It largely corresponds to histopathologically follicular-patterned lesions, both benign and malignant, including follicular adenoma (FA), noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP), (encapsulated) follicular variant of papillary thyroid carcinoma (FVPTC or EFVPTC) and follicular thyroid carcinoma (FTC). These neoplasms are particularly difficult to differentiate on fine needle aspiration cytology (FNAC). In the case of FTC, cytology lacks the insight into tissue structure that histology provides: it does not show the capsular and/or vascular invasion that distinguishes an FTC from a benign FA. In FVPTC, the growth pattern is follicular and the identifying nuclear features of PTC usually cannot be clearly recognized cytologically [41-43]. Nevertheless, FNAC currently has a most prominent place in the diagnostic work-up of thyroid nodules. The Bethesda System for the Reporting of Thyroid Cytology was adopted in its current form in 2009, recognizing six diagnostic categories with an incremental risk of malignancy and clinical management guidelines. Although the Bethesda system created a much-used handhold by standardizing the cytological diagnosis and consecutive management of thyroid nodules worldwide, the system does not provide a clear answer for the heterogeneous group of nodules with indeterminate cytology [26, 44]. This includes cytology with atypia of undetermined significance or follicular lesion of undetermined significance (AUS/FLUS, Bethesda III), and cytology (suspicious for a) follicular neoplasm (SFN/FN) or (suspicious for a) Hürthle cell neoplasm (SHCN/HCN, Bethesda IV). Similar indeterminate cytological categories are found in the British Thyroid Association Thy system and Italian SIAPEC-IAP classification: Thy3a and Thy3f, and TIR3A and TIR3B, respectively (Table 1) [45, 46].
Alongside a doubled incidence of thyroid carcinoma over the past two decades and a prevalence of thyroid nodules stretching far beyond the 5% for palpable nodules (explained by the incidental detection of nonpalpable nodules and clinically occult thyroid cancers on imaging studies), the need for a more accurate diagnostic procedure has grown [47]. This need was further emphasized when other research groups were unable to reproduce the prevalence of the cytological categories and corresponding malignancy risks proposed by Cibas et al., especially those of the AUS category [44, 48, 49]. Insuperable variations in the worldwide patient populations, and intra- and interobserver variation in the assessment of thyroid cytology were named as likely underlying causes [26, 44, 49, 50]. Yet, it raised questions concerning the overall approach of thyroid nodule diagnosis and whether cytology is the best starting ground. Cost-effectiveness is a major benefit of cytological examination, yet a more accurate test may eventually replace cytological examination completely [51, 52]. At present, however, a supplemental diagnostic procedure is specifically warranted for cytologically indeterminate thyroid nodules. Diagnostic hemithyroidectomies are still customarily performed to obtain a definite histological diagnosis. With a benign histopathological result in approximately three in four cases, surgery is not only unbeneficial but also exposes the patient to unnecessary surgical risks. In the case of malignant lesions, a second-stage completion thyroidectomy is often indicated, which is associated with additional costs and higher risks of surgical complications [53-56]. An additional preoperative test or combination of tests for thyroid nodules with indeterminate cytology should prevent unbeneficial diagnostic hemithyroidectomies for benign nodules, limit the number of two-stage surgeries for thyroid malignancies, or both. With rapidly advancing technology, the possibilities for additional diagnostic techniques seem endless: the applications of existing diagnostics such as ultrasound, PET/CT and immunocytochemistry are extended and more clearly demarcated for use in indeterminate thyroid nodules. High-tech molecular tests such as gene mutation panels, gene or microRNA expression profiles and sequencing techniques are a hot topic [44, 57-61]. Every currently known engagement point from the genotype to the phenotype of the tumour is being explored. Combined, the various research fields encompass an extensive range of investigative methods. Individually, they usually focus on one or two methods only, making one-to-one comparison of these diagnostics difficult. The 2015 American Thyroid Association (ATA) guidelines suggested several additional tests, but a definitive answer or complete overview of all available tests is still lacking [17]. Alongside higher-level expert discussions and lobbying by med-tech companies, clinical endocrinologists and thyroid surgeons ponder the best solution for their individual patients. Their choices depend on the characteristics of their patient populations, availability and costs of a certain test, and personal preference. In any case, a useful additional test should be accurate, accessible, affordable and affect patient management. This review aims to provide practical considerations for physicians involved in the management of patients with thyroid nodules.
It gives an overview of the available literature on additional diagnostic tests for thyroid nodules with indeterminate cytology. We will work our way down from genotype to phenotype, discussing both anatomical and functional techniques, from state-of-the-art molecular and imaging biomarkers to widely available conventional imaging techniques. The ability of a test to distinguish between malignant and benign nodules in a preoperative setting is discussed, focusing on clinical validation and utility, and including the development phase, cost-effectiveness and availability of each technique, where appropriate. Table 2 provides a summarized overview of the discussed diagnostics and their main attributes.

Table 1. Overview of classification systems for thyroid cytology

Each entry lists the category and description according to the Bethesda System for the Reporting of Thyroid Cytology [26, 44], the British Thyroid Association (BTA) [45] and SIAPEC-IAP (Italy) [46], followed by the malignancy rate [26, 44] and the proposed management (2015 ATA guidelines [17]).

Bethesda I: Nondiagnostic / unsatisfactory
  BTA: Thy1, Nondiagnostic; Thy1c, Nondiagnostic, cystic lesion
  SIAPEC-IAP: TIR1, Nondiagnostic; TIR1c, Nondiagnostic-cystic
  Malignancy rate: 1%-4%
  Proposed management: Repeat FNAC with US guidance

Bethesda II: Benign
  BTA: Thy2, Nonneoplastic; Thy2c, Nonneoplastic, cystic lesion
  SIAPEC-IAP: TIR2, Nonmalignant / benign
  Malignancy rate: 0%-3%
  Proposed management: No clinical follow-up or treatment required

Bethesda III: Atypia of undetermined significance / follicular lesion of undetermined significance (AUS/FLUS)
  BTA: Thy3a, Atypical features present
  SIAPEC-IAP: TIR3a, Low-risk indeterminate lesion
  Malignancy rate: ~5%-15%
  Proposed management: Repeat FNAC. If second Bethesda III result, consider additional tests and/or diagnostic hemithyroidectomy

Bethesda IV: Follicular neoplasm / suspicious of a follicular neoplasm, including Hürthle cell (oncocytic) type
  BTA: Thy3f, Suspicious of follicular neoplasm
  SIAPEC-IAP: TIR3b, High-risk indeterminate lesion
  Malignancy rate: 15%-30%
  Proposed management: Consider additional tests and/or diagnostic hemithyroidectomy

Bethesda V: Suspicious of malignancy
  BTA: Thy4, Suspicious of malignancy
  SIAPEC-IAP: TIR4, Suspicious of malignancy
  Malignancy rate: 60%-75%
  Proposed management: Thyroid surgery recommended. Consider preoperative additional (molecular) testing to determine extent of surgery

Bethesda VI: Malignant
  BTA: Thy5, Malignant
  SIAPEC-IAP: TIR5, Malignant
  Malignancy rate: 97%-99%
  Proposed management: Thyroid surgery recommended

Table 2. Overview of test performance and utility of main additional diagnostics in indeterminate thyroid nodules

Molecular Biomarkers: Gene Mutation Analysis and Gene Expression

BRAF
  Sensitivity: 0%-83% [72, 97, 99, 103]
  Specificity: 99%-100% [60, 67-69, 73-76, 79-114]
  Main advantages: Perfect specificity at low cost
  Main limitations: Strong geographical variation in occurrence; clinical utility likely limited to gene mutation panels in countries other than South Korea
  Cost-effectiveness: Presumed, though unpublished. €7.50 to $123 per test [91, 101, 110, 126]

RAS
  Sensitivity: 0%-77% [77, 98]
  Specificity: 75%-100% [77, 98, 128]
  Main advantages: High prevalence, frequently detected
  Main limitations: Often found in follicular adenomas (false-positive); clinical utility limited to gene mutation panels
  Cost-effectiveness: Unpublished

RET/PTC
  Sensitivity: 0%-29% [69, 87]
  Specificity: 73%-100% [67, 69]
  Main advantages: Specific for PTC
  Main limitations: Low prevalence; clinical utility limited to gene mutation panels
  Cost-effectiveness: Unpublished

PAX8/PPARγ
  Sensitivity: 0%-29% [93, 97, 147]
  Specificity: 96%-100% [75, 97, 108]
  Main advantages: No significant advantages
  Main limitations: Low prevalence; utility limited to gene mutation panels
  Cost-effectiveness: Unpublished

7-gene mutation panel
  Sensitivity: 18%-69% [61, 99]
  Specificity: 86%-99% [75, 99]
  Main advantages: Comparatively inexpensive mutation panel
  Main limitations: Specificity often insufficient for surgical decision-making
  Cost-effectiveness: USA: likely [158]. Europe: unlikely [53]. $425 to $1,700 per test [158, 159]

NGS
  Sensitivity: 71%-91% [60, 109]
  Specificity: 89%-93% [109, 118]
  Main advantages: Highly accurate; rapidly advancing technology
  Main limitations: Limited availability outside the USA; limited clinical validation studies
  Cost-effectiveness: Unpublished. €230 to $3,200 per test [161, 162]

Afirma® GEC
  Sensitivity: 83%-100% [168, 169, 172, 182]
  Specificity: 10%-52% [164, 168]
  Main advantages: High rule-out capacity [168, 169, 172, 182]
  Main limitations: Limited availability outside the USA; limited high-quality clinical validation studies
  Cost-effectiveness: Unlikely [53, 159, 175-178]. $3,500 ($1,750 to $7,000) per test [159, 169, 175]

MicroRNA
  Sensitivity: 57%-100% [61, 197, 199]
  Specificity: 58%-100% [187, 199]
  Main advantages: Stable expression irrespective of preservation medium [186, 199]
  Main limitations: Limited clinical validation, research ongoing
  Cost-effectiveness: Unpublished

Immunocytochemistry

Galectin-3
  Sensitivity: 0%-92% [82, 212]
  Specificity: 68%-100% [82, 213-215]
HBME-1
  Sensitivity: 61%-100% [212, 214, 218, 219]
  Specificity: 75%-96% [212, 214, 218, 219]
CK-19
  Sensitivity: 76%-88% [212, 218, 220]
  Specificity: 80%-100% [212, 218, 220]
All immunocytochemical markers
  Main advantages: Global availability; inexpensive
  Main limitations: Limited current application in cytology; no methodological consensus; limited validation studies for combinations of immunostains
  Cost-effectiveness: Unpublished. Up to €20 per test

Conventional imaging

Ultrasound
  Sensitivity: Dependent on (combination of) feature(s)
  Specificity: Dependent on (combination of) feature(s)
  Main advantages: Global availability, low cost
  Main limitations: Operator dependency; limited prospective clinical validation; diagnostic accuracy of individual US features insufficient for surgical decision-making
  Cost-effectiveness: Presumed, though unpublished

Elastosonography
  Sensitivity: 47%-97% [58, 260]
  Specificity: 6%-100% [251, 259, 287]
  Main advantages: Global availability, low cost, easily performed during standard US work-up
  Main limitations: Operator dependency; limited clinical utility studies; alternative elasticity cut-off possibly more useful
  Cost-effectiveness: Presumed, though unpublished

Computed Tomography
  Sensitivity: Unavailable
  Specificity: Unavailable
  Main advantages: Unavailable
  Main limitations: Not investigated in indeterminate thyroid nodules
  Cost-effectiveness: n.a.

Functional and Molecular Imaging

[99mTc]Tc-MIBI scintigraphy
  Sensitivity: 56%-79% [58, 296]
  Specificity: 52%-96% [58, 108]
  Main advantages: More widely available and lower cost than PET
  Main limitations: Limited test performance; limited clinical validation studies; exposure to limited dose of ionizing radiation
  Cost-effectiveness: Unclear. USA: $669-$1,156, Europe: €119-€500 per scan [177, 300, 301]

FDG-PET
  Sensitivity: 77%-100% [37, 303, 305, 306, 309]
  Specificity: 33%-64% [37, 308]
  Main advantages: High rule-out capacity; increasing global availability
  Main limitations: Exposure to limited dose of ionizing radiation
  Cost-effectiveness: USA: unpublished. Europe: likely [53]

DW-MRI
  Sensitivity: Unpublished
  Specificity: Unpublished
  Main advantages: No ionizing radiation
  Main limitations: Limited evidence; no methodological consensus; research ongoing
  Cost-effectiveness: Unpublished

BRAF: BRAF point mutation analysis. GEC: Gene Expression Classifier. n.a.: not applicable. NGS: Next Generation Sequencing. RAS: RAS point mutation analysis.
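Sensitivity and specificity such as those listed in Table 2 only translate into rule-in or rule-out capacity once a malignancy rate is assumed. The following is a minimal sketch of that conversion using Bayes' rule. The function name and the input values (90% sensitivity, 50% specificity, 25% malignancy rate) are hypothetical and chosen for illustration only; they are not results from any study cited here.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given malignancy rate.

    All arguments are fractions between 0 and 1. A value of None is
    returned where a predictive value is undefined (no positive or no
    negative test results are expected).
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else None
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else None
    return ppv, npv


# Hypothetical rule-out test: high sensitivity, moderate specificity,
# applied at an assumed malignancy rate of 25%.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.50, prevalence=0.25)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these assumed inputs, a negative result leaves a residual malignancy risk of roughly 6%, whereas fewer than four in ten positive results are malignant. This illustrates why a test with such a profile is suited to ruling out malignancy rather than ruling it in.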

Molecular biomarkers

Gene mutation analysis and gene expression

In the last decades, researchers have unravelled important molecular mechanisms behind thyroid tumorigenesis, and identified a great number of genetic alterations that are related to the various types of thyroid carcinoma. Several of these mutational markers have found their way to the preoperative diagnosis of indeterminate thyroid nodules. The most common markers are the somatic BRAF and RAS point mutations, and RET/PTC rearrangement, all of which involve the mitogen-activated protein kinase (MAPK) signalling pathway [62-64]. The 2015 ATA guidelines explicitly set out the potentially strong diagnostic impact of molecular testing, focusing on BRAF testing and the two main tests that were commercially available at that date: the seven-gene mutation panel miRInform® thyroid (Asuragen Inc., Austin, Texas) and the Afirma® gene expression classifier (Veracyte, Inc., South San Francisco, CA). The ATA recommends considered application of one of these molecular tests for Bethesda III and IV nodules, provided that the result could change the treatment strategy [17]. In the following sections, the diagnostic potential of mutation analysis in indeterminate thyroid nodules is discussed, including the tests mentioned in the guidelines as well as other individual molecular biomarkers and multi-gene panels addressed in literature.

BRAF mutation

B-type RAF kinase (BRAF) is a serine–threonine kinase belonging to the rapidly accelerated fibrosarcoma (RAF) family, and the most potent MAPK pathway activator. Point mutations in the BRAF proto-oncogene occur in various human cancers. The somatic BRAFV600E mutation is the most common activating mutation in many carcinomas, including thyroid carcinoma [62].
This missense mutation consists of a thymine-to-adenine substitution at nucleotide 1799 (c.1799T>A), resulting in an amino acid substitution where valine is replaced with glutamate at codon 600 (hence V600E) [65, 66]. BRAF has an important function in cell proliferation, differentiation, and apoptosis. Upregulation of BRAF through the BRAFV600E activating mutation is associated with tumorigenesis [66]. In differentiated thyroid cancer, the BRAFV600E mutation is exclusive to PTC, occurring in 50% to 80% of these tumours [62, 63, 67-77]. The BRAFV600E mutation has been prognostically associated with poor clinicopathological outcomes, such as increased incidence of extrathyroidal invasion, recurrence of disease, and distant metastasis of the tumour [78-80].
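The relation between the nucleotide notation (c.1799T>A) and the protein notation (V600E) can be illustrated with a short sketch. The helper names are hypothetical, the codon table is deliberately restricted to the codons needed here, and the reference codon GTG for valine at position 600 is supplied as an assumed input rather than read from a reference sequence.

```python
# Minimal codon table, restricted to the codons used in this sketch.
CODON_TO_AMINO_ACID = {"GTG": "V", "GAG": "E", "AAA": "K", "GAA": "E"}


def locate_codon(cds_position):
    """Map a 1-based coding-sequence position to (codon number, offset 0-2)."""
    index = cds_position - 1
    return index // 3 + 1, index % 3


def protein_change(ref_codon, cds_position, ref_base, alt_base):
    """Return the protein-level change (e.g. 'V600E') for a single-base substitution."""
    codon_number, offset = locate_codon(cds_position)
    if ref_codon[offset] != ref_base:
        raise ValueError("reference base does not match the supplied codon")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return (
        f"{CODON_TO_AMINO_ACID[ref_codon]}"
        f"{codon_number}"
        f"{CODON_TO_AMINO_ACID[alt_codon]}"
    )


# c.1799T>A: nucleotide 1799 is the middle base of codon 600 (assumed GTG).
print(protein_change("GTG", 1799, "T", "A"))
```

Nucleotide 1799 falls on the second base of codon 600, so the substitution changes the assumed valine codon GTG into the glutamate codon GAG.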

BRAF mutation analysis has been extensively studied as a rule-in test for thyroid carcinoma. The BRAF mutation is superior to other mutations in its often 100% specificity: a positive mutation could prevent two-stage surgery for an indeterminate thyroid nodule [60, 67-69, 73-76, 79-114]. Even though the BRAF mutation was found in a majority of PTC in a number of studies, the prevalence of the BRAF mutation in indeterminate cytology ranged from 0% to 48% in individual studies [82, 84, 86, 97, 103, 108]. Reported sensitivities were therefore heterogeneous and generally poor, ranging from 0% to 83% [67, 72, 77, 84]. Other types of thyroid carcinoma occurring in indeterminate nodules, including FTC, FVPTC and Hürthle cell carcinoma (oncocytic variant of follicular thyroid carcinoma, FTC-OV), were respectively never or infrequently BRAF mutation-positive [69, 75, 76, 80, 88, 95, 114]. Because Bethesda IV cytology is predominated by follicular-type carcinoma, the BRAF mutation rarely occurs in this category [67, 69, 75, 79, 88, 90, 92, 95, 97, 98, 101, 103, 104, 108, 111-118]. Likely contributors to the observed heterogeneity are known global variations in the occurrence rates of PTC and BRAF mutations. In South Korea, where iodine consumption is high, 90% to 95% of thyroid cancers are PTC. More specifically, the proportion of BRAF-mutated PTC is very high: rates of 80% to more than 90% are reported [72, 84, 115]. Consequently, BRAFV600E mutation analysis might have both high specificity and high sensitivity in these populations. Studies with higher sensitivities were more often of South Korean origin and frequently demonstrated sensitivity above 40%, with the prevalence of BRAF mutations reported as high as 30% to 48% [72, 77, 84, 85, 111, 119-121].
Conversely, the majority of studies with sensitivity below 10% were conducted in Western countries (USA, Europe or Canada), with some studies reporting no BRAF mutations at all [60, 69, 75, 83, 86, 90, 94-97, 99, 100, 103, 107, 108, 114, 118]. Some South Korean studies based surgical decision-making on the result of the BRAF mutation analysis: surgery was relatively less often performed in BRAF mutation-negative indeterminate nodules [72, 77, 116, 121]. Such a surgical management strategy is not oncologically safe for Western countries (e.g. Europe or North America), where 80% to 90% of thyroid carcinomas are PTC and reported rates of BRAF-mutated PTC vary from 30% to 40% [72, 84, 115]. Moreover, even though the true sensitivity of BRAF mutation analysis is presumably high in South Korea for the mentioned epidemiological reasons, the conservative management of BRAF mutation-negative nodules likely inflated test sensitivity by underestimating the rate of BRAF-negative malignant nodules in these studies. Altogether we estimate that approximately one in five South Korean patients would benefit from BRAF mutation analysis, as opposed to a mere one in 25 patients from other countries.

BRAF mutation in papillary microcarcinoma

Papillary microcarcinomas (mPTC) have lower BRAF mutation rates [91, 96, 101, 106, 111, 114, 122]. The ATA guidelines are reserved with regard to the recommended clinical management of a positive BRAF mutation in mPTC, as its relation to extrathyroidal spread and positive lymph node metastases is not as clear as in larger thyroid carcinoma. Although there are studies that associate mPTC with factors of poorer prognosis, the 2015 guidelines recommend that BRAF-mutated mPTC are treated as low-risk malignancies [17, 73].

BRAFK601E point mutation

A less common activating BRAF mutation is BRAFK601E (c.1801A>G), which occurs considerably less frequently than the BRAFV600E variant and is associated with FVPTC with high specificity [123]. Clinically, the characterization of a small cohort of thyroid malignancies with a BRAFK601E mutation showed better outcomes than for BRAFV600E mutated tumours: no extrathyroidal tumour extension, recurrence, lymph node or distant metastasis were reported in indeterminate BRAFK601E positive tumours with a median follow-up of 20 months (range 4-47) [124].

Availability, cost-effectiveness and limitations of BRAF mutation analysis

Altogether, the consistent perfect specificity in a large number of studies supports the use of BRAF mutation analysis in obviating two-stage surgery. The technique is increasingly available in the clinical setting worldwide. A prior meta-analysis of eight studies questioned the cost-effectiveness of BRAFV600E mutation analysis in indeterminate thyroid nodules based on a mere 4.6% mean prevalence of the mutation [125]. Cost-effectiveness studies concerning sole BRAF mutation analysis in indeterminate thyroid nodules are lacking. Regardless, cost-effectiveness is generally presumed, as average costs for testing are relatively low and decreasing over time. Depending on the applied molecular technique, reported costs for BRAF mutation analysis ranged between €7.50 and $123 per tested sample [91, 101, 110, 126]. Low sensitivity remains the main limitation of BRAF mutation analysis, irrespective of the type of indeterminate cytology. The usefulness of the test in preoperative patient management depends on the regional occurrence rate of BRAF-mutated PTC; in South Korea, more patients will benefit from BRAF mutation analysis, and the probability and extent of cost-effectiveness are likely to increase [104]. In other health care systems, such as in the UK, cost-effectiveness is likely more constrained.
Nonetheless, BRAF testing could still save approximately half the surgical costs in BRAF mutation-positive carcinoma [101, 103]. These global variations should be considered before local implementation of sole BRAF mutation analysis.

RAS point mutation

Point mutations in the gene family of retrovirus-associated DNA sequences (RAS) together constitute the second most frequently occurring genetic alteration in thyroid carcinoma. In indeterminate thyroid nodules, they are the most common genetic alteration, due to a strong association of RAS mutations with the follicular-patterned lesions that make up these cytological categories: follicular adenoma, FTC, FVPTC and noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTP) [41, 43, 69, 97, 127, 128]. Originally, two of the three homologous RAS genes were identified as viral genes of the oncogenic Harvey (HRAS) and Kirsten (KRAS) murine sarcoma virus; the third, NRAS, was first identified in neuroblastoma cells [129, 130]. The genes code for GTP-binding RAS proteins, which are involved in intracellular signalling in the MAPK/ERK pathway. Mutation causes overactive RAS signalling and could ultimately induce malignant transition [64]. RAS mutation in thyroid carcinoma has been associated with favourable prognostic factors, such as encapsulation of the tumour and absence of lymph node metastases, but also with factors indicative of an adverse prognosis, such as poor cell differentiation [42]. RAS mutations are not specific for carcinoma and are found in both malignant and benign lesions [69, 99, 128]. According to the 2015 ATA guidelines, Bethesda III or IV nodules with a RAS mutation should be treated similarly to the Bethesda V category, as approximately 4 out of 5 are malignant [17, 44]. HRAS, KRAS and NRAS mutations are mutually exclusive. They are each associated with slightly different types of cytology and histology, and consequently a different clinical course. In general, point mutations in NRAS codon 61 and HRAS codon 61 are said to occur most frequently [43, 102]. KRAS is associated with oncocytic lesions and a lower malignancy rate than other RAS mutations [131]. A RAS point mutation is found in 0% to 38% of the indeterminate nodules [77, 98]. Moreover, approximately a third of all reported malignancies resulting from indeterminate thyroid cytology are RAS mutation-positive, frequently FVPTC or FTC [69, 75-77, 114]. Sporadic cases of RAS mutation-positive FTC-OV and MTC are reported [75, 76]. In individual studies, sensitivity and specificity of RAS mutation analysis ranged from 0% to 77% and from 75% to 100%, respectively [77, 98, 128]. Test performance was similar for Bethesda III and IV categories, although the mutation occurred more frequently in Bethesda IV nodules [60, 67, 69, 77, 88, 97, 98, 114, 118, 128].
Histopathologically benign nodules carrying a RAS mutation are follicular adenomas in most cases, but can also be the oncocytic variant of follicular adenoma (Hürthle cell adenoma) or hyperplastic nodules [67, 69, 76, 88, 128]. There is an ongoing discussion regarding the interpretation of a false-positive RAS mutation. It is presumed that an oncogenic RAS mutation predisposes a follicular adenoma to progression into follicular carcinoma, so that a RAS-mutated follicular adenoma should be considered a premalignant, pre-invasive follicular neoplasm. These assumptions put false-positives in a different light, as they would justify resection of such lesions through hemithyroidectomy. Consequently, the lesions could also be considered true-positives, which would improve the specificity of RAS mutation analysis [41, 60, 77, 97, 99, 109]. However, the exact mechanisms behind the malignant potential and transition of RAS-mutated follicular adenomas are not yet clarified and are difficult to appreciate in a clinical setting. Similar to BRAF, there was evident global variation in the distribution of RAS mutations. Many European and American studies reported a clear predominance of RAS mutations over BRAF mutations. Only one Brazilian study, of 116 Bethesda III and 20 Bethesda IV thyroid nodules, reported BRAF mutations exclusively and not a single RAS mutation [98]. The previously described predominance of BRAF mutations in South Korean populations was confirmed in the sole study that investigated both point mutations in one population [77]. Combined BRAF/RAS mutation analysis could be considered, although geographical differences in the distribution of the two genetic alterations strongly influence feasibility. A gene mutation panel consisting of more genetic alterations (discussed in a later section) is most likely more useful. Sole RAS mutation analysis is not accurate in the preoperative setting. Although specificity is high, only two out of three RAS mutation-positive indeterminate nodules are histopathologically malignant, evidently fewer than assumed and previously described in the ATA guidelines. Therefore, RAS mutation-positive indeterminate thyroid nodules should be surgically managed with no more than hemithyroidectomy. Whether hemithyroidectomy is justified for RAS-mutated follicular adenomas as a precancerous lesion is still under debate.

RET/PTC rearrangement

Rearrangements of the RET proto-oncogene arise from the fusion of the 3’ end of RET to the 5’ regions of unrelated genes that are expressed in thyroid follicular cells. The RET proto-oncogene encodes a transmembrane receptor with a tyrosine kinase domain; a RET/PTC rearrangement causes inappropriate overexpression of that domain. It activates the MAPK and PI3K/AKT pathways and stimulates malignant transition of the cell through BRAF [132, 133]. At least 12 different fusion variants have been detected to date, of which RET/PTC1 and RET/PTC3 are the most common. They have a well-known association with PTC. Cases of both rearrangements in a single lesion are also reported [42, 132, 134, 135]. RET/PTC rearrangements, especially RET/PTC3, occur more frequently in PTC in children or patients who were exposed to ionizing radiation and are clinically associated with the presence of lymph node metastases [42]. Worldwide variations in the frequency of RET/PTC rearrangements exist, dependent on demographics and ethnicity. The RET/PTC rearrangement is present in 42% of PTC in Western populations with a predominance of RET/PTC1, and in 37% of PTC in Asian populations with a predominance of RET/PTC3. Without radiation exposure, RET/PTC1 is predominant in female PTC patients [136].
The rearrangements are also found in benign nodules, especially in patients who were exposed to ionizing radiation [67, 135]. Like RAS mutations, the rearrangement is assumed to be an activating genetic alteration and it is argued that a histopathologically benign nodule with a RET/PTC rearrangement should be considered a precancerous lesion. RET/PTC rearrangements are seldom found in indeterminate nodules. In many studies, no RET/PTC translocation was found at all. Most studies investigated RET/PTC as part of a gene mutation panel and paid it no specific attention [60, 67, 69, 75, 76, 83, 87, 88, 93, 97, 99, 100, 102, 108, 114, 118, 134]. Only Guerra et al. investigated the RET/PTC rearrangement on its own, in 101 thyroid nodules of all cytological categories. In this Italian study, RET/PTC rearrangements were found in 18 of the 50 PTC (36%) using RT-PCR and Southern blot. All these RET/PTC-positive carcinomas were Thy4 or Thy5 nodules on cytology. Among the 24 Thy3 nodules, two nodules with a RET/PTC3 rearrangement were histopathologically benign [134]. Notably, Sapio et al. detected two RET mutations during their RET/PTC assessments. In contrast to the RET/PTC translocation, RET point mutations are related to sporadic and familial MTC [83, 137]. Surgery confirmed histopathological MTC in the RET-mutated nodules [83].
