Connie Rees

As for calibration of the model, the Hosmer-Lemeshow goodness-of-fit test did not reach significance (chi-square 4.398, p = 0.820). This indicates no evidence of poor fit: the observed frequencies do not differ significantly from the frequencies expected under the model (Hosmer et al., 2013).
Since no clinically implemented tool for the diagnosis of adenomyosis on MRI is yet available, this study represents one of the first external validation studies. A strength of this study is that two investigators (one of them directly involved in the internal validation study) independently assessed all MRIs, blinded to the outcome of the pathology reports. In addition, this external validation can be regarded as a geographical validation, because the participating hospitals for the two studies were in different regions of the country, which is considered a reliable approach to external validation (207). Furthermore, the model is feasible to use in daily practice because it relies on straightforward clinical and MRI parameters.
A limitation of the study is that it includes only 195 patients, of whom 78 (40%) received the histopathological diagnosis of adenomyosis. Studies suggest that at least 100 events (in this case, 100 patients with the diagnosis of adenomyosis) and 100 non-events are needed for reliable evaluation of a model's external performance (190,213). With 78 events, this cohort falls short of that threshold, and larger sample sizes are needed to detect smaller differences in model performance. In smaller datasets, the model could be overoptimistic (213). As in the initial cohort, the phase of the menstrual cycle at the time of MRI was never reported in this cohort. The thickness of the junctional zone changes during the menstrual cycle under the influence of hormone levels (46,214). This is a limitation of the entire study, because neither study could correct for the influence of the menstrual cycle phase.
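The two pieces of arithmetic in the passage above (the Hosmer-Lemeshow p-value and the event counts) can be checked with a minimal sketch. It assumes the test used 8 degrees of freedom (the conventional 10 risk groups minus 2), which the text does not state, and it assumes that every included patient without a diagnosis of adenomyosis counts as a non-event. The function name `chi2_sf_even_df` is hypothetical and introduced only for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed-form sum that is valid only for positive even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Reported Hosmer-Lemeshow statistic from the text.
# ASSUMPTION: df = 8 (10 risk groups minus 2); not stated in the source.
hl_p = chi2_sf_even_df(4.398, 8)

# Event counts from the text.
# ASSUMPTION: all patients without adenomyosis are non-events.
n_total, n_events = 195, 78
n_non_events = n_total - n_events
event_fraction = n_events / n_total

# Guideline cited in the text: at least 100 events and 100 non-events.
meets_guideline = n_events >= 100 and n_non_events >= 100

print(f"Hosmer-Lemeshow p-value (df=8 assumed): {hl_p:.3f}")
print(f"events: {n_events}, non-events: {n_non_events}, "
      f"event fraction: {event_fraction:.0%}")
print(f"meets 100/100 guideline: {meets_guideline}")
```

Under the assumed 8 degrees of freedom, the computed p-value is consistent with the reported p = 0.820, and the check makes explicit that it is the number of events, not the number of non-events, that falls below the suggested minimum.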
Another point of discussion is that this model was tested only in patients who underwent hysterectomy. This may introduce selection bias, as it limits the generalisability of the model to patients undergoing a hysterectomy, and not
