Timo Soeterik

External validation of the Martini Nomogram

(e.g. Gleason grade group 3: OR 26.7, 95% CI 13.5–53.1; EPE on MRI present: OR 7.27, 95% CI 4.7–11.2), indicating a lack of precision of the estimated coefficients. This lack of precision, together with the larger proportion of patients with higher-risk disease in the validation cohort, could explain the observed miscalibration.

The lack of precision of this model may also be explained from another methodological viewpoint. The study cohort comprises clustered data. Each prostate lobe is considered an independent case. However, if prostate cancer is found bilaterally, both lobes from a single patient are included in the sample, leading to clustering within the study population. Prediction models based on clustered data call for a slightly different development approach and may require a random intercept. Had this approach been used to develop the model, its estimates might have been more precise. [20]

The poor model fit can also be explained by the definitions of the selected predictors and of the outcome. First, our patient cohort comprises patients diagnosed and staged at our own centre as well as referred patients staged elsewhere. There was no central review of MRI or of the pathological evaluation (the endpoint EPE), so a wide range of radiologists and pathologists evaluated the MRI scans and prostate specimens, which could introduce interobserver variability. Second, EPE on MRI was recorded as a binary variable. In our population, radiologists reported a subclass defined as “indefinite” or “uncertain” EPE in a substantial percentage of patients (10%). By dichotomizing this predictor, a lot of explained variance is lost. To overcome this problem, the use of a Likert scale for the probability of EPE presence may improve the model’s accuracy. [21,22]

The strengths of this study include a large sample size with a high number of events per variable (EPV), resulting in a study population suitable for external validation. The limitations of this study are its retrospective nature and the lack of central review of MRI and histological features. However, we assume that the present case-mix variation reflects a real-world clinical situation.

CONCLUSIONS

External validation of the novel nomogram developed by Martini and co-workers in a large real-world cohort showed fair discriminative ability but poor calibration. After updating, substantial miscalibration was still present. Use of this nomogram for individualized risk predictions is therefore not recommended.
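As a rough check on the precision argument made in the discussion, the standard error of each log odds ratio can be back-calculated from the confidence intervals quoted there. The sketch below assumes that the reported intervals are Wald intervals, symmetric on the log-odds scale, with a multiplier of 1.96. The helper name `se_from_ci` is hypothetical and is not part of the original analysis.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumed)


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate the standard error of a log odds ratio from its CI.

    Assumes a Wald interval of the form exp(beta +/- z * SE).
    Hypothetical helper for illustration only.
    """
    if not 0 < lower < upper:
        raise ValueError("expected 0 < lower < upper")
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Odds ratios and 95% confidence intervals as quoted in the text.
reported = {
    "Gleason grade group 3": (26.7, 13.5, 53.1),
    "EPE on MRI present": (7.27, 4.7, 11.2),
}

for name, (odds_ratio, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    ratio = upper / lower  # how many times larger the upper bound is
    print(f"{name}: OR {odds_ratio}, SE(log OR) = {se:.2f}, "
          f"upper/lower = {ratio:.1f}")
```

Under these assumptions, the upper bound for Gleason grade group 3 is roughly four times the lower bound, which gives a sense of how wide the interval is. The calculation only restates the reported intervals on another scale and says nothing about the cause of the imprecision.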