Timo Soeterik

CHAPTER 6

Model performance, recalibration and clinical usefulness

Model calibration, which refers to the agreement between observed endpoints and predictions, was assessed using calibration-in-the-large and the Hosmer-Lemeshow goodness-of-fit test. Calibration was further evaluated graphically using calibration plots, in which the agreement between predicted probabilities and observed outcomes in the dataset was visualised. Discrimination, which refers to the ability of the model to distinguish a case with the endpoint (EPE) from a case without EPE, was quantified using the AUC [14]. In case of poor model fit, the potential need for adjustment of the intercept and/or slope (and, if needed, the degree of adjustment necessary) was determined by fitting a logistic regression model with the linear predictor as the only predictor [15].

To determine clinical usefulness, sensitivity and specificity were determined for different risk thresholds (0.07, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40 and 0.50). We also calculated the net benefit for a range of threshold probabilities, using decision curve analysis. The net benefit was calculated as the proportion of “net” true positives (true positives corrected for the false positives weighted by the odds of the risk cut-off, divided by the sample size) [16,17].

Statistical analysis was performed using RStudio Version 1.1.456.

RESULTS

Patient population

As shown in Figure 1, 625 patients underwent RARP, and a total of 792 prostate lobes were derived for analysis. EPE was reported on pathological evaluation in 250/792 (32%) lobes, resulting in an adequate number of events for validation (events per variable [EPV] = 41) [18]. Baseline characteristics on a patient level are presented in Table 1. Descriptive statistics of the predictors and outcome variable of all prostate lobes included for analysis are reported in Table 2.
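
As an illustration of the clinical-usefulness measures described in the methods above (sensitivity, specificity and net benefit at a given risk threshold), a minimal Python sketch is given below. The analysis in this chapter was performed in R, so this is not the code used in the study. The function names and the toy data are hypothetical and chosen for illustration only; they are not study data. A lobe is treated as test-positive when its predicted risk is at or above the threshold, which is an assumption about how the cut-off was applied.

```python
def confusion_counts(risks, outcomes, threshold):
    """Return (TP, FP, TN, FN), calling a case positive when risk >= threshold."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, outcomes):
        predicted_positive = risk >= threshold
        if predicted_positive and outcome == 1:
            tp += 1
        elif predicted_positive and outcome == 0:
            fp += 1
        elif not predicted_positive and outcome == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(risks, outcomes, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(risks, outcomes, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def net_benefit(risks, outcomes, threshold):
    """Net benefit = (TP - FP * odds(threshold)) / n, with odds = p / (1 - p)."""
    tp, fp, _, _ = confusion_counts(risks, outcomes, threshold)
    odds = threshold / (1.0 - threshold)
    return (tp - fp * odds) / len(outcomes)


if __name__ == "__main__":
    # Hypothetical toy data (NOT study data): predicted EPE risk and observed EPE (1/0).
    toy_risks = [0.05, 0.12, 0.22, 0.35, 0.45, 0.60, 0.08, 0.28]
    toy_outcomes = [0, 0, 1, 0, 1, 1, 0, 1]

    # Risk thresholds as listed in the text.
    for t in (0.07, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50):
        sens, spec = sensitivity_specificity(toy_risks, toy_outcomes, t)
        nb = net_benefit(toy_risks, toy_outcomes, t)
        print(f"threshold={t:.2f} sens={sens:.2f} spec={spec:.2f} net_benefit={nb:.3f}")
```

Plotting the net benefit against the threshold, together with the "treat all" and "treat none" strategies, would give a decision curve; that plotting step is omitted here.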