Margriet Kwint

Real world evidence to audit NTCP-models for acute esophagus toxicity

The median V50 was 22.9% (IQR 10.3% - 35.7%) and the median V60 was 5.1% (IQR 0.0% - 20.96%). Both models showed similar, moderate discrimination, with an area under the curve (AUC) of 0.706 (95% CI 0.637 to 0.775) for the V50-model and 0.685 (95% CI 0.614 to 0.757) for the V60-model (Figure 1). Calibration showed that the V50-model slightly overestimated the risk of grade ≥2 AET in low-risk patients (predicted incidence <50%), whereas in high-risk patients (predicted incidence >50%) the predicted incidence agreed with the observed incidence. The V60-model overestimated the risk of grade ≥2 AET in low-risk patients and underestimated it in high-risk patients (Figures 2 and 3). For both models, sensitivity was higher at lower cut-off points and specificity was higher at higher cut-off points. For the V50-model, a cut-off of more than 40% predicted probability of grade ≥2 AET gave the most favorable sensitivity, 95.8%, with a specificity of 30.1%. For the V60-model, the same cut-off gave a sensitivity of 68.3% and a specificity of 58.8% for grade ≥2 AET.

Validation of the V50- and V60-model before and after dose de-escalation

The patient cohort was split into two groups: patients treated before and patients treated after dose de-escalation. After de-escalation of the dose to the mediastinal lymph nodes, the median V60 decreased significantly (p=0.001) from 12.7% (IQR 25.3%) to 1.3% (IQR 17.1%). The median V50 also decreased, from 26.9% (IQR 23.5%) to 21.7% (IQR 24.6%), but this difference was not significant (p=0.120). The incidence of grade ≥2 AET decreased from 50.5% to 37.9% (p=0.032) and that of grade ≥3 AET from 7.9% to 3.4% (p=0.076).
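The discrimination, calibration and cut-off analyses reported earlier in this section all operate on two per-patient quantities: the predicted NTCP and whether grade ≥2 AET was observed. The Python sketch below illustrates these computations under stated assumptions: the function names (`auc_mann_whitney`, `sens_spec`, `calibration_by_group`) and the toy data are hypothetical, the AUC is computed with the Mann-Whitney formulation (the section does not state which software was used), a predicted risk strictly above the cut-off counts as positive (mirroring "more than 40%"), and the 50% split for calibration follows the low-/high-risk definition in the text. It does not implement the V50- or V60-model themselves, whose coefficients are not given here.

```python
from typing import Dict, Sequence, Tuple


def auc_mann_whitney(probs: Sequence[float], events: Sequence[int]) -> float:
    """AUC = probability that a random patient with the event received a higher
    predicted risk than a random patient without it (ties count as 0.5)."""
    pos = [p for p, y in zip(probs, events) if y == 1]
    neg = [p for p, y in zip(probs, events) if y == 0]
    if not pos or not neg:
        raise ValueError("need patients with and without the event")
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


def sens_spec(probs: Sequence[float], events: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a predicted risk above `cutoff` is called positive."""
    pairs = list(zip(probs, events))
    tp = sum(1 for p, y in pairs if p > cutoff and y == 1)
    fn = sum(1 for p, y in pairs if p <= cutoff and y == 1)
    tn = sum(1 for p, y in pairs if p <= cutoff and y == 0)
    fp = sum(1 for p, y in pairs if p > cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_by_group(probs: Sequence[float], events: Sequence[int],
                         split: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """(mean predicted risk, observed incidence) below and at/above `split`."""
    groups = {"low": [], "high": []}
    for p, y in zip(probs, events):
        groups["low" if p < split else "high"].append((p, y))
    return {
        name: (sum(p for p, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for name, g in groups.items() if g
    }


# Hypothetical toy data: predicted NTCP and observed grade >=2 AET (1 = yes); not study data.
probs = [0.2, 0.35, 0.45, 0.55, 0.7, 0.9, 0.3, 0.6]
events = [0, 0, 1, 1, 1, 1, 0, 0]

print(auc_mann_whitney(probs, events))       # 0.875
print(sens_spec(probs, events, cutoff=0.4))  # (1.0, 0.75)
print(calibration_by_group(probs, events))
```

Sweeping `cutoff` over a grid of values shows the trade-off described in the text: lowering the cut-off can only raise (or keep) sensitivity, and raising it can only raise (or keep) specificity. In the toy calibration output, the low-risk group has a mean predicted risk above its observed incidence, which is what "overestimation in low-risk patients" means numerically.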
We compared the discrimination of the V50- and V60-model for grade ≥2 AET between the two time periods (Figure 4). For the V50-model, discrimination was nearly identical, with an AUC of 0.690 (95% CI 0.585 – 0.795) before dose de-escalation and 0.707 (95% CI 0.609 – 0.804) after. For the V60-model, discrimination decreased after dose de-escalation, from an AUC of 0.722 (95% CI 0.621 – 0.823) to 0.624 (95% CI 0.518 – 0.729) (Figure 2). The DeLong test (24, 25) showed no significant difference between the AUCs before and after dose de-escalation for either model (p=0.41 for the V50-model and p=0.09 for the V60-model).
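The comparison of AUCs between the two time periods can be sketched as an unpaired DeLong test, assuming that the cohorts before and after dose de-escalation are treated as independent samples (different patients). Each AUC receives a variance from its DeLong structural components, and the difference is tested with a two-sided z-test; a normal-approximation 95% CI is included as well, although the section does not state how its confidence intervals were obtained. The function names and toy data below are hypothetical and this is not the authors' analysis code; in practice an established implementation (for example `roc.test` in the R package pROC) would normally be used.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the event case scores higher, 0.5 on a tie, 0 otherwise."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_auc_var(probs: Sequence[float], events: Sequence[int]) -> Tuple[float, float]:
    """Return (AUC, DeLong variance of the AUC) for a single cohort."""
    pos = [p for p, e in zip(probs, events) if e == 1]
    neg = [p for p, e in zip(probs, events) if e == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two patients with and two without the event")
    # Structural components: share of non-events each event case outranks,
    # and share of event cases that outrank each non-event case.
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    return auc, s10 / m + s01 / n


def delong_ci(probs: Sequence[float], events: Sequence[int],
              level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation CI for the AUC, clipped to [0, 1]."""
    auc, var = delong_auc_var(probs, events)
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    return max(0.0, auc - half), min(1.0, auc + half)


def compare_independent_aucs(probs_a, events_a, probs_b, events_b) -> Tuple[float, float]:
    """Unpaired DeLong test of AUC_a == AUC_b; returns (z, two-sided p)."""
    auc_a, var_a = delong_auc_var(probs_a, events_a)
    auc_b, var_b = delong_auc_var(probs_b, events_b)
    se = math.sqrt(var_a + var_b)
    if se == 0.0:
        raise ValueError("zero variance; the test is undefined")
    z = (auc_a - auc_b) / se
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical toy cohorts (predicted NTCP, grade >=2 AET yes/no); not study data.
before_probs = [0.2, 0.35, 0.45, 0.55, 0.7, 0.9, 0.3, 0.6]
before_events = [0, 0, 1, 1, 1, 1, 0, 0]
after_probs = [0.1, 0.4, 0.25, 0.5, 0.65, 0.15, 0.45, 0.3]
after_events = [0, 1, 0, 1, 0, 0, 1, 0]

z, p = compare_independent_aucs(before_probs, before_events, after_probs, after_events)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Because the two periods contain different patients, the variances simply add; comparing two models on the same patients would instead require the paired form, which also accounts for the covariance between the two AUCs.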