Joeky Senders

Chapter 5

well as the generated output of the fitted models (class probability, numeric estimate, or subject-level survival curve, respectively). Computational efficiency was measured in terms of model size, loading time, and the computation time required to produce a prediction. For models that do not natively provide confidence intervals for their predictions, model predictions were bootstrapped 100 times with replacement to provide such estimates. We also developed an online, interactive, graphical tool based on the overall best-performing model. Statistical analyses were conducted using R (version 3.5.1, R Core Team, Vienna, Austria).11 All machine learning modeling was performed using the Caret package,12 and the application was built and deployed using the Shiny package and server.13

Results

Patient demographics and clinical characteristics

In total, 20,821 patients met our inclusion criteria. Missing data were multiply imputed for insurance status (16.7% missingness), tumor size (14.3%), tumor laterality (12.0%), tumor location (6.6%), marital status (3.8%), tumor extension (1.6%), surgery type (1.3%), and race (0.2%). Survival time was censored for 3,745 patients (18.0%). The estimated median survival time in the total cohort was 13 months (95% CI 12-13 months). The total cohort was split into a training set and a hold-out test set of 16,656 and 4,165 patients, respectively (Table 1).
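The reported set sizes are consistent with an approximately 80/20 random split (16,656 / 20,821 ≈ 0.80), although the text does not state the ratio or the sampling scheme. The following is a minimal Python sketch under that assumption, not the authors' R/Caret code; the function name `split_indices`, the 0.8 fraction, and the seed are hypothetical choices made for illustration.

```python
import random


def split_indices(n_total, train_fraction=0.8, seed=42):
    """Randomly partition row indices into training and hold-out test sets.

    Assumption: a simple unstratified random split. The study may have
    used a different (e.g. stratified) procedure.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    n_train = int(n_total * train_fraction)  # floor of the training share
    return indices[:n_train], indices[n_train:]


# Cohort size taken from the text; the 80/20 ratio is an assumption.
train_idx, test_idx = split_indices(20821)
print(len(train_idx), len(test_idx))  # 16656 4165
```

Because every index lands in exactly one of the two lists, no patient can appear in both the training and the hold-out set, which is the property that matters for an unbiased test-set evaluation.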
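The bootstrapping step described in the methods text above (100 resamples with replacement) can be illustrated with a percentile bootstrap. The text does not specify what was resampled or how the interval was formed, so this Python sketch assumes that the training rows are resampled, the model is refit on each resample, and the 2.5th and 97.5th percentiles of the resulting predictions form the interval. The least-squares line is only a stand-in for a fitted model, and all names and the synthetic data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for any model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate resample
    return mean_y - slope * mean_x, slope


def bootstrap_interval(xs, ys, x_new, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Draw n row indices with replacement, refit, and predict.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    preds.sort()
    lower = preds[int((alpha / 2) * (n_boot - 1))]
    upper = preds[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lower, upper


# Synthetic illustration data: y = 2 + 3x plus Gaussian noise.
data_rng = random.Random(1)
xs = [i / 5 for i in range(50)]
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]
lower, upper = bootstrap_interval(xs, ys, x_new=5.0)
print(f"95% bootstrap interval at x=5: ({lower:.2f}, {upper:.2f})")
```

With only 100 resamples the percentile endpoints are themselves noisy; a larger number of resamples would stabilize them at the cost of computation time.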