Milea Timbergen

Radiomics feature extraction

All tumours were manually segmented once on the T1w-MRI by one of two clinicians under the supervision of a musculoskeletal radiologist (4 years of experience). A subset of 30 DTF tumours was segmented by both clinicians to evaluate inter-observer variability through the pairwise Dice Similarity Coefficient (DSC), with DSC > 0.70 indicating good agreement 20. To transfer the segmentations to the other sequences, all sequences were automatically aligned to the T1w-MRI through image registration with the Elastix software 21. For each lesion, 411 features quantifying intensity, shape and texture were extracted per MRI sequence. Details can be found in Supplemental Materials 1 and Supplemental Table 1.

Decision model creation

To create a decision model from the features, the WORC toolbox was used (Figure 1) 22-24. In WORC, decision model creation consists of several steps, e.g., feature selection, resampling, and machine learning. WORC performs an automated search amongst a variety of algorithms for each step and determines which combination of algorithms maximizes the prediction performance on the training set. More details can be found in Supplemental Materials 2.

Evaluation

All models were evaluated through a 100x random-split cross-validation. In each iteration, the data were randomly split into 80% for training and 20% for testing in a stratified manner, ensuring that the class distribution in both sets was similar to that of the original dataset (Supplemental Figure 1). Within the training set, model optimization was performed using an internal cross-validation (5x). Hence, all optimization was done on the training set, eliminating any risk of overfitting on the test set. For the differential diagnosis cohort, a binary classification model was created using a variety of machine learning models. For the DTF cohort (predicting the CTNNB1 mutation), a multiclass classification model was created using random forests.
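The Dice Similarity Coefficient used to quantify agreement between the two clinicians' segmentations is defined as DSC = 2|A ∩ B| / (|A| + |B|) for the voxel sets A and B of the two segmentations. The following is a minimal Python sketch, assuming each segmentation is available as a set of (x, y, z) voxel coordinates. The function names and example masks are hypothetical, and treating two empty masks as perfect agreement is our own convention, not something stated in the study.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]

# Threshold from the text: DSC > 0.70 indicates good agreement.
GOOD_AGREEMENT = 0.70


def dice_similarity(seg_a: Iterable[Voxel], seg_b: Iterable[Voxel]) -> float:
    """DSC = 2|A & B| / (|A| + |B|) for two segmentations given as voxel sets."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def is_good_agreement(dsc: float) -> bool:
    """True when the DSC exceeds the good-agreement threshold."""
    return dsc > GOOD_AGREEMENT


# Hypothetical masks drawn by the two clinicians for one tumour.
clinician_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
clinician_2 = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
dsc = dice_similarity(clinician_1, clinician_2)
print(dsc, is_good_agreement(dsc))  # prints: 0.75 True
```

Here the two four-voxel masks share three voxels, so DSC = 2 × 3 / (4 + 4) = 0.75, which exceeds the 0.70 threshold. In practice the masks would be 3D arrays from the segmentation software, and the same computation would be applied to each of the 30 doubly segmented tumours.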
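The evaluation protocol, with 100 stratified 80/20 random splits and all model optimization confined to an internal cross-validation on each training set, can be sketched as a split generator. This is a minimal Python illustration under stated assumptions, not the WORC implementation. `stratified_split` and `nested_random_splits` are hypothetical names. We read the internal "5x" cross-validation as five stratified random splits of the training set with a 20% validation fraction; the text does not specify this fraction, and a 5-fold scheme would be an equally plausible reading. Because the sample count of each class is rounded separately, the test set is only approximately 20% of the data. Stratification is done per class label, so the same code covers both the binary and the multiclass (CTNNB1) setting.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def stratified_split(labels: Sequence[Hashable], test_fraction: float,
                     rng: random.Random) -> Split:
    """Randomly split indices 0..len(labels)-1 into (train, test) per class,
    so each class keeps roughly the same proportion in both parts."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # per-class rounding
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def nested_random_splits(labels: Sequence[Hashable], n_outer: int = 100,
                         test_fraction: float = 0.2, n_inner: int = 5,
                         inner_val_fraction: float = 0.2,  # assumption
                         seed: int = 0) -> List[Tuple[List[int], List[int], List[Split]]]:
    """Outer stratified random splits; each training set is re-split n_inner
    times for model optimization, so the outer test set is never touched."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_outer):
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        train_labels = [labels[i] for i in train_idx]
        inner: List[Split] = []
        for _ in range(n_inner):
            in_train, in_val = stratified_split(train_labels, inner_val_fraction, rng)
            # Map positions within the training subset back to original indices.
            inner.append(([train_idx[i] for i in in_train],
                          [train_idx[i] for i in in_val]))
        result.append((train_idx, test_idx, inner))
    return result


# Hypothetical example: 40 samples of class 0 and 10 of class 1.
labels = [0] * 40 + [1] * 10
splits = nested_random_splits(labels)
train_idx, test_idx, inner = splits[0]
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))  # prints: 40 10 2
```

Under this scheme, a model would be optimized using only the inner splits of an iteration and evaluated once on that iteration's outer test set, which keeps the test data out of every optimization step.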