Milea Timbergen

The dataset used in this study is highly heterogeneous in terms of acquisition protocols. In particular, slice thickness varied between 1.0 mm and 11 mm, which may cause feature values to depend strongly on the acquisition protocol. Hence, extracting robust 3D features may be hampered by these variations, especially for low resolutions. To overcome this issue, all features were extracted per 2D axial slice and aggregated over all slices. Afterwards, several first-order statistics over the feature distributions were evaluated and used in the machine learning approach. The images were not resampled, as this would result in interpolation errors. Due to variations in, especially, the magnetic field strength, echo time, and repetition time, the image contrast varies considerably, which would affect the feature values. To partially overcome this, each 3D MRI was normalized using z-scoring before feature extraction. These settings are also the default in WORC.

Supplemental Material 2. Adaptive workflow optimization for automatic decision model creation

This Supplemental Material is similar to 1, but details relevant for the current study are highlighted.

The Workflow for Optimal Radiomics Classification (WORC) toolbox 2 makes use of adaptive algorithm optimization to create the optimal performing workflow from a variety of methods. WORC defines a workflow as a sequential combination of algorithms and their respective parameters. To create a workflow, WORC includes algorithms to perform feature scaling, feature imputation, feature selection, oversampling, and machine learning. Some of these steps are optional, as described below; when used, the methods are performed in the order in which they are described in this Supplemental Material. More details can be found in the WORC documentation 7.

Feature scaling was performed to make all features have the same scale, as otherwise the machine learning methods may focus only on those features with large values.
This was done through z-scoring, i.e., subtracting the mean value followed by division by the standard deviation, for each individual feature. In this way, all features had a mean of zero and a variance of one.

In the analysis including the T2w or T1w post-contrast sequences, feature imputation was used to estimate replacement values when a sequence was missing. Strategies for imputation included 1) the mean; 2) the median; 3) the most frequent value; and 4) a nearest neighbour approach. 5
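
The feature scaling and the three simplest imputation strategies described above can be illustrated with a minimal sketch. This is not the WORC implementation: the function names `is_missing`, `zscore_column`, and `impute_column` are hypothetical, missing values are assumed to be encoded as `None` or NaN, the population standard deviation is assumed for scaling, and the scaling step is assumed to skip missing entries so that it can precede imputation, following the order given in the text. The nearest neighbour strategy is not shown.

```python
import math
import statistics


def is_missing(v):
    """Treat None and NaN as missing feature values (encoding is an assumption)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def zscore_column(values):
    """Scale one feature to zero mean and unit variance, skipping missing entries."""
    observed = [v for v in values if not is_missing(v)]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed)  # population standard deviation (assumption)
    if std == 0:
        # Constant feature: map observed values to zero instead of dividing by zero.
        return [v if is_missing(v) else 0.0 for v in values]
    return [v if is_missing(v) else (v - mean) / std for v in values]


def impute_column(values, strategy="mean"):
    """Fill missing entries of one feature with a statistic of the observed values."""
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a feature with no observed values")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]


# Usage: scale one feature across patients, then fill the missing value.
feature = [1.0, 2.0, 3.0, None]
scaled = zscore_column(feature)
completed = impute_column(scaled, strategy="median")
```

In this sketch each feature is handled independently across patients, which matches the per-feature z-scoring described above. In a cross-validation setting, the mean, standard deviation, and fill value would normally be estimated on the training data only and then applied to the test data; that split is omitted here for brevity.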