
By default in WORC, the performance is evaluated in a 100x random-split train-test cross-validation. In the training phase, a total of 100,000 pseudo-randomly generated workflows is created. These workflows are evaluated in a 5x random-split cross-validation on the training dataset, using 85% of the data for actual training and 15% for validation of the performance. All described methods were fit on the training datasets and only tested on the validation datasets. The workflows are ranked from best to worst based on their mean performance on the validation sets using the F1-score, which is the harmonic mean of precision and recall.

Due to the large number of workflows executed, there is a chance that the best performing workflow is overfitting, i.e., fitting too much detail or even noise in the training dataset. Hence, to create a more robust model and boost performance, WORC combines the 50 best performing workflows into a single decision model, which is known as ensembling. These 50 best performing workflows are re-trained on the entire training dataset and only tested on the test datasets. The ensemble is created by averaging the probabilities, i.e., the estimated chance of a patient being DTF or non-DTF, of these 50 workflows.

A full experiment consists of executing 50 million workflows (100,000 pseudo-randomly generated workflows times a 5x train-validation cross-validation times a 100x train-test cross-validation), which can be parallelized. The computation time of training or testing a single workflow is on average less than a second, depending on the size of the dataset both in terms of samples (i.e., patients) and features. The largest experiment in this study, i.e., the differential diagnosis including 203 patients with both a T1w and T2w MRI, had a computation time of approximately 32 hours on a 32-CPU-core machine. The contribution of the feature extraction to the computation time is negligible. The code for the model creation, including more details, has been published open-source as well 5.
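To make the ranking-and-ensembling procedure concrete, the snippet below gives a minimal sketch of the evaluation loop described above. It is not the actual WORC implementation: scikit-learn classifiers stand in for full WORC workflows, the helper sample_workflow and the reduced counts (N_WORKFLOWS, N_OUTER) are illustrative assumptions, and only the nested random splits, F1-based ranking, and probability averaging follow the scheme described in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

N_WORKFLOWS = 1000   # WORC uses 100,000 pseudo-randomly generated workflows
N_INNER = 5          # 5x random-split train/validation cross-validation
N_OUTER = 10         # WORC uses 100x random-split train/test cross-validation
N_ENSEMBLE = 50      # number of best workflows combined into the ensemble
rng = np.random.default_rng(42)


def sample_workflow():
    """Pseudo-randomly sample a classifier configuration (stand-in for a full workflow)."""
    if rng.random() < 0.5:
        return RandomForestClassifier(n_estimators=int(rng.integers(10, 200)))
    return LogisticRegression(C=float(10 ** rng.uniform(-3, 3)), max_iter=1000)


def evaluate(X, y):
    outer_scores = []
    for _ in range(N_OUTER):
        # Outer loop: random-split train-test cross-validation.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y)

        # Inner loop: rank candidate workflows by their mean F1-score
        # over random 85%/15% train/validation splits of the training set.
        candidates = [sample_workflow() for _ in range(N_WORKFLOWS)]
        mean_f1 = []
        for clf in candidates:
            scores = []
            for _ in range(N_INNER):
                X_tr, X_val, y_tr, y_val = train_test_split(
                    X_train, y_train, test_size=0.15, stratify=y_train)
                scores.append(f1_score(y_val, clf.fit(X_tr, y_tr).predict(X_val)))
            mean_f1.append(np.mean(scores))

        # Ensembling: re-train the best workflows on the full training set
        # and average their predicted probabilities on the test set.
        best = np.argsort(mean_f1)[::-1][:N_ENSEMBLE]
        probas = [candidates[i].fit(X_train, y_train).predict_proba(X_test)[:, 1]
                  for i in best]
        y_prob = np.mean(probas, axis=0)
        outer_scores.append(f1_score(y_test, (y_prob >= 0.5).astype(int)))
    return outer_scores
```

Averaging probabilities rather than hard labels amounts to soft voting: the final prediction is less sensitive to any single overfitted workflow, which is the robustness argument given above for combining the 50 best workflows instead of selecting only the top-ranked one.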
