Joeky Senders

13 General introduction and thesis outline from complex data sources by learning patterns automatically, even those that are undetectable or meaningless for humans. Within the field of machine learning, a broad distinction can be made between supervised learning, unsupervised learning, and reinforcement learning. 17 Supervised learning algorithms learn from examples for which the desired output is known (i.e., labelled data) to develop a model that can compute predictions in new cases. Unsupervised learning techniques, on the other hand, seek to find similarities and patterns in unlabeled data. These algorithms can be valuable for identifying previously unknown clusters within the data. Reinforcement learning algorithms aim to determine the ideal behavior within a context or environment. It differs from supervised learning as it seeks to maximize the cumulative reward for series of actions instead of a single prediction. 17 The current thesis focusses on supervised algorithms and explores their utility in the clinical and scientific realms of neurosurgical oncology. The ability to learn from known examples indicates the key difference between supervisedmachine learning and traditional programming. In traditional programming, a programmer manually writes a set of instructions – the program – to generate the desired output from a given set of inputs. In supervised machine learning, the input is provided together with the desired output, and computer algorithms are asked to derive the rules from the labeled training data. The product of this process is therefore not the desired output itself but amodel that can predict the output in new observations (Figure 1). The automated learning process is an efficient way of analyzing large quantities of data, modelling hidden relationships in complex data sets, and adapting to changing environments. In the learning process, algorithms try to find the optimal combination of input variables (i.e., features) and weights given to these features in the model, thereby minimizing the difference between the predicted and observed outcomes. The mathematical structures of the most frequently used machine learning algorithms are briefly outlined in Figure 2. Machine learning algorithms are founded upon statistical principles and should be considered as an extension of traditional statistical algorithms. They exist along a continuum determined by howmuch is specified by humans and howmuch is learnt by the machine, referred to as the machine learning spectrum. 18 For example, regression analysis on the low end of the machine learning spectrum requires more human guidance but provides valuable insights into the underlying predictive mechanisms, whereas deep learning on the high end of the spectrum can develop models from the raw data itself at the cost of model interpretability. The current thesis describes several studies along the continuum of the machine learning spectrum to derive knowledge from clinically-derived patient data and inform clinical decision-making in neurosurgical oncology (Figure 3).