I 273 Table of Contents 275 306

Karlijn Hummelink

Appendix 274 The final normalization coefficient was calculated for each spectrum and all feature values were divided by the normalization coefficient for the spectrum/sample to produce the final processed feature values. These were the feature values that were used for the development set samples for the creation of the test and these would be the feature values that would be input into the test classification algorithm when performing the fully specified test on a new sample. 3. Machine Learning: Development of the Classification Algorithm A test classification is generated using a combination of 3 binary classifiers. Each classifier was created using the classifier development approach outlined in the next section. a. A hierarchical classifier development platform designed for problems where the number of available instances is smaller than the number of measured attributes Each classifier was created using a hierarchical classifier development platform designed specifically to work well in settings where the number of attributes (features) measured for each instance (sample) exceeds the number of instances available for classifier training. It incorporates aspects of traditional and modern machine learning, including bagging, boosting, and regularization using dropout, with the aim of producing classifiers with reliable performance estimates from relatively small sample sets while minimizing chances of overfitting to peculiarities in the development set data. This platform has been used in several other personalized medicine projects. Details of the approach can be found in Roder J, Roder H. Classification generation method using combination of mini-classifiers with regularization and uses thereof. United States patent US 2016; 9,477,906. (ed. Office, U.S.P.a.T.) (Biodesix, USA, 2016), and references [14] and [17]. b. Training Class Definition and a Semi-Supervised Approach to Simultaneous Refinement of Training Class Labels and Classifier This approach used supervised learning, i.e., training class labels were needed for each instance (sample). To solve this particular classification problem using supervised learning one would need to know which patients received durable benefit from immune therapy and which did not. It was not a priori clear how to unambiguously define durable benefit from time-to-event data in a way that revealed underlying information in the molecular data. We employed an approach

Made with FlippingBook

RkJQdWJsaXNoZXIy MTk4NDMw