Marilen Benner

CHAPTER 5

Flow cytometry

For surface staining, a minimum of 250,000 PB or MB cells was stained using fluorochrome-conjugated monoclonal antibodies (moAbs) for 20 min at room temperature in the dark. An overview of all moAbs used is shown in Supplementary Table 2. Intracellular staining was performed with a minimum of 500,000 cells. Samples were fixed and permeabilized according to the manufacturer’s instructions (eBioscience, San Diego, USA). Flow cytometry data were analyzed using Kaluza (Beckman Coulter, v2.1). Gate settings (Supplementary Figure S1) were based on a fluorescence-minus-one (FMO) strategy. Isotype controls were applied.

Data analysis

Data were processed using R v.4.0.2 and the ggpubr, ggplot2, ggsignif, and tidyr packages. The non-parametric Mann-Whitney test was used. Values of p < 0.05 were considered statistically significant.

An ensemble feature selection was used to detect features allowing for cohort classification. This strategy was previously designed and validated to overcome the bias of using a single machine-learning algorithm, thus allowing for a more robust selection of classifying features (29, 51). Eight classifiers (Bagging, Gradient Boosting, Logistic Regression, Passive-Aggressive regression, Random Forest, Ridge regression, SGD (Stochastic Gradient Descent on linear models), and SVC (Support Vector Machines Classifier with a linear kernel)) were run in 10-fold and used to score features on their importance for classification. The scores of the individual algorithms were combined in an ensemble ranking: for Bagging, Gradient Boosting, and Random Forest, which work with classification trees, the features used in the trees’ splits were counted and ranked by frequency; for Passive-Aggressive, Logistic, and Ridge regression, SGD, and SVC, feature importance was assigned by the value of the coefficient associated with each feature. Each feature was then scored by the number of times it appeared among the top classifying features in the classifiers’ rankings.
A detailed description of the ranking used for the ensemble strategy has previously been presented (29). To reduce the number of features to those that allow for optimal classification, classifiers were run repeatedly with the top 80% of features in a recursive feature selection approach. All classifiers were subjected to stratified 5- and 10-fold cross-validation. Having determined which features allow for the most robust classification, this set of parameters was used to run the individual classifying algorithms, combined with 10-fold cross-validation.
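
The ranking and reduction steps described above can be illustrated with a minimal Python sketch. It is not the published implementation (29); it only shows the general idea of counting how often a feature falls among each classifier's top features and then repeatedly keeping the top 80% of the ranked features. The function names `ensemble_rank` and `recursive_selection`, the `top_k` cut-off, the use of absolute magnitudes, the tie-breaking rule, and the toy importance values are all assumptions made for illustration. In practice, `score_fn` would refit the classifiers on the selected features and return their split counts or coefficients.

```python
from collections import Counter
from typing import Callable, Dict, List

# Assumed layout: classifier name -> {feature name -> importance score}.
Importances = Dict[str, Dict[str, float]]


def ensemble_rank(importances: Importances, top_k: int) -> List[str]:
    """Order features by how many classifiers place them in their top_k."""
    votes: Counter = Counter()
    rank_sum: Counter = Counter()
    for scores in importances.values():
        # Split counts are non-negative, but coefficients can be negative,
        # so this sketch ranks on absolute magnitude (an assumption).
        ranked = sorted(scores, key=lambda f: abs(scores[f]), reverse=True)
        votes.update(ranked[:top_k])
        for position, feature in enumerate(ranked):
            rank_sum[feature] += position
    # More votes first; ties broken by summed rank, then by name (assumption).
    return sorted(rank_sum, key=lambda f: (-votes[f], rank_sum[f], f))


def recursive_selection(
    features: List[str],
    score_fn: Callable[[List[str]], Importances],
    top_k: int,
    keep: float = 0.8,
    min_features: int = 1,
) -> List[List[str]]:
    """Return the successively smaller feature sets, largest first."""
    current = list(features)
    history = [current]
    while len(current) > min_features:
        ranked = ensemble_rank(score_fn(current), min(top_k, len(current)))
        # Keep the top fraction of the ranked features (80% by default).
        n_keep = max(min_features, int(len(current) * keep))
        current = ranked[:n_keep]
        history.append(current)
    return history


# Toy importance values, invented purely for illustration.
toy: Importances = {
    "random_forest": {"f1": 3, "f2": 0, "f3": 5, "f4": 1},
    "ridge": {"f1": 0.1, "f2": 0.05, "f3": -0.9, "f4": 0.5},
    "svc": {"f1": 0.7, "f2": 0.0, "f3": 0.8, "f4": 0.6},
}


def toy_scores(selected: List[str]) -> Importances:
    # Stand-in for refitting the classifiers on the selected features.
    return {clf: {f: s[f] for f in selected} for clf, s in toy.items()}


print(ensemble_rank(toy, top_k=2))  # ['f3', 'f1', 'f4', 'f2']
print(recursive_selection(["f1", "f2", "f3", "f4"], toy_scores,
                          top_k=2, min_features=2))
```

Each feature set returned by `recursive_selection` could then be evaluated with stratified cross-validation to decide which set classifies most robustly; that evaluation step is left out of the sketch.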