Marilen Benner

RESULTS

As immune responses are never limited to a single cell type, minor changes in the frequency of a specific subset can affect neighboring cells through cell contact or secretion of soluble factors. While a subtle change in the numbers or characteristics of a given immune cell population might fall within the physiological range, hampering its detection, machine learning can identify a change in overall patterns and the underlying cell types involved. To collect a dataset suited for multivariate analysis, we established a flow cytometry-based phenotypic overview of immune cell frequencies (Figure 1A) in PB and MB (n=15 and n=18, respectively) of women who had suffered at least two consecutive unexplained miscarriages (patient characteristics, Table 1). The analysis covered total leucocyte populations and T, B, and NK cell subsets using five established staining panels (27) (Supplementary Table 1). In total, 63 immune subsets, age, and CMV status were assessed, resulting in 65 features that were taken into account for further analysis (Supplementary Figure S1, Supplementary Table 2). Data were compared to a control cohort of women with uncomplicated pregnancies (PB, n=13; MB, n=14). We used machine learning-based cohort classification to identify immune cell subsets that discriminate RPL from control, based on either MB or PB profiles. To achieve this, we employed an ensemble strategy, as it allows for robust feature selection in a low sample size setting (28). By combining 8 distinct classification algorithms, the ensemble compensates for possible bias of its individual classification algorithms (Lopez-Rincon et al. 2019; Lopez-Rincon et al. 2020). The outcomes of the individual algorithms were weighted and combined into a single ensemble ranking (29). The top 80% of features in this list were then used to run the individual algorithms, including 10-fold cross-validation to ensure generalizability of the results, and the average classification accuracy was calculated.
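The ranking and truncation step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the weighting scheme (a weighted Borda count), the function names `ensemble_ranking` and `keep_top_percent`, the classifier labels, the feature labels, and all weights are assumptions made purely for illustration.

```python
def ensemble_ranking(rankings, weights):
    """Combine per-classifier feature rankings into one weighted ranking.

    rankings: dict mapping classifier label -> list of features, best first.
    weights:  dict mapping classifier label -> weight (hypothetical values).
    Each feature earns weight * (n - position) points per classifier,
    i.e. a weighted Borda count (an assumed scheme, not taken from the source).
    """
    scores = {}
    for clf, ranked in rankings.items():
        n = len(ranked)
        for position, feature in enumerate(ranked):
            scores[feature] = scores.get(feature, 0.0) + weights[clf] * (n - position)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


def keep_top_percent(ranked, percent=80):
    """Keep the top `percent` of a ranked list (rounded down, at least one item)."""
    k = max(1, len(ranked) * percent // 100)
    return ranked[:k]


# Invented toy input: three classifiers ranking five placeholder features.
rankings = {
    "clf_a": ["f1", "f2", "f3", "f4", "f5"],
    "clf_b": ["f2", "f1", "f4", "f3", "f5"],
    "clf_c": ["f1", "f3", "f2", "f5", "f4"],
}
weights = {"clf_a": 0.9, "clf_b": 0.8, "clf_c": 0.7}  # hypothetical weights

ranked = ensemble_ranking(rankings, weights)
selected = keep_top_percent(ranked, percent=80)
print(ranked)    # combined ranking, best first
print(selected)  # top 80% passed on to the individual classifiers
```

In this toy input, `f5` is ranked low by all three classifiers and is the only feature dropped by the 80% cut. Integer arithmetic is used for the cut so that the number of retained features does not depend on floating-point rounding.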
By repeatedly reducing the list of top-contributing features by 20%, the optimal number of features to achieve robust classification was determined (Figure 1B). With this approach, we identified that a combination of 4 cell types for PB (non-switched memory B cells, CD8+CD4- T cells, NKbright cells, CD4+ effector T cells; Figure 1C left panel, Supplementary Figure 3), and 6 cell types for MB (Ki67+CD8+ T cells, HLA-DR+ Treg, CD27+ B cells, NKbright cells, Treg cells, CD24hiCD38hi B cells; Figure 1C right panel, Supplementary Figure 3), together with age, allowed for optimal classification as RPL versus control. After the features used for classification had been determined, the individual classification algorithms were run using the respective immunological parameters. This resulted in a maximum average area under the curve (AUC) of 0.82 ± 0.23 for PB and 0.90 ± 0.17 for MB when analyzed with the PassiveAggressive classifier (Figure 1D), with an accuracy of 0.87 ± 0.16 and 0.84 ± 0.14, respectively (Supplementary Table 3). Reducing the number of features included in a multivariate approach allows for more robust outcomes, as features of high variance but minor discriminative value can be excluded. However, this approach might mask notable differences of an individual feature, as only the most contributing