Karlijn Hummelink

Chapter 4

Statistical analysis

The Mann-Whitney, Fisher exact, and linear-by-linear association tests were used, as appropriate, to assess differences in patient characteristics between the training and validation cohorts. Differences were considered statistically significant at P < 0.05. Correlations between the PD-1T signature and PD-1T TILs assessed by IHC, and between the PD-1T signature and the Tumor Inflammation Signature (TIS), were evaluated using linear regression analysis.

A two-level batch effect correction was performed on the mRNA expression data of all NKI-AVL and CERTIM patients using empirical Bayes linear regression, to correct for batch effects between the NKI-AVL and CERTIM cohorts and between the different NKI-AVL cohorts. Both batch effect correction and gene expression analysis were performed with R 4.1.0 and the package limma 3.48.0. Differential gene expression analysis was performed using linear regression on the gene log-expression values. A separate model was fitted for each gene, and moderated t-statistics and log-odds of differential expression were computed via empirical Bayes moderation. The main biological processes involved in the gene signature were analyzed by gene ontology analysis using the Fisher exact test with the R package topGO 2.44.0 (SCR_014798). P-values were adjusted using the Benjamini-Hochberg procedure.

A prediction model was built using logistic regression with LASSO (least absolute shrinkage and selection operator) regularization for variable selection. The penalization term added on the model coefficients shrinks them toward zero and sets some exactly to zero, which results in variable selection. Due to the limited sample size and the unequal distribution over the DC 12m and PD patient groups, cross-validation was limited to three-fold.
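The variable-selection effect of the penalization term can be illustrated with a minimal sketch in Python. The analyses themselves were run in R with glmnet, and this is not that code. The sketch shows soft-thresholding, which is the closed-form LASSO solution for a single coefficient in the simplified case of a linear model with an orthonormal design. The gene names, coefficient values, and penalty of 0.5 are hypothetical and chosen only for illustration.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink a coefficient toward zero by `penalty`.

    Coefficients whose absolute value does not exceed the penalty
    are set to exactly 0.0, which is what removes a variable.
    """
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized coefficients (not taken from the study).
unpenalized = {
    "gene_a": 1.8,
    "gene_b": -0.9,
    "gene_c": 0.25,
    "gene_d": -0.1,
    "gene_e": 0.6,
}
penalty = 0.5  # hypothetical penalization strength (lambda)

# Apply the L1 shrinkage to every coefficient.
shrunk = {gene: soft_threshold(coef, penalty) for gene, coef in unpenalized.items()}

# Variables that survive the penalty are the "selected" ones.
selected = [gene for gene, coef in shrunk.items() if coef != 0.0]

print(shrunk)
print(selected)
```

With these illustrative values, the two coefficients smaller in magnitude than the penalty are dropped and the remaining three are kept with reduced magnitude. A larger penalty would remove more variables, which is why the strength of the penalization term has to be tuned, for example by cross-validation.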
Thus, three-fold cross-validation was used to select the optimal penalization term of the regularized regression, based on the deviance, a goodness-of-fit statistic commonly used for generalized linear models. The results of the regression were transformed into probability scores using the formula

$$P = \frac{1}{1 + e^{-K}},$$

with K being the result of the logistic regression. K was computed as

$$K = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_n x_n,$$

where $\beta_i$ is the model coefficient and $x_i$ the expression value of signature gene $i$, and $n$ is the number of genes in the signature. The coefficients of the prediction model are provided in Supplementary Table S2. Cross-validation and prediction model building were performed with R and the package glmnet 4.1-2. Based on the NKI-AVL training cohort, a threshold on the probability scores provided by the prediction model was chosen to classify a patient as predicted to achieve DC upon therapy. This threshold was set to give the best sensitivity (detection of DC) while keeping a satisfactory specificity. A
