Statistical analyses and data processing

Student’s t-test and Chi-square tests were used to compare demographic characteristics between cases and controls. Raw 1H-NMR signal data were processed as follows. Values below [mean - 4 * SD] or above [mean + 4 * SD] were filtered out. Normality was then assessed and data were log10-transformed when necessary, using SPSS software version 20.0 (SPSS Inc., IBM, Armonk, NY, USA). Signal data were adjusted for kinship by linear regression in GenABEL version 1.70, using R version 2.14.2 (R Foundation for Statistical Computing, Vienna, Austria) [50]. Finally, the residuals from this linear regression model were transformed into Z-scores to approximate normality using SPSS software version 25.0 (SPSS Inc., IBM, Armonk, NY, USA).

To reduce the dimensionality of the data and to account for possible correlations between the parameters, elastic net regression was used to select a subset of the most informative signals for: (1) lifetime migraine diagnosis, and (2) a diagnosis of active migraine (defined as having at least one severe migraine in the last 12 months). Of note, patients likely had many attacks in the last year, as is typical in patients whose migraine is still active, but data are lacking to assess how many attacks they had or when the last attack occurred before blood withdrawal, and we do not know whether they were on medication. Hence, we consider our migraine cases a sample with “real-life variation” with respect to attack frequency and severity.

The R package glmnet was used with alpha set to 0.5 and 50-fold cross-validation, using R software version 3.6.1 [51]. In this cross-validation step we validated the selection of the signals by performing our regression analysis on 50 randomly chosen samples of our study population. Elastic net regression accepts some increase in bias in exchange for reduced variance and prediction error, which increases predictive power and leads to better long-term prediction.
However, inferential capability decreases, which makes interpretation difficult because the method provides no measures of uncertainty such as confidence intervals or hypothesis tests. To aid interpretation of our findings, we fitted subsequent regression models. Because these models had to be fitted within the same, single cohort, their exact p-values are no longer valid, although the results may still give some indication of whether metabolites are involved. For these models, we entered the metabolites of the metabolic profiles in a logistic regression model to determine the weight of each signal in this population. The linear predictor of the logistic regression model (the sum of regression coefficients multiplied by the corresponding covariate values) was used as a “weighted metabolite score”. This score was used in a second logistic regression analysis to calculate odds ratios (ORs), p-values and the proportion of explained variance. To determine whether the logistic regression model needed adjustment, we independently assessed the influence of sex, age, body mass index (BMI) and smoking status on the “weighted metabolite score” by visually inspecting stratification plots and by fitting a linear model in which the “weighted metabolite score” was modelled as a function of migraine status. We included age, sex, BMI and current smoking status as covariates in the logistic regression model. To validate the findings from the previous analysis, we performed an analysis of variance (ANOVA) in which we compared the performance of the full model, containing the identified scores for migraine, with that of a model containing only information on age, sex, BMI and smoking.
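
For illustration only, the outlier filter, the Z-score transformation and the “weighted metabolite score” described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis code used in this study (which was run in SPSS and R): all function names, the optional intercept argument and any coefficient values passed in are hypothetical, and the coefficients are assumed to come from a logistic regression model that has already been fitted.

```python
import math
import statistics


def filter_outliers(values, k=4.0):
    """Set values outside [mean - k*SD, mean + k*SD] to None (missing)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    lower, upper = mean - k * sd, mean + k * sd
    return [v if lower <= v <= upper else None for v in values]


def z_scores(values):
    """Standardise non-missing values to mean 0 and SD 1; keep None as is."""
    kept = [v for v in values if v is not None]
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    return [None if v is None else (v - mean) / sd for v in values]


def weighted_score(coefficients, signal_values, intercept=0.0):
    """Linear predictor: intercept + sum of coefficient * signal value.

    `coefficients` maps (hypothetical) signal names to logistic regression
    coefficients; `signal_values` maps the same names to one subject's values.
    """
    return intercept + sum(
        beta * signal_values[name] for name, beta in coefficients.items()
    )


def odds_ratio(beta):
    """Odds ratio per one-unit increase of a predictor with coefficient beta."""
    return math.exp(beta)
```

A subject's score is then obtained with `weighted_score(coefs, subject_signals)`, where both arguments are dictionaries keyed by signal name. In a second model the score enters as a single predictor, and `odds_ratio` converts its fitted coefficient into an OR.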