Mia Thomaidou

methods are standardized, validated, and well described, if potential confounders were considered, and if blinding was adequate. Each category was scored as being satisfied (0 points), not satisfied (2 points), partially satisfied (unclear; 1 point), or not applicable. Scores were assigned based on the criteria described by Marcuzzi and colleagues 23. We additionally calculated a numerical score (0-34) for each study by summing the item scores, with higher scores indicating higher risk of bias (please see Appendix C for an example of the RoB scoring).

Statistical analyses and results synthesis

All analyses were conducted and checked by two reviewers (J.S.B. and M.A.T.), using the Comprehensive Meta-Analysis software (version 3.3.070; Biostat, Englewood, USA) and R programming software for visualizations 24. Funnel plots were inspected for outliers (i.e., studies falling outside the funnel of expected results), and to assess publication bias across studies we checked the number of imputed missing studies with Duval and Tweedie’s trim and fill method 25. Heterogeneity between studies was assessed with the I² statistic and visual inspection of the forest plot. I² is a measure of the proportion of observed variance reflecting real differences in effect sizes 26, with values of 25%, 50%, and 75% considered as low, moderate, and high degrees of heterogeneity, respectively 27. For forest plots, we calculated study weights in R as the inverse of the variance of each effect size. Given the heterogeneity of study designs, random effects models were used for all meta-analyses. Effect sizes were calculated using means and standard deviations for each group (between subjects) or trial type (within subjects) 26. We selected nocebo and control conditions based on what was reported in studies: some reported nocebo magnitudes between groups, others within groups in the first pair of evocation trials,
