Elise Neppelenbroek

Observer agreement of aCTG assessments

Re 4: Sample size

Calculating the sample size for this study was a real challenge. The standard sample size formula for two independent proportions is

$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \, [\,p_1(1-p_1) + p_2(1-p_2)\,]}{(p_1 - p_2)^2}$$

Using it to detect a difference between, for example, 0.85 and 0.95 would require 138 samples in each group.3 It would be impossible to organize this many samples and to find raters to assess them all. For reliability studies, other recommendations are generally used. Most sample size recommendations have been developed for continuous outcomes; the paper by Kottner et al. provides some references.4 De Vet et al. calculated sample sizes using the formula presented by Giraudeau and Mary.5 These calculations are based on the required width of the 95% confidence interval (CI) of the intraclass correlation coefficient (ICC).

Table A1: Required sample size (n) for ICC = 0.7 and ICC = 0.8 with 2-6 repeated measurements (m)(6)

| m (repeated measurements) | ICC = 0.7, n for 95% CI ± 0.1 | ICC = 0.7, n for 95% CI ± 0.2 | ICC = 0.8, n for 95% CI ± 0.1 | ICC = 0.8, n for 95% CI ± 0.2 |
|---|---|---|---|---|
| 2 | 100 | 25 | 50 | 13 |
| 3 | 67 | 17 | 35 | 9 |
| 4 | 56 | 14 | 30 | 8 |
| 5 | 50 | 13 | 28 | 7 |
| 6 | 47 | 12 | 26 | 7 |

Table A1 indicates the order of magnitude of the required sample sizes. It also shows a trade-off between the number of raters and the number of samples (patients or objects to be rated): the more repeated measurements per sample, the fewer samples are needed. An adequate selection of raters (preferably more than two or three) is important to make the raters representative.

These considerations led to our choice of 20 CTGs to be rated by 5 raters from each professional group. Rating ten CTGs per person seemed feasible. Rating more would cause concentration to wane and could affect the results. We therefore chose to work with two sets of 10 CTGs, which also increases the representativeness of the CTGs. The guidelines for reliability studies, GRRAS (Kottner et al.) and QAREL (Lucas et al.), pay little attention to sample size.4,7 Other authors likewise emphasize that the selection of the samples to be rated and the number of raters matter more than the sample size itself.8
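The two sample size calculations discussed above can be checked numerically. The Python sketch below makes two assumptions that the text does not state. First, it uses a two-sided α of 0.05 and 80% power for the two-proportion formula; these are values that yield 138 per group. Second, for Table A1 it uses the large-sample expression n = 8 z²(1 - ρ)²(1 + (m - 1)ρ)² / (m(m - 1)w²), where w is the full width of the 95% CI (twice the ± half-width in the table). The helper names `n_two_proportions` and `n_icc` are hypothetical and do not come from any library.

```python
"""Sketch reproducing the sample size figures discussed in this section.

Assumptions (not stated in the source text):
- two-proportion formula: two-sided alpha = 0.05 and 80% power;
- ICC formula: large-sample expression with w = full width of the 95% CI.
"""
import math
from statistics import NormalDist


def n_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Samples per group to detect p1 versus p2 (normal approximation), rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # Z_beta
    n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return math.ceil(n)


def n_icc(rho: float, m: int, half_width: float, conf: float = 0.95) -> int:
    """Samples needed so the CI of an expected ICC rho has the given half-width,
    with m repeated measurements (ratings) per sample; rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    w = 2 * half_width  # full width of the confidence interval
    n = 8 * z**2 * (1 - rho) ** 2 * (1 + (m - 1) * rho) ** 2 / (m * (m - 1) * w**2)
    return math.ceil(n)


if __name__ == "__main__":
    print(n_two_proportions(0.85, 0.95))  # 138
    # Rebuild Table A1: one row per m, columns for CI half-widths 0.1 and 0.2
    for rho in (0.7, 0.8):
        for m in range(2, 7):
            print(f"ICC={rho} m={m}: n(±0.1)={n_icc(rho, m, 0.1)} n(±0.2)={n_icc(rho, m, 0.2)}")
```

Under these assumptions, the first call returns 138, and the loop reproduces all twenty entries of Table A1. The ICC expression is our reading of the formula behind the table, chosen because it matches every entry when rounded up. The exact published form should be taken from Giraudeau and Mary.5 The output also makes the rater/sample trade-off visible: for a fixed ICC and CI width, n falls as m grows.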
