Marga Hoogendoorn

Reliability

Information about the inter- or intra-rater reliability was reported for 12 of the 27 (44%) scoring systems. A summary of the reliability results is presented in Table 2. For 10 systems (37%), the inter-rater and intra-rater reliability were considered substantial to almost perfect (Cronbach’s alpha 0.71-1.00, Kappa > 0.65). The remaining 2 systems (PINI and NAS) showed results varying from slight to substantial agreement 21, 42-45. Interventions that include categories based on a subjective estimation of time by the nurse (e.g. "the hygienic procedures took more than 2 hours per shift" in the NAS) showed lower reliability (Kappa of 0.02-0.12) 45.

Validity

Information about the validity was reported for 24 of the 27 (89%) scoring systems. A summary of the validity results is presented in Table 2. The ‘gold standard’, observed time measurement, was used for only 7 (26%) scoring systems. Although the TISS was originally developed (in 1974) without the use of continuous time measurements, we found one study, published in 1992, in which the TISS was retrospectively evaluated using continuous time measurements 46. A strong correlation was shown between the time for nursing interventions and the TISS-76 (r=0.89, p<0.0001). The Classification System of the Jackson Memorial Medical Centre was developed and evaluated with continuous time observations. It was concluded that the point system was a good indicator of the actual care received 47. The PINI was validated with an observational time measurement study 42. A strong correlation was found between the observed time and the rated hours of care (r=0.75, p<0.001). In 70% of the disagreements, nurses overestimated the hours of care. The NAS was validated with Multi Moment Recordings; 81% of the total time spent by nurses was explained by the NAS 33.
The NWL-Patient Category Scoring System was validated by comparing the results of the scoring system with time measurements obtained by video observation. The authors concluded that this scoring system did not accurately reflect the amount of nursing time 34. The system described by Evans et al. (no name) was validated with time observations; the expected hours of care per shift were compared with the observed hours per shift for each category 37. They concluded that the expected and observed nursing care hours were equal, except for category II patients: for this category, 8 hours of nursing care per shift were expected, whereas 5.3 hours were observed. The weaker validation method, i.e. comparing the newly developed scoring system with an existing scoring system, was described for 16 scoring systems (59%). As shown in Table 2, most of these studies (n=10) used the TISS for this comparison. One study used case vignettes to evaluate validity 66.
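
The agreement labels used in the reliability results (slight, substantial, almost perfect) resemble the descriptive bands commonly attributed to Landis and Koch for the kappa statistic. As a minimal sketch, assuming that convention and using made-up ratings rather than data from any reviewed study, Cohen's kappa for two raters and its descriptive label could be computed as follows. The function names `cohen_kappa` and `agreement_label` are hypothetical and are not taken from the reviewed studies.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Descriptive band for a kappa value (assumed Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Illustrative (invented) patient category scores from two nurses.
nurse_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
nurse_2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]
kappa = cohen_kappa(nurse_1, nurse_2)
print(round(kappa, 2), agreement_label(kappa))
```

Under these assumed bands, the Kappa range of 0.02-0.12 reported for the time-estimation items falls in the "slight" category, and a Kappa above 0.65 falls in "substantial" or higher.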