concordance 79%; κ 0.77, 95% CI 0.60–0.95). The most prevalent additional diagnostic category was extrathoracic pathology (101/187, concordance 67%; κ 0.59, 95% CI 0.46–0.71). Direct referral to the expert panel was selected 89 times, of which 50 selections resulted in 25 agreement cases (concordance 56%).

Classification by and inter-observer agreement between members of the expert panel

Classification by the expert panel in the 60 validation cases resulted in agreement between the internist, the pulmonologist and, if consulted, the cardiologist in 34 cases (concordance 57%, 95% CI 44–69%). In 24 cases (40%) there was total agreement, in 23 cases (38%) partial agreement, and in 13 cases (22%) disagreement on the classification. Further qualitative evaluation of the 36 partial-agreement and disagreement cases revealed that 10 cases were due to discordance on labels from the additional diagnostic categories only, or to procedural errors, leaving 26 cases as "true" disagreement. Specifics can be found in Supplementary Table S2. A total of 173 labels were assigned to the 60 validation cases by the expert panel: 119 definite diagnostic labels (concordance 86%) and 57 labels from the additional six diagnostic categories (concordance 70%). Specifics and κ values can be found in Supplementary Table S3.

Discussion

We tested a method for post-hoc classification of study participants in large-scale radiology trials, within a study comparing chest x-ray with ultra-low-dose chest CT. The students and, if necessary, the residents were able to assign a diagnosis in 76% of cases with a suspicion of pulmonary disease. Comparing the classification of 60 patients by medical students and residents with the classification of the same patients by a panel of medical specialists resulted in agreement on the clinical diagnosis for 50 of the 60 patients (83% concordance, 95% CI 74–93%). When the discrepancies were examined in detail, students in particular assigned more less-severe diagnoses, such as URTI, which the medical specialists set aside.

The use of a composite reference is a common method for disease classification in large clinical trials. As an example, in a diagnostic accuracy study evaluating imaging strategies for the detection of urgent conditions in patients with acute abdominal pain, a final diagnosis was assigned by an expert panel of two gastrointestinal surgeons and an abdominal radiologist [13]. Laméris et al. described the general methods for diagnosis assignment and listed the panel members in their appendix. Specifics on how the panel was instructed, blinding of panel members, measures of agreement, and the process of the consensus meeting are not provided in the main study report. Word count limits imposed by journals complicate full and informative reporting of such essential issues; as a result, methods to achieve panel-based consensus are often not described in studies, precluding reproducibility, and guidance on preferred methodology is lacking [3, 14, 15]. Investigators may also be discouraged from considering panel-based consensus methods in a trial design by the time-consuming nature of panel-based diagnosis [10, 11]. If the methodology of panel-based diagnosis is described, agreement regarding diagnosis assignment varies. For instance, Klein Klouwenberg et al. studied the inter-
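As a concrete illustration of the agreement statistics reported in the results above, the sketch below reproduces a concordance proportion with its 95% confidence interval and shows a generic Cohen's kappa computation. This is a minimal sketch, not taken from the study: it assumes a Wald (normal-approximation) interval, which matches the reported 83% (74–93%) for the 50/60 agreeing validation cases, and the 2×2 table used for kappa is purely hypothetical.

```python
# Minimal sketch (not from the study): agreement statistics of the kind
# reported above, assuming a Wald (normal-approximation) 95% CI for
# concordance and the standard Cohen's kappa. All inputs other than the
# 50/60 validation-case figure are hypothetical.

import math

def concordance_ci(agree: int, total: int, z: float = 1.96):
    """Observed agreement proportion with a Wald 95% confidence interval."""
    p = agree / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se

def cohens_kappa(table):
    """Cohen's kappa for a square two-rater contingency table
    (rows = rater 1, columns = rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row = [sum(table[i][j] for j in range(k)) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row[i] * col[i] for i in range(k))          # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Students/residents vs. expert panel: agreement in 50 of 60 validation cases.
p, lo, hi = concordance_ci(50, 60)
print(f"concordance {p:.0%} (95% CI {lo:.0%}-{hi:.0%})")  # -> 83% (74%-93%)

# Hypothetical 2x2 table, purely to illustrate the kappa computation.
print(f"kappa {cohens_kappa([[40, 5], [5, 10]]):.2f}")    # -> 0.56
```

Under the same Wald approximation, 34/60 gives 57% (95% CI 44–69%), matching the panel concordance reported above; this suggests, but does not confirm, that this standard interval was used in the study.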