Aernoud Fiolet

Chapter 13

Table 2. Distributions and accuracy of baseline variables automatically collected from EHR data compared to trial data.

On an individual participant level, automated EHR text-mining data showed a median accuracy of 88.0% (IQR 84.7–92.8%) when compared with the conventionally collected trial data (Table 2; center-specific accuracy is presented in Supplement 2b). Overall, 9.8% of the data extracted from EHRs were false positives (i.e., a variable present in the EHR data but not in the trial data), and 3.1% were false negatives (i.e., a variable not present in the EHR data but present in the trial data) (Table 3; for contingency tables of the individual medical centers, see Supplement 2c). Across all data points, positive predictive value was 0.928, negative predictive value was 0.937, sensitivity was 0.806, specificity was 0.827, and F1-score was 0.863 (for test performance scores of individual variables, see Supplement 2d). The lowest accuracies were found for hypertension (62.6%), antiplatelet therapy (68.8%), and beta-blocker use (73.3%). Accuracies for hypertension, antiplatelet therapy, and