Joeky Senders

Automating clinical chart review

accessible [33–35]. This constitutes a significant loss of potential, as open-source code allows for transparency, reproducibility, and external generalizability of the developed NLP pipelines [36]. Furthermore, model performance is often presented as the main finding in the current medical literature, whereas the question of why certain variables are more suitable for text mining remains relatively unexplored.

FIGURE 2. Scatterplots depicting the correlation between model performance and statistical properties of the variables of interest. Each of the 15 radiological characteristics is represented by a point on these scatterplots. The y-axis shows the performance of the NLP models developed to extract these variables, measured by the AUC. The x-axis shows the frequency distribution of the variables (A) and the interrater agreement of the manually provided labels (B). The frequency distribution represents the percentage of observations in the least prevalent group, and the interrater agreement was calculated by means of the Fleiss' Kappa statistic (κ). The association between these statistical properties and model performance was calculated by means of Spearman's correlation. The smoothed line depicts a local polynomial regression fit, and the ribbon depicts the associated standard deviation. In the current sample, model performance was statistically significantly correlated with the strength of the interrater agreement of the manually provided labels (rho=0.904, p<0.001), but not with the equivalence of the frequency distribution of these variables (rho=0.179, p=0.52). Abbreviations: AUC=area under receiver operating characteristics curve; κ=Fleiss' Kappa statistic; %=percentage of patients in the minority group.
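
The three statistics named in the Figure 2 caption (percentage of observations in the least prevalent group, Fleiss' Kappa, and Spearman's correlation) can be illustrated with a minimal sketch. This is not the code used in the study: the function names are hypothetical, the implementation uses only the Python standard library, and the numbers in the demonstration are invented for illustration and are not the study's data.

```python
import math
from collections import Counter


def minority_percentage(labels):
    """Percentage of observations in the least prevalent label group."""
    counts = Counter(labels)
    return 100.0 * min(counts.values()) / len(labels)


def fleiss_kappa(table):
    """Fleiss' kappa for a subjects x categories table of rating counts.

    Every row must sum to the same number of raters (at least two).
    Undefined (division by zero) if all ratings fall into one category.
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    total = n_subjects * n_raters
    # Share of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    # Observed agreement per subject, averaged over subjects.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_subjects
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_obs - p_exp) / (1.0 - p_exp)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for five variables; NOT the study's data.
    kappa = [0.35, 0.50, 0.62, 0.80, 0.91]
    auc = [0.70, 0.78, 0.75, 0.90, 0.96]
    print(f"rho = {spearman_rho(kappa, auc):.3f}")
```

In practice, established implementations such as `scipy.stats.spearmanr` and `statsmodels.stats.inter_rater.fleiss_kappa` would likely be preferable, since they also provide p-values and input validation; the sketch above only makes the definitions explicit.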

Implications

In the current study, we have developed an open-source NLP pipeline for text mining of medical information using a corpus of free-text radiology reports of patients with a glioblastoma. This pipeline can also guide the development of NLP models for other patient cohorts, medical reports, or clinical characteristics. Automated extraction of medical information could accelerate the speed and scale at which retrospective chart
