Joeky Senders

Chapter 6

Abstract

Introduction
The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from narratively written reports. Additionally, we aimed to provide insight into the eligibility of variables and the methodological boundaries of text mining in clinical research.

Methods
Various NLP models were developed to extract 15 radiological characteristics from free-text radiology reports of glioblastoma patients. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. Spearman’s rank correlation coefficient was calculated to examine how model performance (AUC) was associated with the frequency distribution of the variables of interest and with the interrater agreement of the manually provided labels.

Results
In total, 562 unique brain MRI reports were retrieved. NLP extracted 15 radiological characteristics with high to excellent discrimination (AUC 0.82-0.98) and accuracy (78.6-96.6%). Model performance was correlated with the interrater agreement of the manually provided labels (rho=0.904, p<0.001) but not with the frequency distribution of the variables of interest (rho=0.179, p=0.52). All variables labelled with near-perfect interrater agreement were classified with excellent performance (AUC>0.95). Excellent performance could be achieved even for variables with merely 50-100 observations in the minority group and class imbalances of up to 9:1.

Conclusion
This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered absolute contraindications for text mining in clinical research. However, future studies should report measures of interrater agreement whenever the ground truth is based on a consensus label, and should use this measure to identify clinical variables eligible for text mining.
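To make the two statistics named in the Methods concrete, the following is a minimal sketch of how an AUC and a Spearman's rho could be computed using only the Python standard library. It is not the pipeline described in this chapter: the function names (`average_ranks`, `pearson`, `spearman_rho`, `auc`) are hypothetical, and the per-variable agreement and AUC values are invented purely for illustration and are not results from this study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all entries tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) identity; labels are 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical per-variable values, invented for illustration only.
agreement = [0.55, 0.70, 0.82, 0.91, 0.97]  # assumed interrater agreement
model_auc = [0.83, 0.88, 0.86, 0.95, 0.98]  # assumed cross-validated AUC
rho = spearman_rho(agreement, model_auc)
print(f"rho = {rho:.2f}")
```

Because Spearman's rho depends only on ranks, it captures whether variables with higher agreement tend to have higher AUC without assuming a linear relationship. In practice, an established statistics library would also provide the accompanying p-value, which this sketch omits.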