Joeky Senders

Chapter 6

Feasibility analysis

To provide insight into the feasibility of text mining for various clinical characteristics, we calculated the correlation between model performance, measured by the AUC, and the statistical properties of the variables of interest (i.e., frequency distribution and interrater agreement of the consensus label). Frequency distribution represents the percentage of observations in the least prevalent outcome group. For example, if a variable is present in 70% of the total cohort, the frequency distribution is represented by the minority group, 30%. As such, a frequency distribution of 50% reflects a perfectly balanced distribution of observed values, whereas a value close to 0% reflects a highly imbalanced distribution.

Interrater agreement was measured with Fleiss' kappa statistic, which is an extension of Cohen's kappa statistic to more than two raters.¹² The kappa statistic (κ) accounts for the possibility of agreement occurring by chance and is measured on a scale from -1 to 1. The interpretation of κ can be categorized according to this scale as less than chance (<0), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), and near perfect (0.81-1).¹² The association between model performance and the statistical properties of the variables of interest was measured with Spearman's rank correlation.

The NLP models were developed and evaluated in Python version 3.6 (Python Software Foundation, http://www.python.org) using the Scikit-learn library. The feasibility analysis was performed in R version 3.5.1 (R Core Team, Vienna, Austria, https://cran.r-project.org). To promote the transparency and reproducibility of our work, we have released the source code under an open-source license in a publicly accessible GitHub repository (https://github.com/jtsenders/nlp_glioblastoma).
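
The three quantities used in the feasibility analysis (minority-group frequency, Fleiss' kappa, and Spearman's correlation) can be illustrated with a minimal, dependency-free Python sketch. This is not the R code used for the analysis; the function names and input layout (one list of rater labels per report) are assumptions made for illustration only. In practice, library routines such as `scipy.stats.spearmanr` or the `fleiss_kappa` function in `statsmodels` would likely be preferred.

```python
from collections import Counter


def minority_frequency(labels):
    """Percentage of observations in the least prevalent outcome group."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0  # only one outcome observed: maximally imbalanced
    return 100.0 * min(counts.values()) / len(labels)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Assumes every item was rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()  # how often each category was assigned overall
    p_items = []        # observed agreement per item
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        p_items.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_observed = sum(p_items) / n_items
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (sx * sy)
```

For instance, `minority_frequency([1] * 7 + [0] * 3)` returns `30.0`, matching the 70%/30% example in the text. Under these assumptions, the feasibility analysis would call `spearman` once with the per-variable AUCs against the per-variable minority frequencies, and once against the per-variable kappa values.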
Additionally, a step-by-step pseudocode is provided in Table 1, which can be used to develop similar NLP models for other clinical text mining applications.

Results

In total, we retrieved 562 unique brain MRI reports of glioblastoma patients operated on at our institution. The prevalence of the radiological characteristics reported in the free-text radiology reports ranged from 10.5% for tumor extension into the corpus callosum to 53.7% for left-sided tumor involvement (Table 2).
