Joeky Senders

137 Comparing NLP methods Limitations Several limitations of the current study should be mentioned, which underline common barriers inNLP andmachine learningmodeling. Labels are necessary for the training and testing of algorithms, and human classification remains key for label generation in NLP tasks. Human classification, however, remains prone to error as well, which underlines the ambiguity of free-text clinical reports and the need for well-trained NLP models. In this study, a consensus in human classification was used as ground truth, which is a commonly used method to generate an approximation in the absence of actual ground truth. 28 Furthermore, this concept is already implemented in some frequently used machine learning algorithms, where the majority vote of many weak classifiers (e.g., decision tree) can result in a single strong classifier (e.g., random forest) referred to as ensemble learning. 29 In the current study, the complete data set was classified manually to generate labels for training and testing. However, when an NLP model will be put to use, only a minor portion will be labeled manually to predict the labels on the remaining data set. Due to the absence of labels in the remaining data set, external validation may not be feasible, and cross-validation remains the best approximation of model performance. Lastly, models trained on single institutional data might not generalize well to data from external institutions. Rather than deploying ready-to-use models, the current study therefore presents a framework for the development of NLP models that supports the overarching goal of automating the analysis of free-text clinical reports. Implications Medical jargon can be heterogenous in nature and expressed in various formats ranging from pathology and radiology reports to operative and discharge notes. This subset of unstructured data nonetheless follows a similar set of reporting norms, and thus statistical principles, which radically distinguishes this from human language used in newspapers, legal documents, or social media. 30 Although the current study focuses on brain metastasis patients and radiology reports, it can serve as a proof-of-concept for NLP of medical text. Therefore, the bag-of-words approach combined with a LASSO regression model may have a strong potential for NLP in other patient populations, clinical reports, and outcome measures as well. However, the nature of the NLP task of interest should align with the one used in the current study: extracting an equally- distributed, concrete binary outcome from free-text clinical reports. Within these boundaries, the presented NLP framework has the potential to facilitate retrospective clinical research by accelerating retrospective case identification and data extraction.