CHAPTER 3

ABSTRACT

Background
Recognizing and addressing vulnerability during the first thousand days of life can prevent health inequities. To understand national prevalence and trends, it is necessary to determine which data best predict multidimensional vulnerability (i.e. risk factors for vulnerability across different domains combined with a lack of protective factors) at the population level. This study aimed to 1) assess the feasibility of predicting multidimensional vulnerability during pregnancy using routinely collected data, 2) explore whether these predictions improve when self-reported data on health, wellbeing and lifestyle are added, and 3) identify the most relevant predictors.

Methods
The study was conducted using Dutch nationwide routinely collected data and self-reported Public Health Monitor data. First, to predict multidimensional vulnerability using routinely collected data, we used Random Forest (RF) and considered the Area Under the Curve (AUC) and F1-measure to assess RF-model performance. To validate the results, sensitivity analyses (XGBoost and Lasso) were performed. Second, we gradually added self-reported data to the predictions. Third, we explored the RF-model's variable importance.

Results
The initial RF-model could distinguish between those with and without multidimensional vulnerability (AUC 0.98). The model correctly predicted multidimensional vulnerability in most cases, although some misclassification occurred (F1-measure 0.70). Adding self-reported data improved RF-model performance (e.g. F1-measure 0.80 after adding perceived health). The strongest predictors concerned self-reported health, socioeconomic characteristics, and healthcare expenditures and utilization.

Conclusions
It seems possible to predict multidimensional vulnerability using routinely collected data that is readily available. However, adding self-reported data can improve predictions.
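
The Results report a high AUC alongside a more moderate F1-measure. The minimal sketch below illustrates how these two measures can diverge when the outcome is relatively rare. It uses a small synthetic dataset, not the study data; the function names `auc` and `f1`, the scores, and the classification threshold of 0.5 are all assumptions made for illustration and do not describe the authors' implementation.

```python
# Illustrative sketch only: synthetic labels and scores, NOT the study data.
# Function names and the 0.5 threshold are hypothetical choices.

def auc(y_true, scores):
    """Probability that a random positive case scores higher than a
    random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Imbalanced toy sample: 4 positive cases, 16 negative cases.
y_true = [1] * 4 + [0] * 16
scores = [0.9, 0.8, 0.7, 0.4] + [0.6, 0.55] + [0.1] * 14

auc_value = auc(y_true, scores)  # ranking quality across all thresholds
f1_value = f1(y_true, scores)    # classification quality at one threshold

print(f"AUC = {auc_value:.3f}, F1 = {f1_value:.3f}")
```

In this toy sample the ranking is nearly perfect, so the AUC is high, yet two false positives and one false negative among only four positive cases are enough to pull the F1-measure down considerably. AUC summarizes ranking across all thresholds, whereas F1 depends on the errors made at one chosen threshold and is sensitive to class imbalance.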