7.3.4 Optimizing HRM through Machine Learning

In essence, everything that happens in an organization, including HRM, has the potential to represent a learning experience. Whether we are dealing with recruitment and selection, with training and development, with rewards and performance, or with talent management and retention – the behavior of our employees can inform us to what extent our HRM processes are successful, or not. Here, people analytics helps us to improve the ways in which we measure and monitor these processes and their outcomes; to discern patterns in the large amounts of data we collect; and to actually learn from the numbers and improve our HRM decisions. The wider field of machine learning is particularly experienced in learning from data and has much to offer the HRM domain and people analytics. The next sections discuss the value of cross-validation and the exploration-exploitation tradeoff.

7.3.4.1 Generalization via Cross-Validation

In many applied domains, such as HRM and people analytics, achieving accurate predictions is often a primary goal of research initiatives (Yarkoni & Westfall, 2017). For instance, we might want to examine to what extent applicants’ characteristics predict whether they will become high-performing employees. However, conventional HRM research rarely verifies that the explanatory models it proposes are capable of predicting the outcomes they model. Moreover, from a statistical standpoint, it is rarely true that the model which best explains the sampled data at hand will also provide the best predictions for outcomes in the real world (Shmueli, 2010; Yarkoni & Westfall, 2017). Too often, our HRM models overfit the process at hand – mistaking sample-specific noise for relevant patterns – and therefore do not generalize well to new observations (Yarkoni & Westfall, 2017).

Machine learning scholars understand the importance of evaluating the predictive power of models and commonly do so by cross-validation: a family of techniques that involve training and testing models on different subsets of the sampled data (Breiman, 2001; Browne, 2000; Friedman et al., 2001). Following the standard procedure, we would train our statistical model on a random part of our dataset, and then assess (i.e., test) how accurately this model predicts the outcomes in the other part of our dataset. Although the information in the second, test sample goes to waste in this simplified example (i.e., it does not help to train the model), smarter approaches such as k-fold cross-validation effectively recycle training and testing data in order to leverage all information, as illustrated in the sketch at the end of this section.

Cross-validation is rarely used to assess model performance in conventional management and psychology research. Nevertheless, the practice has deep roots in the field, for instance, in the form of classical replication research (Yarkoni & Westfall, 2017, p. 1110). Cross-validation techniques can, to some extent, assure that our HRM models not only fit the patterns in our current sample well, but also generalize to a wider context. They provide quantitative information on how well our model explains and predicts outcomes in- and out-of-sample. On the one hand, such cross-validation is important in preventing the replication crisis faced in related domains (Open Science Collaboration, 2015), where models that were regarded as good explanations in an initial sample fail to accurately explain and predict the same outcome in future samples. Cross-validation
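To make the hold-out and k-fold procedures described above concrete, the following minimal sketch uses Python with scikit-learn on simulated data. The predictors, sample size, and choice of an ordinary linear regression model are illustrative assumptions for this example, not part of the original argument.

```python
# Minimal sketch of hold-out validation and k-fold cross-validation for an
# HRM-style prediction task. The applicant data are simulated purely for
# illustration; the predictor names in the comments are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score, KFold

rng = np.random.default_rng(42)

# Simulated applicant characteristics (predictors) and later job performance (outcome).
n = 200
X = rng.normal(size=(n, 3))  # e.g., cognitive ability, conscientiousness, experience
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)

# 1) Simple hold-out validation: train on one random part, test on the other.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("In-sample R^2:     ", round(model.score(X_train, y_train), 3))
print("Out-of-sample R^2: ", round(model.score(X_test, y_test), 3))

# 2) k-fold cross-validation: every observation is used for both training and
#    testing across folds, so no information is "wasted" on a single hold-out set.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
print("5-fold cross-validated R^2:", round(scores.mean(), 3))
```

The gap between the in-sample and out-of-sample R² gives a rough indication of how much a model mistakes sample-specific noise for relevant patterns, while the k-fold estimate reuses all observations for both training and testing.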
