
While the distinction between MbC and MbE is clearly defined, the practical difference became subtle in our implementation. Under either condition, participants could take any action (i.e., reject or accept) during the time for which the resolution advisory was active. As such, the two conditions appeared identical. A difference only emerged if the advisory was allowed to expire. While nothing happened at the expiry of an MbC advisory, the advisory would automatically be implemented at the expiry of an MbE advisory (a minimal sketch of this logic is given at the end of this section). To further differentiate the two, we discussed their layout and presentation. However, such measures only address appearance, which could introduce confounds, while failing to address differences in automation logic. While we acknowledge that LOA taxonomies can benefit theoretical discussions in human-automation interaction, we question the usefulness of these taxonomies for automation design. While a defined LOA may be applicable to very specific systems or functions, LOAs are not constructive for defining whole system architectures and system authorities, as these traverse a spectrum of levels depending on which aspect is considered.

3-5-2 Study limitations

Given the nature of our experimental design, we must recognize a few potential confounds. First, for practical and experimental reasons, manual trials always preceded automated trials. We can therefore not discount the possibility of history or order effects. Second, we must recognize the critical role that participant instructions play in such trials. In this case, participants were instructed that an advisory would always solve the conflict, but not necessarily in the most optimal way, and were therefore encouraged to find their own preferred solutions. Whereas these (or any) instructions might have imparted some unavoidable bias (toward either using or disusing automation), there is no reason to think this should have confounded our comparison of conformance conditions. Third, we must recognize that the difficulty rating data refer to each two-minute session as a whole, and therefore do not allow clear conclusions to be drawn about the short advisory interval itself. Again, this is not so much a shortcoming of the experimental design as a reminder that difficulty ratings were intended as a proxy measure of our complexity manipulation. Moreover, the design parameters used for baseline scenarios and their associated (designed) conflicts have likely influenced the data collected. For instance, all baseline scenarios contained a biased conflict, meaning that some solutions could have been deemed more optimal (e.g., in terms of shortest deviation) than others because of the geometrical relationship between conflicting aircraft. However, we recognize that the determination of an optimal solution is highly dependent on the criterion underlying such judgments. Finally, although we created two high-complexity scenarios and two low-complexity scenarios, we combined them as single measurement references in subsequent analysis.
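For concreteness, the sketch below summarizes the automation logic described above. It is our own illustrative reconstruction, not the simulator's actual code; the names (e.g., resolve_advisory, operator_action) are hypothetical.

```python
from enum import Enum


class Condition(Enum):
    MBC = "management by consent"    # at expiry: advisory is simply discarded
    MBE = "management by exception"  # at expiry: advisory is auto-implemented


def resolve_advisory(condition: Condition, operator_action: str | None) -> str:
    """Outcome of one resolution advisory (hypothetical sketch).

    operator_action is "accept", "reject", or None (the advisory expired
    without any operator action). While the advisory is active, both
    conditions behave identically; they differ only at expiry.
    """
    if operator_action == "accept":
        return "implemented by controller"
    if operator_action == "reject":
        return "discarded by controller"
    # Advisory expired without operator action: the only diverging branch.
    if condition is Condition.MBE:
        return "auto-implemented at expiry"  # MbE: automation acts by default
    return "expired without effect"          # MbC: nothing happens
```

As the sketch makes explicit, any accept or reject during the active window yields the same outcome under both conditions; only the expiry branch separates them, which is why the two conditions appeared identical whenever participants acted before the advisory expired.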
