Hester van Eeren

Chapter 5

which the comparative effectiveness was studied — consisted of adolescents with fewer risk factors overall (i.e., fewer reported court orders; see Tables I and II in the Supplemental Material) compared with the group for whom no post-treatment data were available. This could in turn have resulted in smaller differences between the interventions, because this lower-risk group might have shown better outcomes overall. Thus, although clinical practice data were used, the findings can only be generalized to the selected group of adolescents and should be interpreted in light of this sample selection. On the one hand, this study sample is likely larger and less affected by sample selection bias than samples from randomized clinical trials (RCTs). On the other hand, the use of observational data still calls for reflection on the generalizability of the findings and of the evaluation given the selections made, regardless of the study design (Stuart, Cole, Bradshaw, & Leaf, 2011). Furthermore, the partial replication of a previous study (Baglivio et al., 2014) supports prior evidence and shows that the results are robust across different clinical settings and study designs (Duncan, Engel, Claessens, & Dowsett, 2014). Despite the clinical relevance and usefulness of this study, some limitations merit reflection. First, although a wide range of initial differences between adolescents in the treatment arms was controlled for, there may still be unmeasured, and therefore uncontrolled, differences. For example, the adolescents' quality of life was not measured. Such unmeasured differences could have introduced hidden bias into the presented results (Rosenbaum, 1991; Shadish, 2013). Second, although a response rate of approximately 40% is common for clinical practice data from ROM in the Netherlands, which are not gathered for specific research purposes, a number of families did not complete the CBCL at the end of treatment.
When adolescents who did and did not complete this primary outcome measure were compared, differences emerged within both the MST and the FFT group. As a result, the external validity of this study is limited, because the treatment effects in the group with missing data could not be assessed. Third, the interventions are monitored in a quality system, follow detailed protocols, and require therapists to have completed higher education in a relevant domain. Differences between the interventions, however, could be related to the duration of treatment, the dosage and intensity of the interventions, and therapists' adherence to the treatment protocol. Because the duration and intensity of treatment depend on the particular situation of each adolescent assigned to MST or FFT, which may in turn be related to specific background characteristics of the adolescent and the family, controlling for these factors would not fully represent the services as provided. Moreover, it is still unclear how treatment intensity should be defined. It could, for example, refer to the number of sessions, the amount of time spent directly and indirectly on an adolescent and his or her family, or the length of treatment. Fourth, we had no data on adolescents assigned to treatment as usual or on a control group of adolescents not receiving treatment. Such a reference treatment option would have been informative for decision makers deciding on the use of these interventions. Fifth, although the chosen method was carefully considered, all its assumptions were checked, and the results were robust across different samples (the study sample and the complete-case sample), the choice of method could have influenced the outcomes. Other estimation methods could, for example, have been used, such as matching on the PS or stratification on the PS, which