Klaske van Sluis

5.4. Discussion

not reveal any systematic differences between time periods that could be attributed to these equipment differences. For future research, we are collecting recordings in a consistent setting.

The anatomical and physiological changes in voice production that TL patients face are immense. In tracheoesophageal speech, voice is produced by the PES, which originally has no function in sound production. Some tracheoesophageal speakers present a fairly good voice, whilst others are rated as more deviant in voice quality and intelligibility. The differences between recordings vary. On average, a slight decrease over time is seen in perceptually rated voice quality and intelligibility (Figure 5.2). This might indicate an effect of aging.

The perceptual evaluations tend to be scattered between the expert raters (Figure 5.1). The literature states that expert raters such as SLPs provide more reliable outcomes than naïve listeners. To assess the consistency of the raters, three recordings of one speaker were evaluated. It appears that the experts can judge the speech quite consistently (Table 5.3). Pairwise comparison, as in experiment 2, is more sensitive to differences and results in more consistent ratings than rating individual samples, as in experiment 1.

Changes in voice quality and intelligibility are interdependent within individual speakers. When voice quality is rated as good by perceptual evaluation, intelligibility tends to be rated as good as well. The strong correlation (R = 0.99, p < .001 in experiment 2) between these outcome measures confirms this dependency. The fact that the independent automatic measures, AVQI and ELIS, are also correlated shows that this correlation is part of the speech signal itself. These (high) correlations indicate that intelligibility problems with TE substitute voices might emerge from a lower perceptual voice quality.
The AVQI was developed for analyzing a combination of sustained vowels and running speech samples [17]. There were no sustained vowel recordings for some of our speakers. Therefore, AVQI analysis was partially performed, i.e., on running speech only. Our results show that this procedure already provided sufficient information (cf. [20]). The AVQI scores correlate strongly with voice

Table 5.3: Results in Experiment 1 and 2 for speaker KRH. Intell.: Intelligibility, VQ: Voice Quality. ∗: p < .01 with other periods. : p < .004. See text.

            Experiment 1          Exp. 2
Period      I      II     III     I-II    II-III
Intell.     801    739    731     -138    -66
ELIS        620    726    581      106    -145
VQ ∗        690    443    461     -208    -97
AVQI        474    334    409     -140     75
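
As a reading aid for Table 5.3: the ELIS and AVQI entries in the Exp. 2 columns appear to equal the differences between consecutive Experiment 1 period scores. This is an observation from the numbers, not something stated in the text. The minimal Python sketch below checks that arithmetic; the names `scores` and `period_differences` are illustrative and not part of the study.

```python
# Experiment 1 scores for speaker KRH, copied from Table 5.3 (periods I, II, III).
scores = {
    "ELIS": [620, 726, 581],
    "AVQI": [474, 334, 409],
}


def period_differences(values):
    """Return the change between each pair of consecutive periods."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Assumption: the I-II and II-III columns for the automatic measures are
# simple differences (later period minus earlier period).
differences = {name: period_differences(vals) for name, vals in scores.items()}

for name, (d_i_ii, d_ii_iii) in differences.items():
    print(f"{name}: I-II = {d_i_ii:+d}, II-III = {d_ii_iii:+d}")
```

The computed values match the table entries for ELIS (106, -145) and AVQI (-140, 75). The Intell. and VQ entries do not follow this pattern (for example, 739 - 801 = -62, not -138), which would be consistent with those values coming from the pairwise comparisons of experiment 2 rather than from subtraction.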