a) Subjective ratings

In general, Q1 ratings (i.e., “I felt as if the voice I heard was my own voice”) were low at all time points, whereas Q2 ratings (i.e., “I felt as if the voice I heard was a modified version of my own voice”) were high before the Mismatch stage and low after it (see Figure 3).

Figure 3. Box plots of ratings on the two questions (Q1 and Q2) for the Early Mismatch and Late Mismatch groups. a) Question 1, “I felt as if the voice I heard was my own voice”, and b) Question 2, “I felt as if the voice I heard was a modified version of my own voice”, were each rated on a 7-point Likert scale at five time points. The Mismatch stage occurs after the first time point for the Early Mismatch group and after the fourth time point for the Late Mismatch group, as indicated by the red vertical dashed line. https://doi.org/10.1371/journal.pone.0018655.g003

These observations were confirmed with MANOVAs on the ratings for each of the two questions across the five time points, with stimulus voice (V1 and V2) and group (Early Mismatch and Late Mismatch) as between-subjects factors. For Q1, the results provided only weak evidence that the mismatch events had a marked effect on ratings. There was a marginally significant interaction between time and group, F(4, 55) = 2.52, p = .052, ηp² = .16, with ratings higher in the Early Mismatch group than in the Late Mismatch group at both the first and last time points, p = .001 and p = .002, respectively. However, ratings dropped significantly after the Mismatch stage only in the Early Mismatch group, p = .004.

In addition, overall ratings were higher in the Early Mismatch group than in the Late Mismatch group, F(1, 58) = 8.56, p = .005, ηp² = .13, and there was a marginally significant effect of time, F(4, 55) = 2.47, p = .055, ηp² = .15. A trend analysis revealed a cubic trend in ratings across the time points, F(1, 58) = 8.04, p = .006, ηp² = .12: ratings decreased after the first time point, then gradually increased, and decreased again at the last time point.
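The trend analysis decomposes the five time points into orthogonal polynomial contrasts. The sketch below uses the standard contrast weights for five equally spaced levels and tests each contrast by squaring a one-sample t statistic on per-participant contrast scores, giving F(1, n − 1). This is a simplification under stated assumptions: the reported tests come from the full MANOVA model with stimulus voice and group as between-subjects factors (hence the error df of 58), which this sketch ignores. The ratings are hypothetical toy values, and `contrast_scores` and `trend_f` are illustrative helper names, not part of the original analysis.

```python
from math import sqrt
from statistics import mean, stdev

# Orthogonal polynomial contrast weights for five equally spaced time points.
CONTRASTS = {
    "linear": (-2, -1, 0, 1, 2),
    "quadratic": (2, -1, -2, -1, 2),
    "cubic": (-1, 2, 0, -2, 1),
    "quartic": (1, -4, 6, -4, 1),
}


def contrast_scores(ratings, weights):
    """Weighted sum of each participant's five ratings (one score per row)."""
    return [sum(w * r for w, r in zip(weights, row)) for row in ratings]


def trend_f(ratings, weights):
    """F(1, n - 1) for H0: mean contrast score = 0, i.e. the squared one-sample t.

    Hypothetical simplification: between-subjects factors are ignored.
    """
    scores = contrast_scores(ratings, weights)
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical 7-point ratings (rows = participants, columns = time points 1-5),
# shaped like the pattern described above: drop, gradual rise, drop.
ratings = [
    [5, 2, 3, 4, 2],
    [4, 2, 2, 3, 2],
    [6, 3, 4, 4, 3],
    [4, 1, 2, 3, 1],
]

for name, weights in CONTRASTS.items():
    f_value, df = trend_f(ratings, weights)
    print(f"{name}: F{df} = {f_value:.2f}")
```

With these weights, a profile that drops after time point 1, rises through time point 4, and drops at time point 5 yields negative cubic scores; the sign only reflects the orientation of the weights, and the F value is unaffected by it.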

Sign tests on Q1 ratings at each time point indicated that, for the Early Mismatch group, ratings at the first and last time points did not differ from ‘neutral’ (i.e., a rating of ‘4’), p ≥ .061, but were below ‘neutral’ at time points 2, 3, and 4, p ≤ .008. For the Late Mismatch group, ratings were below ‘neutral’ at all time points, p ≤ .001.
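The sign tests compare each set of ratings with the neutral value of 4. A minimal sketch is shown below, assuming an exact two-sided binomial sign test in which ratings equal to 4 are discarded as ties; the text does not specify how ties were handled or whether an exact or approximate test was used. The `sign_test` helper and the ratings are hypothetical.

```python
from math import comb


def sign_test(ratings, neutral=4):
    """Exact two-sided sign test of ratings against a neutral scale value.

    Ratings equal to `neutral` are treated as ties and dropped. Under H0,
    each remaining rating is above or below neutral with probability 0.5.
    Returns (n_above, n_below, p_value).
    """
    above = sum(1 for r in ratings if r > neutral)
    below = sum(1 for r in ratings if r < neutral)
    n = above + below
    if n == 0:  # every rating was a tie: no evidence either way
        return above, below, 1.0
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * lower_tail)


# Hypothetical 7-point ratings from ten participants at one time point.
ratings = [2, 3, 1, 4, 2, 5, 1, 2, 3, 2]
print(sign_test(ratings))  # (1, 8, 0.0390625)
```

Here one rating is above 4, eight are below, and one tie is dropped, so the two-sided p-value is 2 × (1 + 9) / 2⁹ ≈ .039.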

The three-factor MANOVA on Q2 ratings revealed a strong interaction between time and group, F(4, 55) = 25.26, p < .001, ηp² = .65. Follow-up pairwise comparisons with Bonferroni correction revealed that, for the Early Mismatch group, Q2 ratings at time point 1 (pre-Mismatch) were significantly higher than those at all later time points (post-Mismatch), p < .001. In this group, the Q2 rating at time point 2 was also lower than that at time point 4, p = .003. For the Late Mismatch group, Q2 ratings at the first four time points (pre-Mismatch) were all significantly higher than that at the last time point (post-Mismatch), p < .001. The ratings of the two groups did not differ at the first or the last time point, when both groups were either before or after the mismatch event, p ≥ .100, but ratings in the Late Mismatch group were significantly higher than those in the Early Mismatch group at time points 2, 3, and 4, p ≤ .001; these time points were pre-Mismatch for the Late Mismatch group but post-Mismatch for the Early Mismatch group (see Figure 3b).
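A Bonferroni correction multiplies each unadjusted p-value by the number of comparisons in the family, capping the result at 1, which is equivalent to testing each comparison at α divided by that number. The text does not state the family size; the sketch below assumes all C(5, 2) = 10 pairs of time points within a group, and `bonferroni_adjust` and `is_significant` are hypothetical helpers.

```python
from itertools import combinations

# Assumed correction family: every pair of the five time points within one group.
TIME_POINTS = (1, 2, 3, 4, 5)
PAIRS = list(combinations(TIME_POINTS, 2))  # (1, 2), (1, 3), ..., (4, 5): 10 pairs


def bonferroni_adjust(raw_p: float, n_comparisons: int = len(PAIRS)) -> float:
    """Bonferroni-adjusted p-value: raw p times the family size, capped at 1."""
    return min(1.0, raw_p * n_comparisons)


def is_significant(raw_p: float, alpha: float = 0.05,
                   n_comparisons: int = len(PAIRS)) -> bool:
    """Equivalent decision rule: compare the raw p-value with alpha / n_comparisons."""
    return raw_p < alpha / n_comparisons


print(len(PAIRS))
print(bonferroni_adjust(0.004))  # hypothetical raw p-value, adjusted across 10 comparisons
print(is_significant(0.004), is_significant(0.006))
```

Under this assumed family of 10 comparisons, an unadjusted p of .004 survives correction (α/10 = .005), whereas .006 does not.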

In addition to this expected interaction, participants who heard V1 gave higher ratings than those who heard V2, F(1, 58) = 5.63, p = .021, ηp² = .09, and participants in the Late Mismatch group gave higher ratings than those in the Early Mismatch group, F(1, 58) = 9.81, p = .003, ηp² = .15. Finally, ratings varied across time points, F(4, 55) = 34.39, p < .001, ηp² = .71. A trend analysis revealed both linear, F(1, 58) = 33.43, p < .001, ηp² = .37, and cubic, F(1, 58) = 88.65, p < .001, ηp² = .60, components across the time points: ratings dropped after the first time point, increased slowly from time point 2 to 4, and dropped again at the last time point.
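The effect sizes in this section are partial eta squared values. As a minimal sketch, assuming the standard conversion ηp² = (F × df_effect) / (F × df_effect + df_error), which holds for univariate F tests and for multivariate tests with a single non-zero eigenvalue, ηp² can be recovered from a reported F value and its degrees of freedom. The helper `partial_eta_squared` is hypothetical and is not the authors' computation.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom.

    eta_p^2 = (F * df_effect) / (F * df_effect + df_error). The identity holds
    for univariate F tests and for multivariate tests with a single non-zero
    eigenvalue (s = 1), where the usual multivariate statistics reduce to the
    same exact F.
    """
    effect = f_value * df_effect
    return effect / (effect + df_error)


# F values reported above for Q2: voice effect, time effect, cubic trend.
for f_value, df_effect, df_error in [(5.63, 1, 58), (34.39, 4, 55), (88.65, 1, 58)]:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"F({df_effect}, {df_error}) = {f_value}: eta_p^2 = {eta:.2f}")
```

With these inputs the sketch gives .09, .71, and .60, in line with the values reported above for the voice effect, the time effect, and the cubic trend.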

Sign tests on Q2 ratings indicated that, for the Early Mismatch group, ratings at the first time point (pre-Mismatch) were reliably greater than ‘neutral’, p < .001, but dropped to well below ‘neutral’ at time point 2 (post-Mismatch), p = .001, and then did not differ from ‘neutral’ at time points 3, 4, and 5, p ≥ .458. For the Late Mismatch group, Q2 ratings at the four pre-Mismatch time points were all reliably greater than ‘neutral’, p ≤ .001, whereas the rating at the post-Mismatch time point dropped to well below ‘neutral’, p = .014.

The results, particularly those for Q2, suggest that the Mismatch stage, characterized by incongruent stimulus voice feedback, appeared to disrupt the illusion that the stimulus voice belonged to the ‘self’, as reflected in the change in ratings. The higher Q2 ratings at later time points in the Early Mismatch group (see Figure 3b) may indicate that the illusory percept began to build again after many further trials of congruent feedback. Overall, the perceptual illusion regarding the identity of the feedback voice appears to be elicited by congruent feedback, matched in timing and content to the participant’s own vocalization.