Ethics statement

The STEM Ethics Committee of the University of Birmingham approved the study and all experimental protocols. The methods were carried out in accordance with approved guidelines.

Participants

In total, 90 undergraduate students participated in the study, with an average age of 20.83 years (SD: 2.20). For Experiment 1, 15 students participated in the auditory experiment (10 females, M age = 21.07, SD age = 1.87) and 15 in the visual experiment (9 females, M age = 20.27, SD age = 1.83); Experiment 2 involved 12 participants (10 females, M age = 20.67, SD age = 2.50); Experiment 3 involved 24 participants (18 females, M age = 21.17, SD age = 2.53); and Experiment 4 involved 24 participants (16 females, M age = 20.67, SD age = 2.16). All participants gave informed consent prior to the experiment and were either paid £6 per hour or given course credits. All reported normal or corrected-to-normal hearing and vision, and all were naïve to the purpose of the experiment.

Experimental setup

Participants sat in a quiet, well-lit room at a distance of approximately 50 cm from the light- and sound-producing apparatus. A red 5 mm LED positioned in front of the participant (20 ms with 5 ms linear ramp, 91 cd/m²) produced visual stimuli. A speaker 50 cm to the left of the participant (20 ms with 5 ms linear ramp, 1 kHz, 75.1 dBA) produced audio stimuli. A computer audio card connected to two identical audio amplifiers generated the signals, all of which were loaded onto the audio card before the trial started to ensure accurate timing.

Psychophysical procedures

Experiment 1 – Isochrony judgments

The aim of Experiment 1 was to test whether sensitivity to temporal deviations increases with the number of stimuli in a sequence. Fifteen participants took part in the audio experiment and another 15 in the visual experiment. Sequences of three, four, five, or six unimodal stimuli (either audio or visual) were presented with a regular inter-onset interval (IOI) of 700 ms, except for the last stimulus, which had a deviation of 0, ±20, ±40, ±60, ±80, ±100, ±150, or ±200 ms. Each trial type was repeated eight times. The participant’s task was to report whether the timing of the last stimulus appeared regular with respect to the rest of the isochronous sequence. Participants responded by pressing one of two keys, and the next stimulus appeared 1.5 to 2 s after the keys had been released. For each participant, we computed the proportion of “regular” responses for each anisochrony and sequence length. Individual trials for different conditions were randomly interleaved in all experiments.
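The design of Experiment 1 can be sketched in a few lines of Python. This is only an illustrative reconstruction under stated assumptions: the function names, the random seed, and the response format (tuples of sequence length, deviation, and a Boolean "regular" response) are hypothetical, and the total of 480 trials simply follows from 4 sequence lengths × 15 deviations × 8 repetitions.

```python
import itertools
import random
from collections import defaultdict

LENGTHS = [3, 4, 5, 6]                                   # stimuli per sequence
DEVIATIONS_MS = [-200, -150, -100, -80, -60, -40, -20, 0,
                 20, 40, 60, 80, 100, 150, 200]          # anisochrony of last stimulus
REPEATS = 8
IOI_MS = 700

def build_trials(seed=0):
    """Return a randomly interleaved list of (length, deviation_ms) trials."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    trials = [(n, d)
              for n, d in itertools.product(LENGTHS, DEVIATIONS_MS)
              for _ in range(REPEATS)]
    rng.shuffle(trials)
    return trials

def onset_times(n_stimuli, deviation_ms, ioi_ms=IOI_MS):
    """Onsets (ms) of an isochronous sequence whose last stimulus is shifted."""
    onsets = [i * ioi_ms for i in range(n_stimuli)]
    onsets[-1] += deviation_ms
    return onsets

def proportion_regular(responses):
    """Proportion of 'regular' responses per (length, deviation) condition.

    `responses` is an iterable of (length, deviation_ms, said_regular) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [n_regular, n_total]
    for n, d, said_regular in responses:
        counts[(n, d)][0] += int(said_regular)
        counts[(n, d)][1] += 1
    return {cond: yes / total for cond, (yes, total) in counts.items()}
```

Keeping the condition key as a `(length, deviation)` pair makes it straightforward to plot one response curve per sequence length afterwards.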

Experiment 2 – Audiovisual temporal order judgments

The goal of Experiment 2 was to understand whether the anisochrony at which a stimulus is presented affected the perceived timing of a stimulus in a sequence. Participants completed the experiment in two phases: the practice phase and test phase. The goal of the practice phase was to familiarize participants with the audiovisual temporal order judgment (TOJ) task, assess performance and provide baseline data for the creation of the Bayesian models. Participants were presented with a single audiovisual stimulus pair separated by a stimulus-onset asynchrony (SOA) of 0, ±20, ±90, ±170, ±250, or ±350 ms. Each SOA was repeated six times, totaling 66 trials. The participant’s task was to report whether the audio or visual stimulus appeared first in time. Participants responded by pressing one of two keys and the next stimulus would appear 1.5 to 2 s after they had been released.

During the test phase, participants were presented with a unimodal (either audio or visual) sequence of four stimuli with an IOI of 700 ms, except for the last stimulus, which deviated by 0, ±40, or ±80 ms. The last stimulus in the sequence was presented together with a stimulus in the other modality (e.g., a visual stimulus paired with a sequence of sound stimuli) with an SOA of 0, ±40, ±80, ±120, or ±200 ms with respect to the anisochrony of the last stimulus presented. Each trial type was repeated eight times. The participant’s task was to report which of the two stimuli presented at the fourth point in time appeared first, i.e., audio first or visual first. Participants responded by pressing one of two keys, and the next stimulus appeared 1.5 to 2 s after the keys had been released (a review of TOJs is provided in ref. 59).

For each participant, we computed the proportion of responses for each presented SOA. Of particular interest to our hypotheses was the point of subjective simultaneity (PSS): the SOA at which an individual participant was equally likely to respond that either of the two stimuli was first. Positive PSS values mean that the light had to be presented before the sound to be perceived as synchronous, and negative values indicate that the sound had to be presented before the light for perceived synchrony. Changes in PSS as a function of anisochrony indicate a modification of the perceived timing of stimuli due to expectation. Also of interest was the just-noticeable difference (JND), the asynchrony necessary for participants to report the correct order of the stimuli at a proportion of .84 (which corresponds to 1σ of a cumulative Gaussian). The PSS and JND were estimated as the first and second moments of the distribution underlying the psychometric function by using the Spearman-Kärber method87. This method provides non-parametric estimates that avoid assumptions about the distributions underlying the psychometric functions. A mathematical derivation of the method follows. First, we define SOA_i with i = {1, …, 15} as the 15 values of audiovisual SOA used in the experiments and p_i with i = {1, …, 15} as the associated proportions of “light first” responses. We further set two SOAs outside the range tested, SOA_0 = −250 ms and SOA_16 = +250 ms, to be able to compute the intermediate SOA between two successive ones:

$$ s_i = \frac{SOA_{i+1} + SOA_i}{2}, \quad i = \{0, \ldots, 15\} \tag{1} $$

We then define the two proportions associated with these extreme SOAs, p_0 = 0 and p_16 = 1, and calculate the associated values of the difference in proportion:

$$ dp_i = p_{i+1} - p_i, \quad i = \{0, \ldots, 15\} \tag{2} $$

With these indexes we can express PSS and JND analytically as:

$$ PSS = \sum_{i=0}^{15} s_i \, dp_i \tag{3} $$

and

$$ JND = \sqrt{\sum_{i=0}^{15} dp_i \, (s_i - PSS)^2} \tag{4} $$
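As a rough illustration of the Spearman-Kärber estimates described above, the following Python sketch pads the tested SOAs with two extreme values, takes the midpoints between successive SOAs, and computes the first moment (PSS) and the square root of the second central moment (JND). The function names, the clipping of a negative variance to zero (which can only arise with non-monotonic proportions), and the example data are assumptions made for illustration; the padding values of ±250 ms are taken from the text. The screening helper uses the two thresholds reported in the paragraph that follows this sketch.

```python
import math

def spearman_karber(soas_ms, p_light_first, pad_low_ms=-250.0, pad_high_ms=250.0):
    """Non-parametric PSS and JND from proportions of 'light first' responses.

    soas_ms must be sorted in ascending order and lie inside the padding values.
    """
    soas = [pad_low_ms] + list(soas_ms) + [pad_high_ms]
    props = [0.0] + list(p_light_first) + [1.0]   # p_0 = 0 and p_last = 1
    # Midpoints between successive SOAs and the matching steps in proportion.
    mids = [(soas[i] + soas[i + 1]) / 2 for i in range(len(soas) - 1)]
    dps = [props[i + 1] - props[i] for i in range(len(props) - 1)]
    pss = sum(s * dp for s, dp in zip(mids, dps))
    var = sum(dp * (s - pss) ** 2 for s, dp in zip(mids, dps))
    jnd = math.sqrt(max(var, 0.0))                # assumption: clip negative variance
    return pss, jnd

def passes_screening(pss_ms, jnd_ms, max_jnd_ms=200.0, max_abs_pss_ms=175.0):
    """Inclusion rule stated in the text: JND below 200 ms, |PSS| within 175 ms."""
    return jnd_ms < max_jnd_ms and abs(pss_ms) <= max_abs_pss_ms

# Hypothetical data: an idealized, very steep psychometric function.
example_soas = [-200, -150, -100, -80, -60, -40, -20, 0,
                20, 40, 60, 80, 100, 150, 200]
example_props = [0.0] * 7 + [0.5] + [1.0] * 7
example_pss, example_jnd = spearman_karber(example_soas, example_props)
```

Because the estimates are plain weighted sums over response steps, no curve fitting or distributional assumption is involved, which is the property the method is chosen for here.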

We used the values of PSS and JND obtained in the practice phase of the experiment to assess participant performance. If the JND was below 200 ms and the PSS did not exceed ±175 ms, participants performed one of the experiments below. We used the practice-phase data to determine the likelihood distribution parameters of both the symmetric and asymmetric Bayesian models (detailed below), because this simple TOJ task was not biased by temporal expectations and thus reflected likelihood probabilities alone.

Experiment 3 – Number of stimuli in a sequence

Experiment 3 was aimed at measuring whether the changes in PSS found in Experiment 2 increase as a function of the number of stimuli in a sequence. Only one sequence length was presented in each of four blocks (the order was counterbalanced across participants). Sequences of three, four, or five audio stimuli were presented with an IOI of 700 ms, except the last stimulus, which had a deviation of 0 ms or ±40 ms. The last stimulus was presented together with a visual stimulus with an SOA of 0, ±40, ±80, ±120, or ±200 ms. Each trial type was presented 12 times.

Experiment 4 – Sequences with different periods

The goal of Experiment 4 was to check whether changes in PSS still occur if sequences do not have the exact same period. Four types of audio sequences were presented with an IOI of 400, 700, or 1000 ms, except for the last stimulus, which had a deviation of ±40 ms. The last stimulus was presented together with a visual stimulus with an SOA of 0, ±40, ±80, ±120, or ±200 ms. Each trial type was presented 12 times.

Model fit and predictions

Interval-based model

It has been suggested that the precision of a duration estimate improves when multiple estimates are obtained from a sequence of stimuli. The perceptual system is hypothesized to be capable of averaging duration estimates in a statistically optimal fashion19. The multiple look model expands this analysis by quantifying the discrimination performance with two sequences of isochronous intervals and allowing for the differential contribution of the two sequences to the judgment20,21. We adapted the formula of the multiple look model to the conditions of Experiment 1 (for a derivation see22) so that we could estimate the JND obtained with sequences of N = {3, 4, 5} intervals (JND_N) from the individual subject’s value of JND with the sequence of two intervals (JND_2) according to:

The weight parameter l was tuned by minimizing the sum of the squared differences between the observed data in Experiment 1 and the model for the audio and visual modalities. The resulting value of l was 0.964 for audio and 0.958 for vision. The predicted JND_N values were used as parameters of Gaussian distributions of the responses (the maximum point of the curves was normalized to 1 for better comparison across the models). The mean response distributions across participants for each sequence length are shown in Fig. 6A. We then calculated JND by substituting the proportion of “regular” responses for the term dp in Equation 4. Interval-based models predict no changes in the perceived timing of stimuli, leading to constant PSS values as a function of anisochrony. To quantitatively compare such predictions to our data, we computed the sum of the squared error between a PSS of 0 for all conditions and the empirical data (Fig. 6A).

Entrainment model

We implemented the entrainment model for perceived temporal regularities23 and simulated 1000 sequences for each of the temporal deviations and sequence lengths used in Experiment 1. The probability distribution that simulates the results of Experiment 1 is shown in Fig. 6B (maximum point normalized to 1).

Entrainment models do not make explicit predictions about changes in the perceived timing of stimuli, but only about the amount of attention devoted at each point in time. To relate entrained attention to perceptual acceleration, we hypothesized a prior-entry effect35 that is proportional to the magnitude of the attentional pulse at the time the stimulus is presented23,26,37. We fitted individual parameters of the entrainment model by minimizing the sum of the squared error between the observed data from Experiment 2 and the model output for audio and visual sequences. This yielded the best-fitting parameters23: period coupling q = 0.524, oscillation coupling η = 0.451 and the focusing parameter κ = 0.534. We also fitted the magnitude of the prior-entry effect to the data, obtaining a value of 12.3 ms.

Bayesian symmetric model

Perception is obtained from the posterior distribution, i.e., the integration of the on-line sensory evidence (likelihood) with a priori knowledge of when a stimulus is expected to be sensed (prior). We propose that expectations are not static, but they are obtained by iteratively updating the probability of encountering a stimulus at each point in the future.

The likelihood probability distribution p_l(t) is the probability of sensing a stimulus at time t given that the stimulus is produced in the environment. Gaussian distributions with zero mean and variance σ² are used to describe the noise in sensory latency for each modality. We determined the values of the parameters σ_A and σ_V (subscripts A and V denote audio and vision, respectively) that give values most similar to the obtained PSS and JND, as described in Fig. 7. We obtained the posterior probability distributions p_q(t) by multiplying the probabilities of the likelihood p_l(t) and the prior p_p(t):

$$ p_q(t) \propto p_l(t) \, p_p(t) \tag{6} $$

Figure 7. Example of how Signal Detection Theory is used to compute model responses across trials. (A) Across trials, the posterior distributions of the audio and visual stimuli can be used to directly compute the proportion of sound-first responses. To translate the single-trial decision rule across trials, one needs to search for the unbiased response criterion that gives the highest d′ between the two curves. For each asynchrony, the probability of “sound first” responses, after having identified this optimal decision rule across trials, is calculated as the sum of the two areas below the visual posterior on the left of the criterion (Hits) and below the audio posterior on the right of the criterion (CR). (B) The probability values of “sound first” responses obtained from the model for different SOAs are analyzed using the Spearman-Kärber method, as for participants’ responses (see Methods).

We obtained the prior probability distribution p_p(t) by using the posterior probability p_q(t) for the previous stimulus (i.e., p_q(t) for the time t − IOI). The added constant ω leads to a prior with heavy tails88 that allows for sudden changes in IOI and thereby decreases the tendency to fully incorporate the posterior into a new prior (thus mitigating the increase in false alarms11). This is expressed by:

$$ p_p(t) \propto p_q(t - IOI) + \omega \tag{7} $$

The parameter ω changes the predictions of the model as shown in Fig. 8A,B.
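The iterative updating described above (the posterior for one stimulus, shifted by one IOI and raised by the constant ω, becomes the prior for the next) can be sketched numerically as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the time grid, the flat prior before the first stimulus, the function names, and the parameter values in the usage lines are all hypothetical. In particular, the numerical scale of ω depends on the discretization and normalization chosen here, so the fitted values reported in the text cannot be plugged in directly.

```python
import numpy as np

def gaussian_likelihood(t, onset, sigma):
    """Gaussian latency noise centred on the physical onset, as a density on grid t."""
    lik = np.exp(-0.5 * ((t - onset) / sigma) ** 2)
    return lik / (lik.sum() * (t[1] - t[0]))

def perceived_times(onsets, ioi, sigma, omega, dt=0.001, pad=0.5):
    """Mean of the posterior for each stimulus in a sequence (times in seconds)."""
    t = np.arange(onsets[0] - pad, onsets[-1] + pad, dt)
    shift = int(round(ioi / dt))          # one IOI expressed in grid steps
    prior = np.ones_like(t)               # assumption: flat prior before the first stimulus
    means = []
    for onset in onsets:
        # Posterior is proportional to likelihood times prior, then normalized.
        posterior = gaussian_likelihood(t, onset, sigma) * prior
        posterior /= posterior.sum() * dt
        means.append(float((t * posterior).sum() * dt))
        # Next prior: posterior shifted forward by one IOI, plus the constant omega.
        shifted = np.zeros_like(posterior)
        shifted[shift:] = posterior[:-shift]
        prior = shifted + omega
        prior /= prior.sum() * dt
    return means

# Hypothetical usage: four stimuli at 700 ms IOI, the last one 40 ms early.
early = perceived_times([0.0, 0.7, 1.4, 2.06], ioi=0.7, sigma=0.03, omega=0.01)
regular = perceived_times([0.0, 0.7, 1.4, 2.1], ioi=0.7, sigma=0.03, omega=0.01)
```

In this sketch the posterior mean for an early final stimulus falls between its physical onset and the expected onset, which is the regularization toward isochrony that the symmetric model is meant to capture; larger values of `omega` flatten the prior and weaken that pull.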

Figure 8. Predictions of the Bayesian models for Experiment 2 with different values of the added constant ω. Predictions obtained with lower values of ω are plotted with more saturated colors (ω = 0.032, 0.016, 0.008, 0.004, 0.002, 0.001). Higher values lead to flatter curves as the prior has less and less effect and the BET is smaller. (A) Bayesian model of perceived timing with symmetrical distributions and (B) with asymmetrical distributions.

To obtain the predictions for Experiment 2 we calculated the values of the posterior probability distributions for the last stimulus in the sequence, applying Equations 6 and 7 iteratively. Following previous empirical work89, we assumed that the brain does not consider only the onset of the stimulus to perform a TOJ. Although it is unclear what feature is considered for TOJs39,90, for computational simplicity we adopted the mean of the distribution (which is also consistent with recent work69). At each trial, the response is determined by the sign of the difference in timing between the means of the distributions to be compared39. A similar but computationally more tractable rule would be to calculate the difference in timing corresponding to an accumulated probability of 0.5 (i.e., the time corresponding to the median of the probability distribution). To calculate the proportion of responses across trials, we applied signal detection theory to the audio and visual posterior distributions over time91 (Fig. 7A). Several models of TOJ assume that differences in perceived relative timing are coded in the brain as the combination of presented asynchrony and latency difference in two channels39. The subsequent decision criterion is applied to this represented quantity. Here, we expand this approach by considering not only the representation of a single asynchrony value but of the whole probability distribution of asynchronies. The criterion then applies to a probability distribution and, as such, the decision is probabilistic, leading to the proportion of responses shown in Fig. 7B. From the proportions obtained at different asynchronies between audio and visual stimuli, we calculated the PSS using Equation 3. The value of the parameter ω influences the posterior and thus these proportions, and consequently modulates the amount of regularization, as shown in Fig. 8A,B.

We determined the values of ω, σ_A and σ_V that best fit the PSS results of Experiment 2 shown in Fig. 2B. We obtained ω = 0.0038, σ_A = 0.0142 and σ_V = 0.0405. The best fit to the data is shown in Fig. 6C.
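The single-trial decision rule described above (compare the means of the two posterior distributions, or alternatively their medians) can be illustrated with a short Python sketch. The function names, the tie-breaking toward "visual first", and the example distributions are assumptions for illustration only; the sketch does not implement the across-trial signal detection step.

```python
import numpy as np

def posterior_mean(t, p):
    """Mean of a discretized distribution p defined on time grid t."""
    p = p / p.sum()
    return float((t * p).sum())

def posterior_median(t, p):
    """Time at which the accumulated probability first reaches 0.5."""
    cdf = np.cumsum(p) / p.sum()
    return float(t[np.searchsorted(cdf, 0.5)])

def toj_response(t, post_audio, post_visual, use_median=False):
    """Report which stimulus is judged first from the two posteriors."""
    stat = posterior_median if use_median else posterior_mean
    # Assumption: exact ties are arbitrarily resolved as "visual first".
    return "audio first" if stat(t, post_audio) < stat(t, post_visual) else "visual first"

# Hypothetical posteriors: audio centred on 0 ms, visual centred on 50 ms.
t = np.linspace(-0.5, 0.5, 2001)
audio = np.exp(-0.5 * ((t - 0.00) / 0.03) ** 2)
visual = np.exp(-0.5 * ((t - 0.05) / 0.04) ** 2)
response = toj_response(t, audio, visual)
```

For symmetric posteriors the mean and median rules coincide; they can diverge once the distributions are skewed, as in the asymmetric model.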

To derive the predictions for Experiment 1, we used the JNDs calculated from the interval-based model (Equation 5) to determine the standard deviations of the Gaussian curves for each sequence length. Before calculating the response probability distributions, we derived the temporal distortions for each anisochrony (horizontal axis; Fig. 6A; left panels) using the Bayesian symmetric model fitted to Experiment 2 (Fig. 6C; right). Thus, instead of representing the actual anisochronies, the horizontal-axis values represent the sensed stimulus timing.

Bayesian asymmetric model

The likelihood probability distribution p_l(t) is modeled as a monophasic impulse response function due to an exponential low-pass filter66 expressed by

The proportional sign is due to the normalization across the whole distribution, which makes the area under the curve equal to 1. The prior probability distribution and posterior probability distribution are obtained as described for the symmetric model (Equations 6 and 7). The predictions for the asymmetric Bayesian model are presented in Fig. 6D, where the parameter ω modulates the BET as shown in Fig. 8A,B. We fitted the μ parameter for audio and visual stimuli and the added constant of Equation 7 to the results of Experiment 2, obtaining μ_A = 75.0 ms, μ_V = 87.0 ms and ω = 0.0009 (see Fig. 5A). The response distributions for Experiment 1 (Fig. 6D; left panels) were calculated in the same way as for the symmetric model; however, the temporal distortions applied were generated from the asymmetric model.

Model comparison