Participants

The participants were members of the Gryphon Trio, an internationally acclaimed Canadian professional music ensemble, which includes one pianist (M, age = 53 years), one violinist (F, age = 49 years), and one cellist (M, age = 50 years).

Eleven additional internationally acclaimed professional musicians (two pianists, four violinists, two violists, and three cellists; three men and eight women; mean age = 43.4 years, range = 34–58 years) were recruited as judges.

All trio performers and musician judges had normal hearing and were neurologically healthy by self-report. Informed consent was obtained from each participant, and they received reimbursement. All procedures were approved by the McMaster University Research Ethics Board, and all methods were performed in accordance with the approved guidelines and regulations.

Stimuli and Apparatus

The data were collected in the McMaster University Large Interactive Virtual Environment Laboratory (LIVELab; livelab.mcmaster.ca). The trio performed six happy and six sad excerpts (Table 1). The authors and trio performers chose the excerpts together from the trio’s current repertoire based on the criteria that the excerpts had (1) high emotional expressivity, (2) a clear happy or sad emotion, and (3) balanced roles among the musical parts (i.e., each part was approximately equally prominent, rather than there being a clear division between melody and accompaniment). We selected pieces from Classical (Beethoven), Romantic (Dvořák), and Tango (Piazzolla) styles so that our findings could generalize to a broad range of Western music styles. In the Happy condition, performers played only pieces that were determined a priori by the performers and experimenters as communicating happiness; likewise, in the Sad condition, pieces were determined a priori as communicating sadness. We did not control the acoustic characteristics (e.g., tempo, number of notes) of the happy versus sad excerpts, as we aimed to keep the performances as naturalistic as possible. However, the same pieces were played in the expressive and non-expressive conditions, so this did not affect the main comparison between those conditions.

Table 1. Trial order and experimental conditions.

A passive optical motion capture system (24 Oqus 5+ cameras and an Oqus 210c video camera; Qualisys) recorded the head movements of participants at 120 Hz. Each participant wore a cap with four retroreflective markers (3 mm) placed on the frontal midline, the central midline, and above the left and right ears. Three positional markers were placed on the ground to calibrate the anterior-posterior and left-right axes of each performer’s body. Additional markers placed on the arms and instruments were not analyzed in the current study. The performers confirmed that these placements did not constrain their body movements and that they were able to perform as usual.

The music performances were audio recorded with two DPA 4098-DL-G-B01-015 microphones suspended above the trio and digitized at 48 kHz/24 bit using Reaper recording software (Cockos, Inc.).

Design and Procedure

A factorial design was used, with Emotion (Happy, Sad) and Expressivity (Expressive, Non-expressive) as factors. In the Expressive condition, performers were asked to play the excerpts with emotional expression, as they would in a typical music performance. In contrast, in the Non-expressive condition, performers were asked to play the excerpts without emotional expression (a deadpan or mechanical performance). In both conditions, performers were asked to play the excerpts as well as they could under the given condition, and they were aware that their performances would be recorded and rated. Within each trial, an excerpt was played for a total of three minutes; if a performance of an excerpt was shorter than three minutes, the performers looped it from the beginning until the three-minute mark was reached. This was necessary to collect enough data points for the time series analyses.

The complete design is shown in Table 1. Each excerpt was performed twice in consecutive trials, once in the Expressive condition and once in the Non-expressive condition. All the conditions were counterbalanced. There were no practice trials, but the performers were already familiar with the pieces. The entire experiment, including preparation, took approximately four hours and was completed on the same day.

Once a three-minute trial ended, each performer independently rated five aspects of the group’s performance on 9-point Likert scales (1 = low to 9 = high): (1) Goodness (“How good was it in general?”), (2) Emotion-expression (“How well was the emotion expressed?”), (3) Emotion-valence (“How sad-happy was the emotion expressed?”), (4) Emotion-intensity (“How intense-calm was the emotion expressed?”), and (5) Synchrony (“How technically synchronized was it?”). Because the ensemble comprised high-level professional musicians who had performed together for many years, we expected that they would be sensitive judges of these variables.

Additional professional musician judges independently rated each of the trio’s performances using the same questionnaire. These judges based their ratings solely on the audio recordings, which they rated at home at their convenience. The purpose of the study and the identities of the trio performers were not revealed to the judges.

Motion capture data processing

The motion capture data processing was similar to our previous study36. Motion trajectories were exported from Qualisys Track Manager for processing and analysis in MATLAB. The first 180 s of each excerpt were analyzed. Missing data due to recording noise were found in only 15 of 864 trajectories and for durations shorter than 6 ms; these gaps were filled with spline interpolation. Each trajectory was down-sampled to 8 Hz by averaging the samples within each non-overlapping 125-ms window. This was done because Granger causality analysis benefits from a low model order for capturing a given physical duration of the movement trajectory56. Visual inspection confirmed that this rate was sufficient for capturing most head movements. No filtering or temporal smoothing was applied to the data because temporal convolution distorts the estimation of Granger causality56. To estimate anterior–posterior body sway, we spatially averaged the positions of the four motion capture markers on each performer’s head in the x–y plane (collapsing altitude) for each time frame, with the anterior–posterior orientation referenced to the surrounding markers placed on the ground. Finally, each time series was normalized (z-scored) to equalize the magnitude of the sway motion among performers. This procedure produced three normalized body sway time series per trial, one for each performer.
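For illustration, the following is a minimal numpy sketch of this preprocessing pipeline; it is not the authors’ MATLAB code, and the array layout, the assumption that the data are already rotated into the floor-marker reference frame, and the function name are all illustrative assumptions.

import numpy as np

FS_IN, FS_OUT = 120, 8        # recording rate and analysis rate (Hz)
WIN = FS_IN // FS_OUT         # 15 samples per non-overlapping 125-ms window

def head_sway_series(markers_xyz):
    """markers_xyz: (n_samples, 4, 3) array of the four head-marker positions.
    Returns a z-scored anterior-posterior sway time series at 8 Hz."""
    # Average the four head markers in the horizontal plane (collapse altitude).
    head_xy = markers_xyz[:, :, :2].mean(axis=1)                     # (n, 2)
    # Down-sample to 8 Hz by averaging within non-overlapping 125-ms windows.
    n_win = head_xy.shape[0] // WIN
    head_8hz = head_xy[:n_win * WIN].reshape(n_win, WIN, 2).mean(axis=1)
    # Keep the anterior-posterior component; here we assume the coordinates
    # were already rotated so that the first axis is anterior-posterior.
    ap = head_8hz[:, 0]
    # z-score so that sway magnitude is comparable across performers.
    return (ap - ap.mean()) / ap.std()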

Granger causality of body sway

The MATLAB Multivariate Granger Causality (MVGC) Toolbox56 was used to estimate the magnitude of Granger causality between each pair of body sway time series among the three performers of the trio. First, the MVGC toolbox confirmed that each time series satisfied the stationarity assumption for Granger causality analysis, with a spectral radius less than 1. Second, the optimal model order (the length of history included) was determined by the Akaike information criterion for each trial. The optimal model order balances maximizing goodness of fit against minimizing the number of coefficients being estimated. The model order used was 14 (1.75 s), the largest optimal model order across trials within the trio. Fixing the model order (i.e., not letting it vary by trial) avoided model order affecting the Granger causalities differently on different trials, and using the largest value ensured that the optimal order of every trial was covered. In this way, six unique Granger causalities were obtained for each trial, corresponding to the degree to which each of the pianist, violinist, and cellist predicted each of the other two performers. Importantly, each Granger causality between two time series was estimated conditional on the remaining time series, so that any potential common influence from the third performer was partialed out56. We further averaged these six unique Granger causalities for each trial to obtain the causal density (CD), which represents the total amount of directed information flow within the ensemble57. We did not analyze each Granger causality separately because we were interested in how the total directional information flow within the ensemble was influenced by the independent variables Emotion and Expressivity.
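As an illustration of the underlying computation (the analysis itself was run with the MVGC toolbox, not the code below), conditional Granger causality from one performer to another can be expressed as the log ratio of residual variances of a “reduced” autoregression that omits the source performer’s past and a “full” autoregression that includes it, and causal density is the average over the six directed pairs. The numpy sketch below assumes the three z-scored 8-Hz sway series are the columns of one array and uses ordinary least squares rather than the toolbox’s estimators; the function names are illustrative.

import numpy as np
from itertools import permutations

def _lagged(data, p, cols):
    """Design matrix of p past lags of the selected columns of data."""
    n = data.shape[0]
    return np.hstack([data[p - lag : n - lag][:, cols] for lag in range(1, p + 1)])

def _resid_var(y, X):
    """Residual variance of an OLS regression of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.var(y - X1 @ beta)

def conditional_gc(data, src, tgt, p):
    """Granger causality from column src to column tgt, conditional on the
    remaining column(s): log ratio of reduced to full residual variance."""
    k = data.shape[1]
    y = data[p:, tgt]
    full = list(range(k))
    reduced = [c for c in full if c != src]
    return np.log(_resid_var(y, _lagged(data, p, reduced)) /
                  _resid_var(y, _lagged(data, p, full)))

def causal_density(data, p=14):
    """Average of the six directed conditional GCs among the three performers."""
    k = data.shape[1]
    return np.mean([conditional_gc(data, s, t, p)
                    for s, t in permutations(range(k), 2)])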

Cross-correlation of body sway

Cross-correlation quantifies the similarity between two time series as a function of the time shift between them. To empirically compare Granger causality and cross-correlation, we performed cross-correlation analyses on the same preprocessed data to which we had applied Granger causality, with cross-correlation coefficients calculated for lags up to plus or minus the model order used for the Granger causality analysis. Although this window size was optimized for Granger causality, it did not disadvantage the cross-correlation analyses, as the window (1.75 s) was wider than those used in most cross-correlation analyses of music performers’ body sway25,47, which have typically ranged up to ± one beat. Within this window, we picked the maximum unsigned cross-correlation coefficient (highest similarity) for each of the three pairs of musicians on each trial, and then averaged these coefficients across the pairs within each trial.
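A minimal numpy sketch of this step, under the same assumptions as above (the three preprocessed sway series as columns of one array; the per-lag correlation is approximated from z-scored segments, and the function names are illustrative):

import numpy as np
from itertools import combinations

def max_abs_xcorr(x, y, max_lag=14):
    """Largest unsigned cross-correlation of two series within ±max_lag samples
    (±1.75 s at 8 Hz); approximates Pearson's r at each lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.mean(x[lag:] * y[:n - lag])
        else:
            c = np.mean(x[:n + lag] * y[-lag:])
        best = max(best, abs(c))
    return best

def mean_pairwise_xcorr(data, max_lag=14):
    """Average the per-pair maxima over the three performer pairs in a trial."""
    pairs = combinations(range(data.shape[1]), 2)
    return np.mean([max_abs_xcorr(data[:, i], data[:, j], max_lag)
                    for i, j in pairs])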

Statistical analyses

We performed mixed-design ANOVAs separately on the CD and cross-correlation coefficient values to analyze the modulation of body sway coupling by Emotion (Happy, Sad) and Expressivity (Expressive, Non-expressive). The significance of the effects was determined with type-II Wald tests using the “Anova” function in the “car” package in R58.

We treated the Emotion of the music excerpts (Happy, Sad) as a random effect and Expressivity as a fixed effect. Traditional approaches would treat Emotion as a fixed effect. However, because happy and sad are characteristics of the stimuli, and we used only a small sample of all possible happy and sad stimuli, ignoring the sampling variance of these few samples could compromise the generalizability of the reported effects to the wider population of happy and sad stimuli. It has therefore been proposed that stimulus characteristics are better treated as random effects59,60.
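The analysis itself was run in R. Purely as an illustration of the model structure, a roughly analogous mixed model could be fit in Python with statsmodels, with Expressivity as a fixed effect and Emotion as the random grouping factor. The data file and column names below are hypothetical, and the fixed-effect z-tests printed in the summary stand in for the type-II Wald tests reported from R.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-trial table with causal density (cd), the Expressivity
# condition, and the Emotion of the excerpt.
df = pd.read_csv("body_sway_coupling_by_trial.csv")

# Expressivity as a fixed effect; Emotion as the random grouping factor.
model = smf.mixedlm("cd ~ expressivity", df, groups=df["emotion"])
result = model.fit()
print(result.summary())   # Wald z-test for the fixed effect of Expressivity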

To investigate whether CD and the cross-correlation coefficients reflected expressive aspects of the performances, we performed Spearman rank correlation analyses between each of these coupling measures and the subjective ratings of the performances, separately for the ratings given by the trio performers and by the judges.
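For instance, the per-trial correlations between one coupling measure and the five rating dimensions could be computed as in the sketch below (scipy’s spearmanr; the input structure and names are hypothetical).

from scipy.stats import spearmanr

RATING_DIMENSIONS = ["goodness", "emotion_expression", "emotion_valence",
                     "emotion_intensity", "synchrony"]   # from the questionnaire

def coupling_rating_correlations(coupling_by_trial, ratings_by_trial):
    """Spearman rank correlation between a per-trial coupling measure (CD or
    cross-correlation) and each rating dimension; ratings_by_trial is assumed
    to map each dimension name to a per-trial vector of ratings."""
    return {dim: spearmanr(coupling_by_trial, ratings_by_trial[dim])
            for dim in RATING_DIMENSIONS}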

All statistical tests were two-tailed, with α = 0.05; a Bonferroni-adjusted α was used for each series of post hoc comparisons as a conservative control of Type I error.