In this study we set α = 0.05 and all statistical tests were two-tailed.

Comparison of male and female ratings

The split-half reliabilities for the mean dance quality rating per image were r = 0.60, p < 0.0001 for both male and female raters. The mean male and female ratings per image were significantly correlated (r = 0.77, p < 0.0001). Together, these results suggest that both sexes were in good agreement about the rank ordering of dance quality.
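As an illustration of the split-half statistic reported above, the following minimal Python sketch divides raters at random into two halves, averages the ratings per stimulus within each half, and correlates the two sets of means. The splitting procedure actually used in the study is not stated here, so the random split, the function names and the data layout (`ratings[r][i]` is the rating given by rater `r` to stimulus `i`) are assumptions.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(ratings, seed=0):
    """Correlate per-stimulus mean ratings from two random halves of the raters.

    ratings[r][i] is the rating from rater r for stimulus i (assumed layout).
    """
    rng = random.Random(seed)
    raters = list(range(len(ratings)))
    rng.shuffle(raters)
    half = len(raters) // 2
    first, second = raters[:half], raters[half:]
    n_items = len(ratings[0])

    def mean_per_item(group):
        return [sum(ratings[r][i] for r in group) / len(group) for i in range(n_items)]

    return pearson_r(mean_per_item(first), mean_per_item(second))
```

A value near 1 would indicate that the two halves of the rater pool order the stimuli almost identically.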

Rhythmicity of dance

Dance is characterised by oscillatory movement that is linked to the beat of the music. These beat-linked movements should be apparent within frequency-domain descriptions of the time-domain recordings as peaks at discrete frequency bands. We predicted that the movements of better dancers would be tied more closely to the beat of the music, and thus would give rise to peaks of greater magnitude. To test this, we compared the frequency-domain descriptions of the five best-rated and five worst-rated dancers. We focussed on the recordings of the elbow, hip and spine joint angles, because these were identified as the salient biomechanical variables that discriminated between the two groups of dancers (see Methods). We used Origin Pro 2016 to compute fast Fourier transforms (FFTs) from the time series of these joint angles. To standardize this comparison, a fixed window of 14.5 s was selected from each of the 39 dance sequences. A low-pass Hanning filter was applied to the time series, with the upper frequency set at the Nyquist limit for these data of 100 Hz. Figure 1 plots the average FFT amplitude for the elbow (a), hip (b) and spine (c) as a function of oscillatory frequency, separately for the highest-rated and lowest-rated groups of dancers.
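The windowing and transform steps described above can be sketched as follows. This is a minimal illustration in Python, not the Origin Pro 2016 procedure used in the study. It assumes a 200 Hz sampling rate (implied by the stated 100 Hz Nyquist limit), applies a Hann taper to a 14.5 s joint-angle series, and evaluates the discrete Fourier transform directly at low frequencies. All function names and the synthetic joint-angle signal are hypothetical.

```python
import cmath
import math


def hann_window(n):
    """Symmetric Hann taper of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def amplitude_spectrum(signal, sample_rate_hz, max_freq_hz):
    """Amplitude spectrum of a Hann-tapered signal, from the first non-DC bin up to max_freq_hz."""
    n = len(signal)
    mean = sum(signal) / n
    window = hann_window(n)
    tapered = [(x - mean) * w for x, w in zip(signal, window)]
    gain = sum(window)  # rescales amplitudes to compensate for the taper
    freqs, amps = [], []
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if freq > max_freq_hz:
            break
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(tapered))
        freqs.append(freq)
        amps.append(2.0 * abs(coeff) / gain)
    return freqs, amps


def synthetic_joint_angle(sample_rate_hz=200.0, duration_s=14.5, tempo_bpm=125.0):
    """Hypothetical joint angle (degrees): one oscillation every two beats plus a weaker beat-rate term."""
    beat_hz = tempo_bpm / 60.0
    n = int(round(sample_rate_hz * duration_s))
    return [
        10.0 * math.sin(2.0 * math.pi * (beat_hz / 2.0) * i / sample_rate_hz)
        + 3.0 * math.sin(2.0 * math.pi * beat_hz * i / sample_rate_hz)
        for i in range(n)
    ]


def peak_frequency(freqs, amps):
    """Frequency of the largest spectral peak."""
    return freqs[max(range(len(amps)), key=amps.__getitem__)]
```

With a 14.5 s window the frequency resolution is 1/14.5 ≈ 0.069 Hz, so a peak can only be located to within roughly that spacing. In practice a library FFT routine would replace the direct sum used here for clarity.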

Figure 1 Fast Fourier transform graphs to show the averaged amplitude of joint oscillation as a function of frequency for (a) the elbows, (b) the hips and (c) the spine. The 5 highest-rated dancers are represented by the pink line and the 5 lowest-rated dancers are represented by the blue line. The pink and blue shaded regions in each graph represent the mean amplitudes at each frequency ±1 standard error of the mean. White background regions indicate that the vertical separation (i.e. the difference in oscillatory amplitude at that frequency) between the two groups of dancers is statistically significant at p < 0.05 corrected for multiple comparisons.

The dance music heard by the dancers was played at a rate of 125 beats per minute (bpm), which corresponds to a beat frequency of 2.08 Hz. To illustrate, if one imagines 4 drum beats to the bar at 125 bpm, a hip swing from the left to the right and back again that spans two beats would correspond to a frequency of ~1 Hz. Figure 1 demonstrates that the highest-rated dancers showed significantly higher peaks of oscillatory activity, particularly for the spine and hips, at ~1 Hz and, to a lesser extent, at ~0.5 Hz and ~0.25 Hz. This is what we would expect if the movements of the highest-rated dancers were better synchronised to the beat of the music than those of the lowest-rated dancers.
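The arithmetic linking tempo to these frequencies can be made explicit with a short Python sketch. The function names are hypothetical, and the mapping of the ~1 Hz, ~0.5 Hz and ~0.25 Hz peaks to movement cycles lasting two, four and eight beats is an interpretive assumption rather than something stated in the analysis.

```python
def beat_frequency_hz(tempo_bpm):
    """Convert a tempo in beats per minute to a beat frequency in Hz."""
    return tempo_bpm / 60.0


def movement_frequency_hz(tempo_bpm, beats_per_cycle):
    """Frequency of a movement that completes one full cycle every beats_per_cycle beats."""
    return beat_frequency_hz(tempo_bpm) / beats_per_cycle


# 125 bpm gives a beat frequency of about 2.08 Hz; cycles spanning
# 2, 4 and 8 beats fall at about 1.04, 0.52 and 0.26 Hz respectively.
for beats in (1, 2, 4, 8):
    print(beats, round(movement_frequency_hz(125.0, beats), 2))
```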

Using focal movement parameters to explain differences in rated dance attractiveness

We used PROC MIXED (SAS v9.4) to fit five separate multi-level models to explain dance quality on the basis of five movement parameters: asymmetric arm movements (i.e. the right arm moving independently of the left), asymmetric thigh movements (i.e. the right upper leg moving independently of the left), hip swing, amount of arm movement, and amount of thigh movement (see Methods for more information on quantification of these parameters). Multi-level models have the advantage of modelling variability in both raters and stimuli (dancers) simultaneously as fully crossed random effects. Traditional analyses that model variability in stimuli or raters alone are known to inflate Type I error (see e.g. refs 20 and 21).

As explanatory variables in each model, we initially included: (i) trial order, (ii) the sex of the rater (Sex), (iii) the movement parameter (standard deviation: SD), (iv) second order polynomial terms for the movement parameters (SD²) where appropriate, and (v) the interaction between sex and movement parameters (Sex × SD). The distributions for hip swing and thigh movement failed the Shapiro-Wilk test of normality (W = 0.84, p < 0.001 and W = 0.92, p = 0.008 respectively), and therefore these data were logarithmically transformed for all analyses. In addition, based on significant reductions in −2 log-likelihood, we modelled intercept variation for both raters and dancers by specifying an ‘unstructured’ variance-covariance structure for each in the model’s G-matrix. In the initial fits of all five models, trial order had no statistically significant effect, and so this variable was excluded from the final analysis. Detailed model outputs are reported in Table 1. The dummy coding in the model for sex used males as the reference.
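For orientation, the structure of each final single-parameter model can be written out as below. This is a sketch under stated assumptions rather than the exact SAS specification: the symbols are introduced here for illustration, i indexes raters and j indexes dancers, Sex is coded 0 for male raters (the reference level) and 1 for female raters, the quadratic term is present only in models where it was retained, and the random part consists of crossed rater and dancer intercepts.

```latex
\begin{aligned}
\text{Rating}_{ij} &= \beta_0 + \beta_1\,\text{Sex}_i + \beta_2\,\text{SD}_j + \beta_3\,\text{SD}_j^{2}
  + \beta_4\,(\text{Sex}_i \times \text{SD}_j) + u_i + v_j + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^{2}), \qquad
v_j \sim \mathcal{N}(0, \sigma_v^{2}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Here u_i captures how leniently or harshly rater i scores on average, and v_j captures how dancer j is scored on average beyond what the movement parameter explains.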

Table 1 Output from five independent multi-level models testing the influence of dance metrics on dance quality.

Both male and female raters judged that more attractive dances contained greater arm movement and hip swing, and more asymmetric thigh movements (Table 1 and Fig. 2). Higher quality dances also contained intermediate quantities of two of our focal parameters: thigh movement, and asymmetric arm movements (see polynomial relationships in Table 1 and Fig. 2). That is, dances that contained some thigh movement, and some asymmetric arm movements, were rated more attractive than dances that contained high or low quantities of these movements. In addition, we found a significant main effect of rater sex, together with a significant interaction between rater sex and arm movement, in relation to dance quality ratings. This combination of effects indicates that the influence of arm movement on dance quality ratings was stronger for female than male raters for low values, equivalent at intermediate values, and stronger for male than female raters at high values.

Figure 2 LSmean dance quality, estimated from each of the five final models, plotted as a function of the five focal movement parameters. Orange and cyan dots show LSmean estimates of dance quality for each female and male rater, respectively. The red and blue lines represent the regression lines (linear or polynomial) for female and male raters, respectively.

Contribution of focal movement parameters to dance quality ratings

The foregoing analyses characterize the relationship between dance quality ratings and the five dance movement metrics, when computed independently from separate models. A critical question, however, is which of these metrics optimally explain dance quality ratings when competing against each other simultaneously in the same analysis. We therefore used PROC MIXED (SAS v9.4) to fit a final mixed model which initially contained rater gender together with all five dance movement metrics as explanatory variables. We optimized the final model by finding a solution which: (i) minimized −2 log-likelihood, and (ii) only retained explanatory variables that were statistically significant at p ≤ 0.05. In order to allow meaningful comparison of the magnitudes of model regression weights, we also centred and scaled all continuous explanatory variables by converting them to z-scores. The model outcome is reported in Table 2, which shows that linear terms for both asymmetric thigh movements and hip swing, as well as polynomial terms for asymmetric arm movements, were together sufficient to explain dance quality ratings according to our optimization criteria. That is, dances were rated more highly if they contained more hip swing, more asymmetric thigh movements, and moderately asymmetric arm movements. This model is illustrated in Fig. 3.
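The z-score conversion mentioned above can be sketched in Python as follows. The function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the denominator used in the study is not stated.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
    return [(v - mean) / sd for v in values]
```

After this transformation each regression weight describes the change in rating per one standard deviation of the corresponding metric, which is what makes the weights comparable across metrics measured in different units.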

Table 2 Output from optimized multi-level model, initially including all five dance movement metrics and gender of the rater as explanatory variables.

Figure 3 (a) 3D surface plots to show the effects of hip swing (x-axis) and asymmetric arm movements (y-axis) on dance quality ratings (z-axis), at a z-score of +1 SD for asymmetric thigh movement (top surface plot) and a z-score of −1 SD for asymmetric thigh movement (bottom surface plot). (b) shows cross-sections through these surfaces to illustrate how three dance quality ratings (2.9 in blue, 3.1 in green and 3.3 in red) can be achieved via different combinations of hip swing and asymmetric arm movements, at +1 SD of asymmetric thigh movement (solid lines) and at −1 SD of asymmetric thigh movement (dashed lines).

Figure 3 illustrates how the same level of dance quality is predicted by different combinations of the three movement metrics. For example, the middle cross-section in Fig. 3b shows that when asymmetric thigh movement is set to +1 SD, a dance quality rating of 3.1 can be achieved at the highest hip swing values together with the lowest asymmetric arm movement (i.e. the solid green curve). At mid-range values for asymmetric arm movement, less hip swing is required to achieve the same result. However, at the highest values for asymmetric arm movement, greater hip swing is needed again. When asymmetric thigh movement is set to the lower level of −1 SD, the regime to achieve a dance quality rating of 3.1 undergoes a rightward shift such that greater hip swing is required for all values of asymmetric arm movement (i.e. the dashed green curve).
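The shape of these cross-sections follows from the form of the fixed-effects part of the final model. As a sketch, write H, T and A for the z-scored hip swing, asymmetric thigh movement and asymmetric arm movement, and assume the polynomial for A has both a linear and a quadratic term. The symbols below are introduced for illustration and are not taken from Table 2.

```latex
\begin{aligned}
\hat{R} &= b_0 + b_H H + b_T T + b_A A + b_{A^2} A^{2}, \\
H(A \mid R^{*}, T) &= \frac{R^{*} - b_0 - b_T T - b_A A - b_{A^2} A^{2}}{b_H}, \\
H(A \mid R^{*}, T=-1) - H(A \mid R^{*}, T=+1) &= \frac{2\,b_T}{b_H}.
\end{aligned}
```

The second line gives the hip swing needed to reach a target rating R*. With b_H > 0 (more hip swing rated higher) and b_{A^2} < 0 (intermediate arm asymmetry rated highest), the required hip swing is a U-shaped function of A with its minimum at A = −b_A / (2 b_{A^2}), matching the pattern in which both low and high arm asymmetry demand more hip swing. The third line shows that, with b_T > 0, moving from +1 SD to −1 SD of asymmetric thigh movement raises the required hip swing by the same amount at every value of A, which corresponds to the rightward shift of the dashed curves.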