Participant screening

A total of 154 participants completed a questionnaire survey that assessed the frequency of chills and tears during everyday music listening. All participants were Japanese-speaking students, and the entire study was conducted in Japanese. They were recruited from regularly scheduled psychology classes at a university. Participants answered four questions as follows: ‘While listening to music, how frequently do you (1) get goose bumps, (2) feel shivers down your spine, (3) feel like weeping, and (4) get a lump in your throat?’ The four items were rated on a Likert response scale that ranged from 0 (not at all) to 10 (nearly always). The scores of questions (1) and (2) were averaged as a chills score (Cronbach’s α = 0.74), while the scores of questions (3) and (4) were averaged as a tears score (Cronbach’s α = 0.75). Furthermore, participants provided the title of a musical piece that evoked a strong emotional response. We recruited chills-group and tears-group participants separately because previous studies did not measure music-elicited tears in experimental settings. To enable participants to be fully engaged in the tear-evoking task and to increase the likelihood of obtaining tears responses, we conducted the tears and chills experiments in separate groups, using a between-participants design. In addition, both experiments required 90 minutes to complete, so participants did not take part in both conditions, which avoided fatigue. Eleven participants were not asked to take part in the experiment because they reported that they had never experienced chills or tears during music listening. Most participants (83.2%) reported a Japanese pop/rock song as a chills or tears elicitor, while 24 participants reported classical or instrumental music (12.6%) or a UK/US song (4.2%). To keep the musical characteristics of the stimuli consistent, we included only the participants who reported that a Japanese pop/rock song evoked a strong emotional response.
Moreover, 29 participants were not suitable to participate in the experiment because they could not select three pieces of music that had previously elicited a chills or tears response, and six participants were excluded due to difficulty with matching the control song (see the Stimuli section). An additional 18 participants were originally selected to be part of the chills group (10 participants) or tears group (eight participants), but were excluded from the analysis because they did not report chills or tears during the experiment. The remaining 66 participants were analysed.
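The exclusion counts reported above can be checked with a short tally. The following is a minimal sketch, assuming the exclusions were applied in the order in which the text reports them and that the genre percentages are taken over the 143 participants who remained after the first exclusion; the function and variable names are illustrative only.

```python
# Exclusion steps and counts as reported in the screening text.
# The ordering of the steps is an assumption (order of mention).
EXCLUSIONS = [
    ("never experienced chills or tears", 11),
    ("elicitor was not a Japanese pop/rock song", 24),
    ("could not select three songs", 29),
    ("control song could not be matched", 6),
    ("no chills or tears during the experiment", 18),
]


def participant_flow(n_screened, exclusions):
    """Return (reason, remaining sample size) after each exclusion step."""
    remaining = n_screened
    flow = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((reason, remaining))
    return flow


flow = participant_flow(154, EXCLUSIONS)
for reason, remaining in flow:
    print(f"{remaining:3d} remaining after excluding: {reason}")

# Share of the 143 remaining respondents who named a non-Japanese-pop/rock
# piece (assumed denominator), to compare with 12.6% + 4.2%.
non_jpop_percent = round(100 * 24 / flow[0][1], 1)
```

Under these assumptions the final count matches the 66 analysed participants, and the 24 excluded respondents correspond to the combined 16.8% reported for the two non-pop/rock categories.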

Participants

Thirty-two undergraduates participated in the chills group experiment (14 males and 18 females; mean age 18.84 years, SD = 0.77) and thirty-four undergraduates participated in the tears group experiment (13 males and 21 females; mean age 18.79 years, SD = 1.07). For each group, we selected participants who reported at least one experience of chills or tears while listening to music. We assigned the participants who showed a relatively high score for chill frequency to the chills group, and the participants who showed a relatively high score for tear frequency to the tears group. In the chills group, the mean score of chill frequency was 4.33 (SD = 2.27) and that of tear frequency was 2.77 (SD = 2.02). In the tears group, the mean score of chill frequency was 2.66 (SD = 1.99) and that of tear frequency was 4.65 (SD = 1.91). Musical experience did not differ significantly between the chills group (M = 7.91, SD = 7.05) and the tears group (M = 6.50, SD = 5.04; t(64) = 0.94, ns). We calculated the years of musical experience by summing the years of experience across all instruments and vocals (e.g. 3 years of experience with the piano and 2 years with the guitar = 5 years of total musical experience). Participants were required to be in good general health to participate in the study. We confirmed that participants had no history of neurological, psychiatric, or cardiovascular disorders, or chronic medical conditions. All participants were instructed to abstain from coffee and alcoholic beverages the night before the physiological response measurement. Participants were compensated for their participation (about US$8 and course credit).
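The reported group comparison of musical experience can be reproduced from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) independent-samples t-test, which is what the reported 64 degrees of freedom suggest; the function name is illustrative.

```python
import math


def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / standard_error, df


# Years of musical experience: chills group vs. tears group (values from the text).
t_value, df = pooled_t(7.91, 7.05, 32, 6.50, 5.04, 34)
print(f"t({df}) = {t_value:.2f}")
```

With the means, standard deviations, and group sizes given above, this yields t(64) = 0.94, in line with the reported value.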

Stimuli

Participants were instructed to select their three favourite songs before the experiment. Each of the 32 or 34 participants selected three songs that had previously elicited a chills or tears response. The resulting 96 songs in the chills group and 102 songs in the tears group were kept separate. In total, these 198 songs were used as stimuli in the present experiment (all songs are listed in Table S1, and the musical features extracted by a music information retrieval method are listed in Table S2). After collecting all of the song information before the experiment, the experimenter prepared digitised files (WAV file format) from the original CD recordings. To maintain ecological validity, the song stimuli were the full-length versions of the songs, including the lyrics. The mean song duration was 295.7 s in the chills group and 304.4 s in the tears group. Because one’s favourite song is highly likely to evoke a strong emotional response in the laboratory18,56, we used such songs as the experimental stimuli.

Each participant listened to six songs: three test songs and three control songs. The test songs were the three self-selected songs, whereas the control songs were selected for each individual by applying methods in which one participant’s peak emotional songs are used as another participant’s control songs14,18,57. The experimenter matched the control songs for each participant within the chills group or the tears group. In this matching procedure, we avoided a given participant’s six favourite artists (the three artists who performed the self-selected songs and three further favourite artists, who were ascertained prior to the experiment) in order to suppress a potential peak emotional response evoked by a control song. For example, if song set A (three songs) evoked peak emotions in participant 1, and song set B (three songs) evoked peak emotions in participant 2, then song set B served as the psychoacoustic control for participant 1, and song set A served as the psychoacoustic control for participant 2. In the experiment, each of the 96 songs in the chills group and 102 songs in the tears group was used once as a self-selected song and once as an experimenter-selected song. By using this matching procedure, we collected 33 pairs of data from 66 participants (16 pairs from 32 participants in the chills group and 17 pairs from 34 participants in the tears group). The advantages of this matching method were that it allowed us to analyse data by comparing the same sets of stimuli and to exclude the possibility that the observed effects were solely due to acoustic changes in the songs (e.g. a sudden increase in tempo or elevations in pitch).
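The pairwise matching described above can be sketched as follows. This is an illustration only: the participant labels, song titles, and artist names are hypothetical, and the way the experimenter actually searched for valid pairs is not stated in the text.

```python
def swap_controls(pairs, song_sets):
    """Give each member of a pair the other member's self-selected songs as controls."""
    controls = {}
    for first, second in pairs:
        controls[first] = song_sets[second]
        controls[second] = song_sets[first]
    return controls


def uses_avoided_artist(control_songs, avoided_artists):
    """True if any control song is performed by an artist the listener favours."""
    return any(artist in avoided_artists for _title, artist in control_songs)


# Hypothetical data: each song is a (title, artist) tuple.
song_sets = {
    "P1": [("Song A1", "Artist 1"), ("Song A2", "Artist 2"), ("Song A3", "Artist 3")],
    "P2": [("Song B1", "Artist 4"), ("Song B2", "Artist 5"), ("Song B3", "Artist 6")],
}
# Six avoided artists per participant: three performers of the self-selected
# songs plus three further favourites (hypothetical names).
avoided = {
    "P1": {"Artist 1", "Artist 2", "Artist 3", "Artist 7", "Artist 8", "Artist 9"},
    "P2": {"Artist 4", "Artist 5", "Artist 6", "Artist 10", "Artist 11", "Artist 12"},
}

controls = swap_controls([("P1", "P2")], song_sets)
pair_is_valid = not any(
    uses_avoided_artist(controls[person], avoided[person]) for person in controls
)
```

Because the swap is symmetric, every song appears exactly once as a self-selected song and once as a control song, which is the property the analysis relies on.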

Recordings of physiological signals

Three physiological signals were recorded: an electrocardiogram (ECG), respiration, and electrodermal activity (EDA). The ECG was recorded using three disposable Ag/AgCl spot electrodes (38 mm in diameter) positioned in a three-lead chest configuration (right collarbone, lowest left rib, and lowest right rib). Respiration rate (RR) and respiration depth (RD) were measured with a piezoelectric transducer belt placed around the chest. The EDA was measured on the palmar surface of the middle phalanges of the first and second fingers of the left hand (DC, time constant 4 s). A constant-voltage device maintained 0.5 V between 19 mm Ag/AgCl electrodes filled with sodium chloride gel. The ECG and respiration were acquired with a Polygraph 360 (NEC Medical Systems, Japan) and the EDA was obtained with a DA3-b sensor (Vega Systems, Japan). The three physiological signals were sampled at 1 kHz using a Vital Recorder Monitoring System (Kissei Comtec, Japan).

Procedure

The listening experiment was conducted inside a sound-attenuated chamber, and stimuli were presented through two loudspeakers (DS-200ZA; Diatone, Japan) positioned in front of the participant. The loudspeakers were connected to a computer outside the chamber via an amplifier (PM-7SA; Marantz, Japan). Experimental tasks were controlled with a custom programme written in Visual C++ 2008 (Microsoft, USA). The synchronisation of experimental tasks and physiological measures was also controlled with this programme. The temperature in the room was kept at 24 ± 1 °C. The experiment was conducted with one participant at a time. After informed consent was obtained, the participant was seated in front of a computer screen and the physiological sensors (the electrodes and the respiration belt) were attached. The participant was then asked to sit still, not to speak, and to relax with their eyes open during a 7-min laboratory adaptation period. Following this, a 3-min pre-experiment baseline of physiological activity was measured. All participants took part in a training session to become familiar with the experiment before the main trial.

In the main trial, each participant listened to six songs in a pseudo-randomised order across the test pieces and control pieces. To avoid presenting the three self-selected songs or the three control songs consecutively, the pseudo-randomisation ensured that at least one song of the other condition was presented among the three songs of each condition. The song stimuli were presented at each participant’s comfortable listening level because peak emotional responses are readily elicited at this level14. To keep track of their peak emotional responses during song listening, participants were asked to complete button-press measures. In the chills group, this involved pressing the left button of a computer mouse whenever they felt chills (instructed as ‘goose bumps’ or ‘shivers down the spine’). In the tears group, this involved pressing the left button of a mouse whenever they felt tears (instructed as ‘weeping’ or ‘a lump in the throat’). Importantly, the only difference in procedure between the chills and tears groups was the psychological index of the mouse-button click, which enabled the comparison of the physiological effect of chills and tears without experimental artefacts. Participants were also asked to rate their subjective valence level in real time by moving the mouse; the mouse position was displayed on the computer screen on a 480-pixel scale. Moving the mouse to the right indicated heightened inner pleasure, whereas moving the mouse to the left indicated heightened inner displeasure. If participants did not feel pleasure or displeasure, they moved the mouse to the centre. The mouse-button signals and valence ratings were recorded at a 10-Hz sampling rate. After listening to each song, participants rated how intensely they felt chills or tears over the whole song. Both responses were given on a 9-point Likert-type scale ranging from 0 (not at all) to 9 (very strong).
Participants also rated the felt emotional responses to each song in terms of valence (−4 = displeasure, 0 = neutral, 4 = pleasure) and arousal (−4 = deactivation, 0 = neutral, 4 = activation). Furthermore, they rated the song in terms of happiness, sadness, calm, and fear on a scale ranging from 0 (not at all) to 9 (very strong). We selected these four emotions because they encompass the four quadrants of a two-dimensional emotional space (happy: high-arousal, high-valence; calm: low-arousal, high-valence; fear: high-arousal, low-valence; sad: low-arousal, low-valence). Previous music and emotion studies showed that music can simultaneously evoke happy and sad emotions (i.e. mixed emotions) but not pleasure and displeasure44,45; therefore, the happy-sad dimension may not be equivalent to the pleasure-displeasure dimension in musical emotions58,59. We measured happy, sad, calm, and fear perceived emotions in order to capture the quality of musical emotion more precisely than two-dimensional models alone, and we expected to receive some mixed emotional responses (i.e. happy-sad, happy-fear, and calm-sad). After this, the participant was asked to relax for 1 min as a rest period. We set the latter 30 s of this period as the physiological baseline for the subsequent song stimulus. The experimental procedure was approved by the Faculty of Integrated Arts and Sciences Ethics Committee of Hiroshima University. We confirmed that all experimental methods were performed in accordance with relevant guidelines and regulations.
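The pseudo-randomised presentation order used in the main trial can be illustrated with a small rejection-sampling sketch. It assumes that the only constraint was that the three self-selected songs, or the three control songs, never appear as three consecutive trials; the actual algorithm in the experimental programme is not described, so the function names and the sampling strategy are hypothetical.

```python
import itertools
import random


def has_run_of_three(order):
    """True if three consecutive trials share the same condition."""
    return any(order[i] == order[i + 1] == order[i + 2] for i in range(len(order) - 2))


def pseudo_random_order(rng):
    """Shuffle three 'self' and three 'control' trials until no run of three remains."""
    order = ["self"] * 3 + ["control"] * 3
    while True:
        rng.shuffle(order)
        if not has_run_of_three(order):
            return list(order)


# Enumerate all distinct condition sequences to see how many satisfy the constraint.
all_orders = set(itertools.permutations(["self"] * 3 + ["control"] * 3))
valid_orders = [order for order in all_orders if not has_run_of_three(order)]

example = pseudo_random_order(random.Random(1))
```

Of the 20 possible condition sequences, 14 satisfy this constraint, so rejection sampling terminates quickly.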

Quantification of the time series data

For the emotional evaluation, the onset of a peak emotional response was defined as the time at which the participant pressed the mouse button, which indicated a chills response in the chills group and a tears response in the tears group. The number and duration of button presses were counted for each song, and these were used as measures of the number and duration of peak emotional experiences. Button presses that occurred less than 1 s after a previous press were not considered to be peak emotional responses and were removed from the analysis. Peak emotional responses in the first 15 s were also excluded from the analysis to avoid a physiological initial orientation response60,61. Furthermore, the values of the real-time valence ratings were divided by 60 so that they covered the same range as the valence ratings obtained after listening (−4 to 4).
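The button-press cleaning and the valence rescaling described above can be sketched as follows. Several details are assumptions rather than statements from the text: the 1 s gap is measured from the onset of the previous press, the excluded 15 s are the first 15 s of each song, and the mouse position is recorded as 0 to 480 pixels with the neutral point at 240.

```python
def filter_press_onsets(onsets_s, min_gap_s=1.0, skip_initial_s=15.0):
    """Keep press onsets (in seconds from song start) that count as peak responses."""
    kept = []
    previous = None
    for onset in sorted(onsets_s):
        too_soon = previous is not None and (onset - previous) < min_gap_s
        too_early = onset < skip_initial_s
        if not too_soon and not too_early:
            kept.append(onset)
        previous = onset  # gap is measured from the last press, kept or not
    return kept


def rescale_valence(x_pixels, centre=240.0, divisor=60.0):
    """Map a mouse position on the 480-pixel scale onto the -4..4 valence range."""
    return (x_pixels - centre) / divisor


cleaned = filter_press_onsets([5.0, 20.0, 20.5, 40.0])
print(cleaned)
print(rescale_valence(0), rescale_valence(240), rescale_valence(480))
```

Dividing a 480-pixel range by 60 gives eight units, which matches the −4 to 4 range of the post-listening valence scale.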

For the physiological signals, HR (beats per minute) was determined by a programme that detects R spikes in the ECG and calculates interbeat intervals (IBIs), Peakdet62 (Version 3.4.305; for use in MATLAB, MathWorks, USA). Beat-to-beat values were edited to exclude outliers due to artefacts or ectopic myocardial activity. Outlier IBI values were identified by flagging intervals that were larger than 1,500 ms or 150% of the mean value of the preceding 10 intervals, or smaller than 500 ms or 50% of the mean value of the preceding 10 intervals (approximately 0.1% of intervals were outliers). Linearly interpolated R spikes were inserted when the IBI was too long, whereas R spikes were deleted when the IBI was too short. Then, to obtain the time series data, the non-equidistant IBI sequence was interpolated with a cubic spline and resampled at 10 Hz in order to synchronise the time series with the subjective responses. A similar computer-assisted procedure for detecting respiration peaks was applied to the respiration signal. The moments of maximal inspiration and minimal expiration were used to determine RR (in cycles per minute) and RD (calculated as the base-to-peak amplitude of a single ventilatory cycle63). The detected peaks were edited for outliers (respiratory cycles <1,000 ms). Both respiratory parameters were interpolated with a cubic-spline function and resampled at 10 Hz, as for the HR. Furthermore, the skin conductance level (SCL) and skin conductance response (SCR) from the electrodermal data were downsampled to 10 Hz. SCL reflects tonic sympathetic activity, has a relatively long latency, and changes slowly. In contrast, SCR reflects phasic sympathetic activity, emerges within 1–3 s, and changes quickly64. Both were expressed in microsiemens (μS). To account for the typically skewed distribution of electrodermal response measures, the SCL values were transformed to log(SCL) and the SCR values to log(SCR + 1), respectively64.
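The IBI outlier rule and the electrodermal transforms can be expressed compactly. The sketch below is not the Peakdet implementation. It assumes that the relative (150% and 50%) criteria are applied only once 10 preceding intervals are available, that the preceding intervals are used as recorded, and that natural logarithms are used; none of these details are stated in the text.

```python
import math


def flag_ibi_outliers(ibis_ms, window=10):
    """Label each interbeat interval as 'long', 'short', or 'ok'."""
    flags = []
    for i, ibi in enumerate(ibis_ms):
        too_long = ibi > 1500
        too_short = ibi < 500
        preceding = ibis_ms[max(0, i - window):i]
        if len(preceding) == window:  # relative criteria need a full window
            local_mean = sum(preceding) / window
            too_long = too_long or ibi > 1.5 * local_mean
            too_short = too_short or ibi < 0.5 * local_mean
        flags.append("long" if too_long else "short" if too_short else "ok")
    return flags


def ibi_to_bpm(ibi_ms):
    """Convert an interbeat interval in milliseconds to heart rate in beats per minute."""
    return 60000.0 / ibi_ms


def transform_scl(scl_microsiemens):
    """Log transform for skin conductance level (natural log assumed)."""
    return math.log(scl_microsiemens)


def transform_scr(scr_microsiemens):
    """Log(x + 1) transform for skin conductance responses, defined at zero."""
    return math.log(scr_microsiemens + 1.0)


ibis = [800] * 10 + [1300, 350, 810]
flags = flag_ibi_outliers(ibis)
print(flags[10:], ibi_to_bpm(800))
```

In the example, the 1,300 ms interval exceeds 150% of the local mean and the 350 ms interval falls below the absolute 500 ms limit, so they would be handled by inserting or deleting R spikes as described above.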

Analysis strategy

We first conducted a manipulation check. To confirm the validity of the experimental condition, the degree of peak emotion evoked for the self-selected song and experimenter-selected song was compared. To test for the same level of elicitation of peak emotion, we compared the chills measure in the chills group and tears measure in the tears group. Furthermore, we examined whether the pre-experiment baseline of physiological activities between groups was the same or not.

For the main analysis, we tested the overall responses and the 30 s around the peak onset for the psychophysiological measures. Overall response scores were derived for each rest and song period. Difference scores were calculated by subtracting the average of the final 30 s of the immediately preceding rest period from each 10 Hz physiological sample of the song period. The difference scores were then averaged within each song period. In addition, the overall valence score was calculated as the mean of the real-time valence rating during each song period. For the 30 s around the peak onset, we used z-scores derived from each song period. The z-scores were calculated from the 10 Hz difference scores of the physiological responses and from the real-time valence ratings. The z-score represents the change relative to the overall mean response to the stimulus. A positive z-score reflects an increase from the mean of the overall response, whereas a negative z-score reflects a decrease from the mean of the overall response. To show whether the chills and tears effects stand out during song listening, we tested the psychophysiological responses around the chills or tears onset using the z-scores. The physiological responses and real-time valence ratings for self-selected songs were included in an ‘around peak onset response window’ ranging from 15 s before to 15 s after the chills or tears onset per song stimulus. We set this time window because some previous studies have shown that psychophysiological responses during this time window are sufficient to examine the chills experience6,10,14.
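The baseline correction, z-scoring, and windowing steps can be sketched as follows. The sketch assumes 10 Hz series stored as plain lists, uses the sample standard deviation for the z-scores, and skips onsets whose 30 s window would extend beyond the song; the text does not specify these details, and the function names are illustrative.

```python
from statistics import mean, stdev

FS = 10  # samples per second after resampling


def difference_scores(song, rest, fs=FS, baseline_s=30):
    """Subtract the mean of the last 30 s of the preceding rest from every song sample."""
    baseline = mean(rest[-baseline_s * fs:])
    return [value - baseline for value in song]


def z_scores(values):
    """Standardise a series against its own mean and standard deviation."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


def peak_window(series, onset_s, fs=FS, half_width_s=15):
    """Return the samples from 15 s before to 15 s after a peak onset, or None."""
    centre = round(onset_s * fs)
    start, stop = centre - half_width_s * fs, centre + half_width_s * fs
    if start < 0 or stop > len(series):
        return None  # window would run past the start or end of the song
    return series[start:stop]


# Synthetic 60 s rest period and 60 s song, for illustration only.
rest = [2.0] * 600
song = [float(i % 7) for i in range(600)]

diff = difference_scores(song, rest)
overall_response = mean(diff)      # one overall score per song period
z = z_scores(diff)
window = peak_window(z, onset_s=30.0)
```

The overall response is a single mean difference score per song, whereas the window extracts 300 z-scored samples around each button-press onset.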

We created matching trials in order to control for the effect of acoustic changes on the psychophysiological responses. That is, the trials during which participants experienced chills or tears for self-selected songs were matched with the trials of the same song that were heard by a different participant as an experimenter-selected song. Importantly, this matching procedure was conducted within the experimental group for chills or tears. The overall responses for both the self-selected song and experimenter-selected song were averaged for the number of times based on the chills- or tears-induced trial for the self-selected song. The 30 s around peak onset windows for both self-selected songs and experimenter-selected songs were first averaged for the number of peak emotion times for each chills- or tears-induced trial for self-selected songs. Subsequently, these data were further averaged in the same way as the overall responses. Therefore, the same number of psychophysiological responses for a song was collected for each self-selected song and experimenter-selected song.

Statistical analysis

All statistical analyses were performed with the R 3.3.1 software65. As a manipulation check, the effect of song condition on peak emotional responses was investigated using a between-subjects t-test. The dependent variables were the number, intensity, and duration of chills in the chills group and of tears in the tears group. Next, between-subjects t-tests were calculated to test whether the above chills-related measures in the chills group differed from the tears-related measures in the tears group. In addition, we compared the chills group with the tears group for the pre-experiment baseline of physiological activities using between-subjects t-tests.

The overall psychophysiological responses were analysed using ANOVA. A 2 (self-selected song vs. experimenter-selected song) × 2 (chills group vs. tears group) between-subjects ANOVA was conducted using the dependent variables of real-time valence and each physiological measure. In addition, we used one-sample t-tests to assess whether these parameters deviated from zero separately for each of the four ANOVA cells. Furthermore, in order to analyse the time series psychophysiological data, we used functional data analysis techniques34. Functional data analysis techniques were specifically developed for analysing temporal data66. In these techniques, mathematical functions are first fitted to the discrete 10 Hz data, and the statistical analysis is then performed on the continuous functions. Fourth-order (cubic) B-splines with 30 basis functions were fitted to each individual’s raw psychophysiological responses. The data were smoothed with a constant smoothing parameter (λ = 0.1). This smoothing value was chosen so as not to eliminate the contours of each variable that were important to the analysis. A one-way functional ANOVA (fANOVA) was employed with condition (self-selected vs. experimenter-selected) as the between-subjects variable. The fANOVA was conducted on the real-time valence and each physiological response within the chills group and within the tears group. An advantage of this technique compared to traditional ANOVA is that it allows us to discover ‘when’ the psychophysiological responses were significantly different. Functional p-value curves were calculated for each response using a functional permutation F-test with 1000 random permutations34.
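The idea behind a p-value curve from a permutation F-test can be illustrated with a simplified pointwise version. This sketch is not the procedure used in the study: it omits the B-spline smoothing step, works directly on sampled curves, and does not reproduce any particular software package. All names and the synthetic data are hypothetical.

```python
import random


def f_statistic(group_a, group_b):
    """One-way ANOVA F statistic for two groups of scalars (df between = 1)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = sum((x - mean_a) ** 2 for x in group_a) + sum(
        (x - mean_b) ** 2 for x in group_b
    )
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return ss_between / (ss_within / (n_a + n_b - 2))


def f_curve(curves_a, curves_b):
    """F statistic at every time point of two sets of equally sampled curves."""
    n_points = len(curves_a[0])
    return [
        f_statistic([c[t] for c in curves_a], [c[t] for c in curves_b])
        for t in range(n_points)
    ]


def permutation_p_curve(curves_a, curves_b, n_perm=1000, seed=0):
    """Pointwise p-values obtained by shuffling whole curves between conditions."""
    rng = random.Random(seed)
    observed = f_curve(curves_a, curves_b)
    pooled = list(curves_a) + list(curves_b)
    n_a = len(curves_a)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = f_curve(pooled[:n_a], pooled[n_a:])
        for t, (perm_f, obs_f) in enumerate(zip(permuted, observed)):
            if perm_f >= obs_f:
                exceed[t] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Synthetic curves with three time points; the conditions differ only at the second.
self_selected = [[0.1 * i, 5 + 0.1 * i, 0.1 * i] for i in range(8)]
experimenter_selected = [[0.1 * i, 0.1 * i, 0.1 * i] for i in range(8)]
p_curve = permutation_p_curve(self_selected, experimenter_selected, n_perm=200)
```

Reading the resulting curve against a significance threshold indicates at which time points the two conditions diverge, which is the ‘when’ information that a single ANOVA on averaged responses cannot provide.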

In addition, a two-way ANOVA with the same design as for the overall responses was conducted using the dependent variables obtained after listening: the subjective emotional ratings of valence and arousal, and the perceived song expression ratings of happiness, sadness, calm, and fear. Again, we used one-sample t-tests to assess whether valence and arousal deviated from zero separately for the four ANOVA cells. Correlations between all subjective emotional responses were also computed for the chills and tears groups.

Data handling

We treated numbers of peak emotional responses for self-selected songs that exceeded the mean by more than 2.5 standard deviations as outliers, because such excessive numbers of responses could not be regarded as reliable peak emotional responses. We excluded trials in which the experimenter-selected song induced more chills or tears than the self-selected song because such trials did not fit our analysis criteria. In addition, a programme error forced us to exclude two participants in the chills group. Finally, peak emotional responses were collected and analysed for 67 songs in the chills group and 66 songs in the tears group.
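The 2.5 standard deviation criterion can be illustrated as follows. The sketch assumes that the criterion was applied to the number of button presses per self-selected song, that the mean and sample standard deviation were computed across songs within a group, and that only unusually high counts were flagged; the counts shown are invented for illustration.

```python
from statistics import mean, stdev


def flag_excessive_counts(counts, n_sd=2.5):
    """Mark counts that exceed the mean by more than n_sd standard deviations."""
    threshold = mean(counts) + n_sd * stdev(counts)
    return [count > threshold for count in counts]


# Hypothetical numbers of button presses per self-selected song.
press_counts = [2, 3, 1, 4, 2, 3, 2, 3, 1, 2, 40]
outlier_flags = flag_excessive_counts(press_counts)
retained = [c for c, is_outlier in zip(press_counts, outlier_flags) if not is_outlier]
```

In this invented example only the song with 40 presses is flagged, and the remaining ten songs would enter the analysis.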