The participants signed a consent declaration stating that they participated in the experiment freely, that they were informed in advance about the task, the procedure and the measurement technology, that they had the opportunity to ask questions, and that they agreed to have their actions recorded. They also agreed that the recorded data would be used for scientific and educational purposes only. In agreement with the general standards of our university and faculty, safety was guaranteed (our indoor task is not dangerous) and privacy was respected. According to the Belgian law on experiments aimed at the development of biological or medical knowledge (cf. the Law of 7 May 2004 concerning experiments on the human person, Ch. II, Art. 2, Par. 11), our research is exempt from ethical approval, as this study only involves behavioral knowledge.

The walking experiment was based on 52 musical excerpts and 6 identical metronome sequences. All excerpts and sequences had a duration of 30 seconds and a tempo of 130 BPM (64 beats). Amplitudes were normalized and subjectively checked to minimize differences in loudness, and a short fade-in of 50 ms and fade-out of 100 ms were applied to each musical excerpt using CoolEdit. The metronome sequences were generated with Analog Box ( http://code.google.com/p/analog-box/ ). In order to maximize musical diversity, a group of three musicologists collected a set of musical pieces with a tempo of 130 BPM from a variety of styles. From this set, the selection of 52 excerpts was made based on a series of criteria: the three experts had to agree that the tempo was 130 BPM, the tempo had to be stable throughout the excerpt, and the music had to have a homogeneous character. Moreover, in the selection process, preference was given to musical pieces that would probably be unknown to most participants, in order to avoid effects of familiarity as much as possible. Table 1 provides a list of all the musical excerpts that were used. Based on the 52 excerpts and 6 metronome sequences, three playlists (I, II, III) of 58 stimuli (musical excerpts and metronome sequences) were generated by randomly changing the order in which the musical excerpts were presented. The 6 metronome sequences were presented at fixed positions in each playlist, namely at positions 1, 12, 23, 34, 45, and 58. Between stimuli, a 5-second break was inserted. Each participant listened to one of the three playlists, and all playlists occurred an equal number of times during the experiment. For the second part of the experiment, only the 52 musical excerpts were used.
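The playlist construction described above can be sketched as follows. This is a minimal illustration only; the function and the excerpt labels are ours, not part of the original study.

```python
import random

# Fixed 1-based positions of the 6 metronome sequences in every playlist,
# as specified in the text: 1, 12, 23, 34, 45 and 58.
METRONOME_POSITIONS = {1, 12, 23, 34, 45, 58}

def make_playlist(excerpts, rng=random):
    """Build one playlist of 58 stimuli: 52 shuffled musical excerpts
    interleaved with metronome sequences at the fixed positions."""
    assert len(excerpts) == 52
    shuffled = list(excerpts)
    rng.shuffle(shuffled)           # randomize only the musical excerpts
    song_iter = iter(shuffled)
    playlist = []
    for pos in range(1, 59):        # 58 stimulus positions in total
        if pos in METRONOME_POSITIONS:
            playlist.append("metronome")
        else:
            playlist.append(next(song_iter))
    return playlist

playlist = make_playlist([f"excerpt_{i:02d}" for i in range(1, 53)])
```

Generating playlists I, II and III then amounts to calling `make_playlist` three times with different random orders.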

One or two days after the walking experiment, the participants came to the laboratory to complete the rating experiment. They listened to the same music and rated the excerpts using nine pairs of bipolar adjectives based on [15]: good-bad, happy-sad, tender-aggressive, soft-loud, slow-fast, moving-static, stuttering-flowing, easy-difficult (to synchronize with) and known-unknown. Before the experiment started, the interpretation of the bipolar adjectives was explained to the participants by means of a short text. The adjectives were presented on sheets (one sheet for each musical piece), with the two adjectives of each pair separated by a 10 cm horizontal line. The line was used as a Likert scale, allowing the participant to make a quasi-continuous judgment by putting a mark somewhere along it. These ratings allow us to clarify the personal motivation and provide a first level of explanation of how the effect of sonic parameters is interpreted by the listener. After finishing the whole experiment, the participants received a 15-euro voucher, which they could spend at a well-known multimedia store.
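Digitizing a mark on the 10 cm line could look as follows. This is a hypothetical step: the paper does not specify the numeric scale to which the marks were converted, so the 0-100 range below is our assumption.

```python
def likert_score(mark_cm, line_cm=10.0):
    """Convert the position of a mark on the bipolar-adjective line into
    a score between 0 (left adjective) and 100 (right adjective).
    The 0-100 scale is an assumption, not stated in the paper."""
    if not 0.0 <= mark_cm <= line_cm:
        raise ValueError("mark must lie on the line")
    return 100.0 * mark_cm / line_cm
```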

The walking experiment took place in a sports hall. In the middle of the hall, a circle with a diameter of 15 m was drawn, which served as the pathway for the walking participants. For a video example and sounds, see ( http://www.ipem.ugent.be/ActivatingRelaxingMusic ). Upon arrival, the participants were briefly informed about the procedure and the goal of the research. Next, they were equipped with a wireless sensor system, called Xbus Kit ( http://xsens.com/en/products/human__motion/xbus_kit.php ). This kit consists of five MTx sensors that measure acceleration, angular acceleration and the earth's magnetic field, each in 3D. The MTx sensors were attached to the top of the right foot, the side of the right ankle, the right knee, the right hip and the right hand, giving a detailed image of the participants' movements while walking. The MTx sensors were connected to the Xbus Master, which collects the data and sends them to a computer (an Acer Aspire 1500 laptop) via a Bluetooth connection, with a sampling frequency of 50 Hz. In addition to the sensor system, each participant received an iPod Nano and a pair of headphones (Sennheiser HD 215). Before the experiment started, the participants were explicitly instructed to walk in synchrony with the music, sticking to the tempo of the metronome stimulus that they could hear at the beginning of the soundtrack. The walking path was indicated by the circle on the ground. Participants were instructed to walk along the circle whenever they heard a sound through their headphones and to stop when the music stopped. Before they started, they heard some music that they could use to adjust the volume; everybody was asked to choose a comfortable listening level, and once this was fixed, the volume could not be changed anymore. The excerpts were presented in two blocks of 29 pieces (one block taking 16 minutes and 55 seconds to finish). Between the blocks, a five-minute break was given, during which refreshments were offered.

Feature Extraction and Data-analysis

Walking speed. The speed of walking was derived from the output of the sensor attached to the hip. This sensor provided the angle with respect to the magnetic north pole. Participants had to walk in a circle, so that for each stimulus the distance between the starting position and the end position could be calculated using formula 1,

d = r (θe − θs), (1)

where d stands for the distance, θe is the angle (in radians) of the end position with respect to the magnetic north pole, θs is the angle (in radians) of the starting position with respect to the magnetic north pole, and r is the radius (here 7.5 m). The speed of walking was calculated by dividing the distance by the duration of the stimulus, which is 30 seconds.
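A sketch of this computation is given below. Since a heading sensor only yields angles modulo 2π, we add an explicit lap count; how full laps were handled is not stated in the paper, so the `laps` argument is our assumption.

```python
import math

CIRCLE_RADIUS_M = 7.5        # radius of the 15 m diameter walking circle
STIMULUS_DURATION_S = 30.0   # every stimulus lasts 30 seconds

def walking_speed(theta_start, theta_end, laps=0):
    """Average walking speed (m/s) from the start and end angles
    (radians, relative to magnetic north). `laps` counts completed
    full circles, which the angle alone cannot reveal (our assumption)."""
    delta = (theta_end - theta_start) % (2 * math.pi) + laps * 2 * math.pi
    distance = CIRCLE_RADIUS_M * delta          # arc length d = r * delta
    return distance / STIMULUS_DURATION_S
```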

Walking tempo. The walking tempo was calculated using the acceleration information of the sensor attached to the foot. The tempo was found by taking the peak of a Discrete Fourier Transform (DFT) applied to the acceleration data. The size of the DFT was chosen in such a way that the resolution of the DFT bins was equal to 0.5 BPM. As the tempo of only one foot was measured, and one foot strikes the ground at half the overall step rate, the obtained tempo value had to be multiplied by 2 to yield the walking tempo.
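A minimal sketch of this estimator is shown below. The 0.5 BPM bin resolution fixes the (zero-padded) DFT length at 120 times the sampling rate, since 60·fs/N = 0.5 BPM gives N = 120·fs. The search band restricting the foot tempo to a plausible range is our addition, not specified in the paper.

```python
import numpy as np

FS = 50.0   # sampling rate of the acceleration data (Hz)

def walking_tempo_bpm(accel):
    """Estimate the walking tempo (BPM) from one foot's acceleration.
    The DFT length makes one bin equal 0.5 BPM, and the detected foot
    tempo is doubled because one foot moves at half the step rate."""
    n = int(120 * FS)                  # 60 * FS / N = 0.5 BPM  ->  N = 120 * FS
    x = np.asarray(accel, float)
    x = x - x.mean()                   # remove DC so the gait peak dominates
    spec = np.abs(np.fft.rfft(x, n))   # zero-padded DFT
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    # restrict the search to 30-150 BPM per foot (our assumption)
    band = (freqs >= 0.5) & (freqs <= 2.5)
    f_foot = freqs[band][np.argmax(spec[band])]
    return 2 * 60.0 * f_foot           # double: one foot = half the tempo
```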

Normalizing and averaging the walking speed. In order to assess the effect of the sonic features on walking speed, each song is assigned a unique walking speed. This assignment is based solely on acceptable trials, defined as trials in which the participant walks in synchrony with the tempo of the stimulus (either a song or a metronome sequence). Since the mean walking speed of a participant is bound to depend on his/her physical characteristics, such as height and weight, one needs to ensure that the computed walking speed is not affected by these characteristics. Therefore, in a first step, the speed values of the acceptable song trials of each participant are divided by that participant's mean speed over the acceptable metronome trials. The underlying assumption is that metronome sequences are neutral in terms of activation and relaxation. Once this normalization is performed, the walking speed assigned to a song is the mean normalized walking speed over the acceptable trials of all participants for that song. Using this procedure, the walking speed to metronome ticks equals 100 units; slower and faster walking speeds for songs fall below or above this value.
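The two-step procedure (per-participant metronome normalization, then averaging per song) can be sketched as follows; the trial record layout is our own illustration.

```python
from collections import defaultdict

def normalized_song_speeds(trials):
    """trials: iterable of (participant, stimulus, speed, synchronized),
    where stimulus is a song id or "metronome" and synchronized marks
    an acceptable trial. Returns the mean normalized speed per song;
    100 units corresponds to the metronome walking speed."""
    # Step 1: each participant's baseline = mean speed over acceptable
    # metronome trials (assumed neutral w.r.t. activation/relaxation).
    metro = defaultdict(list)
    for p, stim, v, ok in trials:
        if ok and stim == "metronome":
            metro[p].append(v)
    baseline = {p: sum(vs) / len(vs) for p, vs in metro.items()}
    # Step 2: normalize acceptable song trials and average per song.
    per_song = defaultdict(list)
    for p, stim, v, ok in trials:
        if ok and stim != "metronome" and p in baseline:
            per_song[stim].append(100.0 * v / baseline[p])
    return {s: sum(vs) / len(vs) for s, vs in per_song.items()}
```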

Extraction of sonic features. It is anticipated that walking speed in synchronous walking is especially affected by the temporal patterns in the music. Therefore, for each of the 52 musical excerpts, a set of 190 sonic features is computed: 2 features are provided by a beat tracker and 188 features emerge from a dedicated feature extractor encompassing three levels of analysis [16], called the frame level, the beat level, and the song level. Generally speaking, the sonic feature extraction is achieved in three stages. The audio signal is first converted into a stream of acoustic parameter vectors (one vector every 5 or 20 ms; the components of the vector cover subsequent frequency bands). This feature stream is then analyzed per inter-beat interval (IBI) and gives rise to beat-level feature vectors (one vector per beat). In the third stage, the time evolution of each beat-level feature in the course of the song, called the feature pattern, is considered as a ‘signal’ and its spectrum is computed at four frequencies, namely 1/2, 1/3, 1/4 and 1/6 of the beat rate. In total, 47 beat-level features were considered, giving rise to 4×47 = 188 sonic features. Each acoustic parameter vector consists of (a) 6 loudnesses (loudness = energy to the power of 0.25) evoked by the outputs of a 6-channel filter bank and (b) 52 evidences for 52 frequencies between 0.1 and 2 kHz, coinciding with the notes of a Western scale. A more technical description is given in the next paragraphs; for more details, see [16].

Frame-level analysis. The frame-level analysis consists of two components. The first component considers subsequent fixed-length frames of 30 ms, shifted over 5 ms. For each frame (i.e., one output every 5 ms), this analysis produces the loudnesses measured in six frequency bands. This is achieved by decomposing the signal into six subband signals by means of six triangular filters with center frequencies of approximately 118, 298, 570, 983, 1609 and 2559 Hz and by measuring the energies of these signals in an interval of 30 ms. The second component considers subsequent frames of 150 ms, shifted over 20 ms. This analysis produces frame-by-frame evidences for 52 note-related frequencies ranging from 0.1 to 2 kHz, corresponding to the note frequencies of the equal-tempered Western scale.
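The loudness part of the frame-level analysis can be sketched as follows, assuming the six subband signals have already been produced by the triangular filter bank (the filter bank itself is omitted here, and the function name is ours).

```python
import numpy as np

# Approximate center frequencies (Hz) of the six triangular filters.
CENTER_FREQS_HZ = [118, 298, 570, 983, 1609, 2559]

def frame_loudness(subband_signals, fs, frame_ms=30, hop_ms=5):
    """Loudness (= energy ** 0.25) per 30 ms frame, hopped every 5 ms,
    for pre-computed subband signals. Returns shape (n_frames, n_bands)."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + (len(subband_signals[0]) - frame) // hop
    out = np.empty((n_frames, len(subband_signals)))
    for b, x in enumerate(subband_signals):
        x = np.asarray(x, float)
        for i in range(n_frames):
            seg = x[i * hop: i * hop + frame]
            out[i, b] = np.sum(seg ** 2) ** 0.25   # loudness = energy^0.25
    return out
```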

Beat-level analysis. The frame-level features are further considered per beat period. Each beat period is presumed to start with an energetic occurrence that marks the beat at a particular time instant. The beat-level analysis produces 47 sonic features per beat: (i) There are 7 beat-onset features, which describe the total loudness growth as well as the loudness growths in the six subbands at the beat onset time. (ii) There are 3 beat event features, which describe the exact position of the beat onset, the length of the beat event and the skewness of this event. (iii) There are 21 beat period features, which describe the seven frame-level loudnesses (the total loudness and the six subband loudnesses) observed in the course of the beat period following the beat event. For each loudness, we retain the mean and the standard deviation of the loudness samples, and we also consider the temporal evolution of the loudness samples in the beat period, of which we retain the center of gravity. (iv) There are 10 beat period features summarizing the information retrieved from the pitch saliences computed by the frame-level analysis. The first feature represents the position of the onset of the most salient note found in the beat period. The nine others describe the frequency (in Hz), the pitch class (chroma) and the salience of the three most salient notes. If only two notes are found, the third note is marked by a zero frequency and salience. (v) Finally, we compute 6 beat similarity features describing cosine similarities between subsets of the previously derived beat onset and beat period features of every two subsequent beats.
The considered feature subsets are: (1) the loudness growths in the six subbands at the beat onset, (2) the means of the six subband loudnesses in the beat period, (3) the standard deviations of the six subband loudnesses in the beat period, (4) the centers of gravity of the six subband loudness patterns, (5) the three most salient note frequencies found in the beat period, and (6) the same frequencies after mapping to the chromatic scale.
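The cosine similarity underlying these six features can be illustrated generically as follows; the handling of all-zero vectors is our own convention, not specified in the paper.

```python
import numpy as np

def beat_similarity(prev_feats, cur_feats):
    """Cosine similarity between the same feature subset (e.g. the six
    subband loudness growths) of two subsequent beats. Returns a value
    in [-1, 1]; 0.0 is returned for an all-zero vector (our convention)."""
    a = np.asarray(prev_feats, float)
    b = np.asarray(cur_feats, float)
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))
```

Applying this function to the six subsets listed above, for every pair of subsequent beats, yields the six beat similarity features.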

Song-level analysis. In the song-level analysis stage, we consider the beat-per-beat values of each individual beat-level feature as the samples of a ‘signal’ sampled at the beat rate. The aim of this analysis is to discover evidences for periodicities of length 2, 3, 4 or 6 beats in such a signal, and to consider these evidences as sonic features. The computed evidences for a particular signal are simply the values of the amplitude spectrum of that signal at frequencies of one half, one third, one fourth and one sixth of the beat rate. This implies that every beat-level feature gives rise to four evidences, yielding 4×47 = 188 song-level features in total. These features, together with the outputs of two oscillators residing in the beat tracker, namely the oscillators tuned to twice and three times the beat rate, complete the set of 190 sonic features characterizing the song.
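The song-level evidences can be sketched with a single-frequency DFT evaluated at the four target frequencies. The mean removal and the amplitude scaling below are our assumptions about details the text leaves open.

```python
import numpy as np

def periodicity_evidences(pattern):
    """Amplitude-spectrum values of a beat-level feature pattern at
    1/2, 1/3, 1/4 and 1/6 of the beat rate (periods of 2, 3, 4 and 6
    beats). `pattern` holds one sample per beat."""
    x = np.asarray(pattern, float)
    x = x - x.mean()                       # remove DC (our assumption)
    n = len(x)
    t = np.arange(n)
    evidences = {}
    for period in (2, 3, 4, 6):
        f = 1.0 / period                   # cycles per beat
        # single-frequency DFT at the target frequency
        c = np.sum(x * np.exp(-2j * np.pi * f * t))
        evidences[period] = 2.0 * np.abs(c) / n   # amplitude estimate
    return evidences
```

For a pattern that strictly alternates every beat, the evidence at period 2 dominates the other three, which is the behavior the song-level features are meant to capture.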