Abstract Long-range correlated temporal fluctuations in the beats of musical rhythms are an inevitable consequence of human action. According to recent studies, such fluctuations also lead to a preferred listening experience. The scaling laws of amplitude variations in rhythms, however, remain largely unknown. Here we use highly sensitive onset detection and time-series analysis to study the amplitude and temporal fluctuations of Jeff Porcaro’s one-handed hi-hat pattern in “I Keep Forgettin’”—one of the most renowned 16th note patterns in modern drumming. We show that fluctuations of hi-hat amplitudes and interbeat intervals (times between hits) have clear long-range correlations and short-range anticorrelations separated by a characteristic time scale. In addition, we detect subtle features in Porcaro’s drumming, such as small drifts in the 16th note pulse and non-trivial periodic two-bar patterns in both hi-hat amplitudes and intervals. Through this investigation we take a step towards statistical studies of 20th- and 21st-century music recordings in the framework of complex systems. Our analysis has direct applications to the development of drum machines and to drumming pedagogy.

Citation: Räsänen E, Pulkkinen O, Virtanen T, Zollner M, Hennig H (2015) Fluctuations of Hi-Hat Timing and Dynamics in a Virtuoso Drum Track of a Popular Music Recording. PLoS ONE 10(6): e0127902. https://doi.org/10.1371/journal.pone.0127902 Academic Editor: Ramesh Balasubramaniam, University of California, Merced, UNITED STATES Received: November 14, 2014; Accepted: April 20, 2015; Published: June 3, 2015 Copyright: © 2015 Räsänen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited Data Availability: All relevant data are within the paper and its Supporting Information files. Funding: Tuomas Virtanen has been funded by the Academy of Finland, grant number 258708. Competing interests: The authors have declared that no competing interests exist.

Introduction Astonishingly many dynamical or complex systems in various branches of physics, biology, and economics show 1/f fluctuations [1], often called fractal. 1/f-type noise is present even in the most obvious human-generated time series such as heartbeat [2–4], gait [5], and tapping or drumming [6–10]. In an early study, Voss [11] found that musical pitch and loudness follow 1/f fluctuations. Loudness fluctuations were studied by analyzing, e.g., a recording of Bach’s 1st Brandenburg Concerto. In this case, however, the fluctuations were taken from the full audio signal in a ‘continuous’ sense. Later on, fractal analysis of loudness variations has been used to classify genres and styles of music [12–14]. Fractal analysis of human musical rhythms has been carried out only very recently [6, 7]. Remarkably, clear long-range correlated (LRC) fluctuations were consistently found in various rhythmic tasks, albeit often outside the 1/f regime (see below for the mathematical definitions). Another important finding, from the subsequent perception study, was that listeners had a statistically significant preference for ‘1/f humanized’ samples over ‘white-noise humanized’ samples. Furthermore, it was shown very recently that rhythms between individuals are subject to scale-free cross-correlations [15]. These findings underline the subtlety of rhythmic interplay in musical performances and in their perception. Despite the advances in the quantification of human rhythms, the statistical (fractal) properties of rhythmic fluctuations in 20th- and 21st-century music recordings have not been analyzed in detail. It should be noted that in previous studies on tapping and drumming [1, 6, 8, 9, 15] either (i) the experiments were conducted in a ‘clean’ laboratory environment or (ii) individual drumming tracks were used where a metronome was present. A metronome leads to a constant pace and defines a constant grid for audio engineers.
However, it qualitatively changes the behavior on both short and long time scales [1, 10, 15]. In other studies, classical piano music (without drums or metronome) was subjected to fractal analysis, and clear signatures of 1/f tempo fluctuations were found [16, 17]. Here we take a first step towards filling the gap between carefully designed experimental studies under controlled conditions [8] and recorded drumming under real-world conditions. In the latter case, further studies are needed to determine and classify fluctuations in both interbeat intervals and beat amplitudes. To the best of our knowledge, the correlation properties of amplitude (i.e., loudness) fluctuations of beats in rhythms have not been scrutinized as yet. Moreover, it is worth studying whether “human” fluctuations contribute to the groove—sometimes defined as the subjective experience of wanting to move rhythmically when listening to music [8, 18–24]. Previously, it has been found that microtiming deviations without LRCs do not affect listeners’ groove ratings [19, 20], or even correlate negatively with them [23, 24]. On the other hand, groove ratings can be changed by other aspects of the rhythmic structure, e.g., by syncopation [21, 22]. Here we focus on the timing and loudness variations that occur naturally when a drummer plays to a piece of music, and suggest that they may also contribute to the groove. However, we do not provide an exhaustive treatment of groove from a musicological point of view. In this work, we thoroughly analyze a one-handed hi-hat drumming pattern of a musical masterpiece recorded in 1982 [25]. We use sensitive signal-analysis tools to detect the onset times of the hi-hat hits of the song with millisecond accuracy. The onset of a hit is defined as the time when the hit begins. Once the onsets are detected, we carry out time-series analysis on the sequence of onset times.
Firstly, we examine the drift of the 16th note pulse, which correlates strongly with the parts of the song and shows that the drum track was recorded without a metronome. Secondly, the fluctuations of the 16th note hi-hat intervals as well as the hit amplitudes are subjected to detrended fluctuation analysis (DFA) and a power-spectrum study, which clearly show the existence of LRCs in both cases. However, a Poincaré plot of the interval variability [the i-th interval versus the (i+1)-th interval] shows strong lag-1 anticorrelations. This suggests motor delays in the 16th note hi-hat pulse, in accordance with previous behavioral data and models. Finally, we demonstrate that each repetitive phrase of the song, consisting of two bars and 32 hi-hat hits, has a specific amplitude pattern. The hit intervals also show positive correlations across phrases. Interestingly, the phrase—as defined above—seems to correspond to the time scale that separates the LRCs (at longer times) and the anticorrelations (at shorter times). The paper concludes with a discussion of implications and possible follow-ups of the present study.
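In its simplest quantitative form, the lag-1 structure seen in such a Poincaré plot can be summarized by the correlation between consecutive intervals. The minimal sketch below illustrates the idea on synthetic data (the function name and the example series are ours, not the analysis code or data of the study):

```python
import numpy as np

def lag1_correlation(intervals):
    """Pearson correlation between the i-th and (i+1)-th interbeat interval.

    A negative value indicates lag-1 anticorrelation: a longer-than-average
    interval tends to be followed by a shorter-than-average one.
    """
    x = np.asarray(intervals, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# Synthetic example: intervals alternating around a 157 ms mean (the average
# 16th note interval of the song) are strongly anticorrelated at lag 1.
rng = np.random.default_rng(0)
alternating = 0.157 + 0.005 * (-1.0) ** np.arange(200) + 0.001 * rng.standard_normal(200)
print(lag1_correlation(alternating))  # strongly negative
```

A value near −1, as for the alternating series above, means a long interval tends to be followed by a short one; uncorrelated (white-noise) intervals give a value near zero.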

Materials and Methods Object of study In our analysis, we focus on one song, I Keep Forgettin’ by Michael McDonald, recorded in 1982 [25]. It is a low-mid tempo (96 quarter-note beats per minute) pop-soul song with a well-known 16th note hi-hat drum pattern played by Jeff Porcaro [26]. Jeff Porcaro (1954–1992) was one of the most renowned drummers of his time: a session musician behind recordings of, e.g., Michael Jackson and Madonna, and a member of major rock bands Steely Dan and TOTO. One of Porcaro’s trademarks was his single-handed hi-hat technique, which he used to play 16th note patterns with a particularly smooth and groovy feel [27]. I Keep Forgettin’ features this technique in its most recognizable form. In his instructional drumming video, Porcaro comments on his hi-hat playing in this song [27]: “I like the single-handed method, because it’s a lot smoother feel. For instance in the Michael McDonald record ‘I Keep Forgettin’’, I tried doing the alternating stroke method of doing 16ths, and it sounded just too stiff and staccato for me.” The comment provides an intriguing starting point for the present study from a musicological point of view. The results below reflect Porcaro’s comment in the sense that there is a smooth and subtle modulation in his single-handed hi-hat playing. It is commonly agreed among drummers that, e.g., modulations in hi-hat accents are important in the generation of the “groove”, and Jeff Porcaro is highly respected for this ability. In addition, we find LRCs in both the interval and amplitude variations. Whether LRCs also exist in two-handed patterns is, however, a subject for future studies. From a physical and mathematical point of view, the selected song is well suited for quantitative analysis for the following reasons. First, the large number of onsets from the hi-hats played on the 16th notes allows sufficiently reliable fractal analysis with DFA.
Secondly, the song is strongly driven by drums and bass, which dominate the instrumentation in most parts of the recording. This helps the precise determination of the onset times. In general, hi-hats are well suited to onset analysis due to their high frequency range, as shown below (Fig 1).

Fig 1. Upper panel: Audio signal of a short clip of the song presented as a spectrogram. The bright branches at high frequencies correspond to the hi-hat beats. Lower panel: Cross section of the spectrogram with the envelope, amplitude threshold (dashed line), and the detected onset times (crosses). https://doi.org/10.1371/journal.pone.0127902.g001

Onset analysis In the original recording, all instruments are mixed together. To select a specific component from the complete song, here the hi-hat hits, we use frequency filters and semi-automated, sensitive onset detection. The onset times of the hi-hats are obtained by first applying a computational onset detection algorithm to the audio signal, and then manually editing the onset positions. The original audio signal is an uncompressed stereo WAV file extracted from the original compact disc [25], with a sampling frequency of 44.1 kHz and 16 bits per sample. There exist established algorithms for onset detection of musical sounds [28]. In this study, however, we are interested in the onsets of the hi-hats only, and therefore generic onset estimation algorithms are not applicable for our purpose. Instead, we implemented an onset detection algorithm for hi-hats in MATLAB [29]. The main challenge in the onset analysis is the polytimbral nature of the material: the signal is a mixdown of multiple instruments that overlap with each other in time and frequency. Hi-hats have most of their energy at high frequencies, whereas most of the other instruments are dominated by low frequencies. In order to make the hi-hat sounds more prominent for the subsequent onset estimation, the signal was first high-pass filtered with a 100th-order FIR filter with a cutoff frequency of 8 kHz. The delay caused by the filter was compensated by shifting the signal. Onsets are most clearly visible in the amplitude envelope of the signal, as shown in Fig 1.
In the automatic onset analysis, the envelope of the filtered signal is calculated by finding the maximum of the absolute value of the signal within a 200-sample (4.5 ms) window centered at each sample. Hi-hat instances are found as the local maxima of the envelope that are higher than a threshold, which was manually tuned so as not to discard any real hi-hat instances. The onset time of each hi-hat instance is found by examining a 1500-sample (34 ms) window before each hi-hat instance time, assuming that the hi-hat sound starts at most 34 ms before its maximum amplitude. The onset time is defined as the time when the amplitude of the envelope rises above 10% of the maximum amplitude of the instance. This percentage method has successfully been used to extract onset times of other types of instruments as well [30]. The above method works well for estimating onsets when no other instruments are present. An example of successful onset analysis based on the approach described above is illustrated in Fig 1. However, interference from other instruments may raise the general level of the envelope above 10% of the maximum. In such cases, the threshold was doubled until a rise of the envelope from below to above the threshold was found. Sounds produced by other instruments can produce erroneous onset estimates. Here we use an automatic constraint that the interval between true hi-hat onsets can deviate by at most ±20 ms from 157 ms, which is the average interval of the 16th notes. However, there are cases when other instruments (mostly snare drums and cymbals) occur simultaneously with a hi-hat sound in the same frequency range and make the determination of the exact onset time impossible. In those cases the onsets were omitted from further analysis. Finally, the onset candidates were manually examined to confirm their correctness. First, the highpass-filtered signal and the estimated onset times were visually examined in an audio editor while listening to the original and filtered signals.
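The automatic stage of this pipeline can be sketched as follows. This is a simplified, single-channel Python reimplementation, not the MATLAB code used in the study; the window lengths, cutoff, and the 10% rule follow the text, while the function name, the input array, and the duplicate-merging step are our own additions:

```python
import numpy as np
from scipy.signal import firwin, lfilter
from scipy.ndimage import maximum_filter1d

FS = 44100  # sampling rate of the recording (Hz)

def detect_hihat_onsets(signal, threshold, order=100, cutoff=8000.0,
                        env_win=200, search_win=1500, frac=0.10):
    """Simplified hi-hat onset detector following the method in the text.

    signal    : mono audio as a 1-D float array sampled at FS Hz
    threshold : manually tuned envelope threshold for hi-hat instances
    Returns onset times in seconds.
    """
    # High-pass FIR filter (100th order, 8 kHz cutoff); compensate the
    # filter delay of order/2 samples by shifting the signal.
    h = firwin(order + 1, cutoff, fs=FS, pass_zero=False)
    filtered = np.roll(lfilter(h, 1.0, signal), -order // 2)

    # Envelope: maximum of |signal| in a 200-sample (4.5 ms) centered window.
    env = maximum_filter1d(np.abs(filtered), size=env_win)

    # Hi-hat instances: local envelope maxima above the threshold.
    peaks = np.flatnonzero((env[1:-1] >= env[:-2]) &
                           (env[1:-1] > env[2:]) &
                           (env[1:-1] > threshold)) + 1

    # Onset: within the 1500-sample (34 ms) window before each instance,
    # the first time the envelope rises above 10% of the peak amplitude.
    onsets = []
    for p in peaks:
        start = max(0, p - search_win)
        above = np.flatnonzero(env[start:p + 1] >= frac * env[p])
        if above.size:
            t = (start + above[0]) / FS
            # Merge duplicate detections of the same hit (< 50 ms apart).
            if not onsets or t - onsets[-1] > 0.05:
                onsets.append(t)
    return np.array(onsets)
```

The adaptive threshold doubling, the ±20 ms interval constraint, and the manual verification stages described in the text are not reproduced here; in practice they would be applied to the returned onset list.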
Second, a “click” track was produced by generating a synthesized click sound at the estimated onset times. The original track and the click track were listened to alternately to spot any instances where the perceived onset times differed from each other. As a result of this examination, the onset times were manually adjusted to match the perceived hi-hat onset times. The above methods were applied to small segments of the signal at a time, and each segment was examined multiple times to verify the correctness of the onset times. In total we detected 931 hi-hat onsets (see S1 Dataset). All the onsets were used for the analysis of the amplitudes below. For the analysis of the hi-hat interbeat intervals, however, we included only the clearly detected 16th note intervals. We therefore had to omit the intervals having one or more missing onsets in between. The 8th note intervals with an open hi-hat, often played at the end of the two-bar phrase (see below), were also omitted. The total number of detected 16th note hi-hat intervals is thus 708, corresponding to a detection rate of 76% relative to the total number of onsets. Detrended fluctuation analysis DFA is a widely used method in time-series analysis to study long-range correlations, particularly 1/f noise [31]. Several studies over the past 20 years have shown the usefulness of DFA in determining the fractal properties of non-stationary time series [6, 13, 31–33]. Outside the time domain, it has been used to study, e.g., DNA structures [34], and very recently also the magnetoconductance of chaotic quantum dots by some of the present authors [35]. The reliability of DFA against alternative methods for determining fractal properties has been quantitatively confirmed by Pilgram and Kaplan [36]. 1/f noise essentially means that the power spectrum of a signal f(i) is of a power-law form S(f) ∼ 1/f^β with β ≈ 1.
This is often referred to as pink or flicker noise, which has intermediate predictability between (i) white noise, with β = 0 and no correlation between consecutive values, and (ii) Brownian motion, with β = 2 and strongly correlated values generated by uncorrelated consecutive increments [1]. In this context, 1/f fluctuations are often called fractal, for β corresponds to the self-similarity parameter (Hurst exponent) α, which describes the temporal scaling of a signal X(t) in a statistical sense: X(bt) = b^α X(t), where b is a scaling factor. In turn, α corresponds also to the exponent in DFA (see below). In the DFA context, α and β are related as β = 2α − 1 [37], and fluctuations leading to 0.5 < α ≤ 1.5 are generally referred to as long-range correlated (LRC). Anticorrelations are present for −0.5 < α < 0.5. Generally, we speak of the 1/f regime when α = β = 1 within statistical errors. We apply DFA to (i) the fluctuations of the interbeat intervals (from the mean) and (ii) the fluctuations of the onset amplitudes. In the following we exemplify the conventional DFA procedure [31, 32] for the former case. In the notation we partly follow Ref. [38], where DFA was applied to rainfall and streamflow data. The onset times are denoted by f(i), so that the set of interbeat intervals becomes τ(i) = f(i+1) − f(i). Next, we subtract the mean of the intervals ⟨τ⟩ to obtain the set of fluctuations of the intervals from the mean, i.e., Δτ(i) = τ(i) − ⟨τ⟩. Our interest now lies in the (possible) LRCs in Δτ(i). To this end, we first integrate the series by calculating the profile

y(j) = ∑_{i=1}^{j} Δτ(i), j = 1, …, N, (1)

where N is the number of data points. Next, we divide the i axis into N/s non-overlapping windows, each consisting of s data points. In each window, a least-squares line y_s(i), which represents the trend in the window, is fit to y(i) and the residuals y(i) − y_s(i) are calculated (detrending). Thus, we use linear detrending; quadratic (or higher-order) detrending did not lead to a qualitative difference.
The root-mean-square fluctuation for the k-th window of size s is calculated as

F_k(s) = √{ (1/s) ∑_{i=(k−1)s+1}^{ks} [y(i) − y_s(i)]² }. (2)

Finally, we take the mean value over all N/s elements F_k(s) to obtain F(s) = ⟨F_k(s)⟩. The procedure thus yields a relationship between the average fluctuation within a certain window size and the window size itself. We can now examine whether F(s) scales as F(s) ∝ s^α, where the scaling (DFA) exponent α is the slope of the line relating log F(s) to log s. White noise and Brownian noise (integrated white noise) correspond to α = 0.5 and α = 1.5, respectively, whereas 0.5 < α ≤ 1.5 indicates LRCs, and the special case of flicker noise, α = 1, corresponds to 1/f behavior. In this work, the DFA results are supplemented by the globally detrended power spectral density (gPSD) analysis described in detail in Ref. [15]. It is a modification of the conventional PSD method and includes prior detrending with higher-order polynomials (beyond linear global detrending). Higher-order polynomial detrending has proven to be crucial when nonstationary time series—here, recordings without a metronome—are analyzed. This is expected to be even more important when real-world recordings are studied, as in the present work. We point out that DFA as well as gPSD are subject to intrinsic errors, analyzed in detail by Pilgram and Kaplan [36]. These errors in the estimate of the scaling exponent do not include the numerical error of the least-squares fitting procedure. In practice we expect the errors for our data sets to be below ∼ 10%, which does not lead to a qualitative difference in the interpretation of the results.
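For illustration, the DFA procedure of Eqs (1) and (2) amounts to only a few lines of code. The sketch below is in Python rather than the MATLAB used in the study, and is checked on synthetic white noise, for which α ≈ 0.5 is expected:

```python
import numpy as np

def dfa_exponent(series, scales):
    """DFA with linear detrending, following Eqs (1) and (2).

    Returns the scaling exponent alpha: 0.5 for white noise,
    1 for 1/f (flicker) noise, 1.5 for Brownian noise.
    """
    x = np.asarray(series, dtype=float)
    y = np.cumsum(x - x.mean())                 # Eq (1): integrated profile
    fluct = []
    for s in scales:
        n_win = len(y) // s                     # N/s non-overlapping windows
        i = np.arange(s)
        f_k = []
        for k in range(n_win):
            seg = y[k * s:(k + 1) * s]
            trend = np.polyval(np.polyfit(i, seg, 1), i)      # line y_s(i)
            f_k.append(np.sqrt(np.mean((seg - trend) ** 2)))  # Eq (2): F_k(s)
        fluct.append(np.mean(f_k))              # F(s) = <F_k(s)>
    # alpha is the slope of log F(s) versus log s.
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]

# Sanity check on synthetic white noise: alpha should be close to 0.5.
rng = np.random.default_rng(42)
alpha_white = dfa_exponent(rng.standard_normal(4096), scales=[8, 16, 32, 64, 128])
```

Integrating the same noise with np.cumsum before the call gives Brownian noise with α close to 1.5; a fractional-noise generator would be needed to test the intermediate 1/f case.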

Discussion In summary, we have described a route to examining hi-hat patterns in real-world data. In particular, we have analyzed the 16th note hi-hat intervals and amplitudes played by Jeff Porcaro in I Keep Forgettin’, which he plays in his unique one-handed manner. We first generated the time series using sensitive onset detection with one-millisecond precision on the complete sound file. We then analyzed the drift of the 16th note pulse, long-range correlations (LRCs) with detrended fluctuation analysis (DFA) and spectral analysis, Poincaré maps, and finally variations at the level of one and two bars (a phrase) of the song. Our results show that the drum track was most likely recorded without a metronome, and that the slight changes in the 16th note pulse reflect different parts of the song. Clear evidence of LRC fluctuations in the 16th note hi-hat intervals was found. To the best of our knowledge, this phenomenon has not previously been found in recorded drumming in popular music when no metronome was present during the recording and no individual drum tracks were available. The LRCs seem to wash out on short time scales, likely due to motor delays studied before in human cognition. The observed anticorrelations on short time scales, including lag-1 anticorrelations clearly visible in Poincaré return maps, are consistent with previous studies [1, 6, 9, 15]. The amplitudes also show LRC fluctuations, albeit weaker than those of the hi-hat intervals. Our analysis of individual bars reveals complex patterns in both interbeat intervals and amplitudes. In particular, the two-bar phrase of the song is characterized by a rich amplitude pattern that goes beyond 8th note accenting (on every second 16th note). Our study can be taken as a first step towards analyzing the complex dynamics of 20th- and 21st-century music recordings on different time scales. Several important questions arise; in the following we mention only a few of them.
First and foremost, a detailed study of LRCs in more songs, preferably a large ensemble of them, would provide new insights into the nature of human timing and its relation to groove and the perception of time. This may complement previous milestone studies in the cognitive sciences and musicology. At present, we are constrained by the fact that onset detection from an analog signal is a tedious task. On the other hand, the available MIDI recordings that we have examined so far display machine-generated (or manipulated) drum tracks without any interval fluctuations, which has been demonstrated to worsen the listening experience [6]. Secondly, to learn more about the groove of iconic musicians such as Jeff Porcaro, and about the universality of the phenomena considered here, it would be important to compare (i) different recordings of the same drummer, (ii) different drummers, (iii) various tempos and/or rhythms, (iv) different musical genres, and (v) playing styles, e.g., single-handed 16th note hi-hats versus the more common two-handed patterns. The last point is already under examination. Furthermore, regarding musical groove, a major factor is the communication between players, e.g., between the drummer and the bass player [15]. In a more general context, a key question that still needs to be addressed is the origin of the LRCs, which are common to a variety of systems. The underlying system must be sufficiently complex, described by a nonlinear differential equation (or many of them), and there must be a proper amount of feedback. However, the origins may vary considerably from system to system, and it is difficult to construct universal models that could qualitatively describe, e.g., heartbeat intervals, magnetoconductance oscillations, and drumming intervals on the same footing. Finally, we comment on possible practical applications of the present study.
First, the complex (but repetitive) hi-hat patterns found here [see, e.g., Fig 6] could be implemented in drum machines in a straightforward manner to improve their “human touch”. This could be combined with LRCs, which have already been the subject of such proposals [6]. Secondly, although we currently lack a comparison with two-handed hi-hat patterns, according to our results it is likely that the “single-handed method” (in Porcaro’s words [27]) is superior to two-handed playing in enriching the rhythm and the feel, depending naturally on the drummer and his or her qualities. This point should be underlined in modern pop and rock drumming pedagogy.

Supporting Information S1 Dataset. Detected 16th note hi-hat onsets. The file contains all the detected onsets in “I Keep Forgettin’”. The onset times (in seconds) and the corresponding amplitudes (in arbitrary units) are given in the first and second columns, respectively. https://doi.org/10.1371/journal.pone.0127902.s001 (TXT)

Acknowledgments We thank Tauno Räsänen at Oulu Conservatory of Music for his professional insights into drumming that helped us in the preparation of the manuscript. We are also grateful to Oguzhan Gencoglu, Eemi Fagerlund, Tuomas Eerola, Carlo Rozzi, Topi Karilainen, Perttu Luukko, Ilkka Kylänpää, Janne Solanpää, and several other people for useful advice, comments, and discussions, and to TUT Prof Experience [46] for special inspiration. Tuomas Virtanen has been funded by the Academy of Finland, grant number 258708.

Author Contributions Conceived and designed the experiments: ER HH. Performed the experiments: ER OP TV. Analyzed the data: ER OP HH. Contributed reagents/materials/analysis tools: ER OP TV MZ HH. Wrote the paper: ER OP TV HH.