Monophonic pitch detection with the Harmonic Product Spectrum (HPS) algorithm

For musical signals, the spectrum consists of a series of peaks corresponding to a fundamental frequency with harmonic components, called partials, positioned at integer multiples of the fundamental. Thus, if a spectrum is downsampled (compressed along the frequency axis) by successive integer factors, the strongest harmonic peaks should line up with the fundamental.

For example, when a spectrum is downsampled by a factor of two, its second peak lines up with the fundamental of the original spectrum. Likewise, the third peak in a spectrum compressed by a factor of three lines up with the fundamental, and so on. Therefore, if the original and compressed spectra are multiplied bin by bin, the product will be small at all frequencies except the one that corresponds to the fundamental frequency. This downsampling technique can therefore be used to estimate pitch in monophonic musical signals, and this is the key idea of HPS.
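The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `hps_pitch` and the parameters `num_harmonics` and `min_freq` are assumptions made for this example, and the downsampling is done by simply keeping every `factor`-th bin of the magnitude spectrum.

```python
import numpy as np

def hps_pitch(signal, sample_rate, num_harmonics=4, min_freq=50.0):
    """Estimate the fundamental frequency (Hz) of one monophonic frame with HPS.

    `num_harmonics` and `min_freq` are illustrative defaults, not tuned values.
    """
    # Window the frame to reduce spectral leakage, then take the magnitude spectrum.
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))

    # Only bins that exist in every downsampled copy can be multiplied.
    usable = len(spectrum) // num_harmonics
    product = spectrum[:usable].copy()
    for factor in range(2, num_harmonics + 1):
        # Keeping every factor-th bin compresses the frequency axis by `factor`,
        # so the factor-th partial moves onto the bin of the fundamental.
        product *= spectrum[::factor][:usable]

    # Skip DC and very low bins, then pick the strongest remaining bin.
    bin_width = sample_rate / len(signal)
    lowest_bin = int(np.ceil(min_freq / bin_width))
    peak_bin = lowest_bin + np.argmax(product[lowest_bin:])
    return peak_bin * bin_width

# Usage: a synthetic 220 Hz tone with five decaying partials.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
estimate = hps_pitch(tone, sample_rate)
```

Note that the frequency resolution is limited to one bin (`sample_rate / len(signal)` Hz), so short frames give coarse estimates unless the spectrum is zero-padded or interpolated.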

There are two caveats. The first is that we're relying on instruments having harmonic spectra, which is not always the case. For example, inharmonicity caused by string stiffness means some instruments have partials displaced onto a slightly stretched harmonic series. This is particularly true of the piano, with its stiff, high-tension strings. The second caveat concerns the relative strength of the partials. The human voice does not have a fixed spectral shape: its formants, which are what let us perceive different vowels, emphasize some partials and suppress others. Similarly, certain instruments, such as the clarinet or saxophone, have vastly different magnitudes at specific partials, which is part of what makes up their particular acoustic sound, or timbre.
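To get a feel for how a stretched series affects HPS, here is a small sketch using one commonly used stiff-string model, in which the n-th partial sits at n·f0·sqrt(1 + B·n²). The function name `stretched_partials` and the inharmonicity coefficient `B = 4e-4` are assumptions chosen for illustration, not measurements of any real instrument.

```python
import numpy as np

def stretched_partials(f0, num_partials, inharmonicity):
    """Partial frequencies (Hz) under the stiff-string model n*f0*sqrt(1 + B*n^2)."""
    n = np.arange(1, num_partials + 1)
    return n * f0 * np.sqrt(1.0 + inharmonicity * n**2)

# Illustrative values only: B = 4e-4 is an assumed coefficient.
partials = stretched_partials(f0=220.0, num_partials=6, inharmonicity=4e-4)

# After compressing the spectrum by n, the n-th partial lands at partials[n-1] / n.
# For a perfectly harmonic tone every entry would equal f0; here they drift upward.
landing = partials / np.arange(1, 7)
print(landing)
```

The higher the partial, the further it lands from the fundamental after compression, so the peaks multiplied by HPS overlap less and the product peak becomes smeared or weakened.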

Thus, for HPS to work, we're relying on the magnitudes of all partials being non-zero, because otherwise the product becomes very small and the resulting peak might not stand out clearly. So as a monophonic pitch detection method, HPS should work well for violins and worse for vocals and piano, for which we might want to look to autocorrelation instead.
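For comparison, a basic time-domain autocorrelation estimator might look like the sketch below. The text does not specify a particular autocorrelation variant, so this is just one simple option; the function name `autocorr_pitch` and the search range given by `min_freq` and `max_freq` are assumptions for this example.

```python
import numpy as np

def autocorr_pitch(signal, sample_rate, min_freq=50.0, max_freq=1000.0):
    """Estimate the fundamental frequency (Hz) from the strongest autocorrelation lag."""
    # Remove any DC offset so it does not dominate the correlation.
    signal = signal - np.mean(signal)

    # Full autocorrelation; keep only the non-negative lags.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]

    # Restrict the search to lags that correspond to plausible pitches.
    min_lag = int(sample_rate / max_freq)
    max_lag = int(sample_rate / min_freq)
    lag = min_lag + np.argmax(corr[min_lag:max_lag + 1])
    return sample_rate / lag

# Usage: a 200 Hz tone whose second partial is missing entirely.
sample_rate = 8000
t = np.arange(2048) / sample_rate
tone = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 600 * t)
estimate = autocorr_pitch(tone, sample_rate)
```

Because autocorrelation looks for the period of the waveform rather than multiplying partial magnitudes, a missing partial does not zero out the result the way it does in the HPS product.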