If you are looking to synthesize the sound of a plucked string, there is an amazingly simple algorithm for it called the Karplus-Strong algorithm.

Give it a listen: KarplusStrong.wav

Here it is with flange and reverb effects applied: KPFlangeReverb.wav

It works like this:

1. Fill a circular buffer with static (random numbers).
2. Play the contents of the circular buffer over and over.
3. Each time you play a sample, replace that sample with the average of itself and the next sample in the buffer, multiplied by a feedback value (say, 0.996).
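The three steps above map onto a very small loop. This is a minimal C++ sketch rather than the post's full example; PluckString is a hypothetical helper name, and generating the static with a fixed-seed std::mt19937 is an assumption made so runs are repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Minimal Karplus-Strong sketch: returns numSamples samples of a plucked string.
// bufferSize sets the pitch (roughly sampleRate / bufferSize), feedback < 1 sets the decay.
std::vector<float> PluckString(size_t bufferSize, size_t numSamples, float feedback, unsigned seed = 0)
{
    assert(bufferSize >= 2);

    // step 1: fill the circular buffer with static (uniform noise in [-1, 1])
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> noise(-1.0f, 1.0f);
    std::vector<float> buffer(bufferSize);
    for (float& s : buffer)
        s = noise(rng);

    std::vector<float> out(numSamples);
    size_t index = 0;
    for (size_t n = 0; n < numSamples; ++n)
    {
        // step 2: play the buffer over and over
        out[n] = buffer[index];

        // step 3: replace the sample just played with the average of itself and
        // the next sample, scaled by the feedback value
        size_t next = (index + 1) % bufferSize;
        buffer[index] = 0.5f * (buffer[index] + buffer[next]) * feedback;

        index = next;
    }
    return out;
}
```

For example, PluckString(200, 44100, 0.996f) gives one second of a note near 220hz at a 44100hz sample rate.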

Amazingly, that is all there is to it!

Why Does That Work?!

The reason this works is that it is actually very similar to how a real guitar string pluck works.

When you pluck a guitar string, if you had a perfect pluck at the perfect location with the perfect transfer of energy, you’d get a note that was “perfect”. It wouldn’t be a pure sine wave since strings have harmonics (integer multiple frequencies) beyond their basic tuning, but it would be a pure note.

In reality, that isn’t what happens. Immediately after plucking the string, there are a lot of vibrations in it that “don’t belong” because of the imperfect pluck. Since the string is tuned, it wants to vibrate in a specific way, so over time the vibrations evolve from the imperfect pluck toward the tuning of the string. Averaging samples together is a crude low pass filter: it removes the higher frequency noise and imperfections, which makes the sound converge to the right frequencies over time.

It’s also important to note that with a real stringed instrument, when you play a note, the high frequencies disappear before the low frequencies. This averaging / low pass filter makes that happen as well and is part of what helps it sound so realistic.
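To put numbers on “averaging is a crude low pass filter”, the sketch below measures how much the two-sample average y[n] = 0.5 (x[n] + x[n+1]) shrinks a cosine of a given frequency, as the ratio of output to input RMS level. AverageFilterGain is a hypothetical helper; over a whole number of periods the ratio works out to |cos(π f / f_s)|.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Ratio of output to input RMS level when a cosine of `frequency` is passed
// through the two-sample average y[n] = 0.5 * (x[n] + x[n + 1]).
// Over a whole number of periods this equals |cos(pi * frequency / sampleRate)|.
double AverageFilterGain(double frequency, double sampleRate)
{
    assert(sampleRate > 0.0);
    const double pi = 3.14159265358979323846;
    const size_t count = 44100;

    double inPower = 0.0, outPower = 0.0;
    double current = 1.0; // cos(0)
    for (size_t n = 0; n < count; ++n)
    {
        double next = std::cos(2.0 * pi * frequency * double(n + 1) / sampleRate);
        double averaged = 0.5 * (current + next);
        inPower += current * current;
        outPower += averaged * averaged;
        current = next;
    }
    return std::sqrt(outPower / inPower);
}
```

Because the filter is applied once per trip around the buffer, these gains compound. At 44100hz, not counting the feedback value, a 4410hz component (the 20th harmonic of a 220.5hz note) keeps about 95% of its amplitude per pass, while the 220.5hz fundamental keeps about 99.99%. After 100 passes that is roughly 0.0066 versus 0.988, which is why the high frequencies die away first.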

Meanwhile, the string is losing energy as it turns into heat, sound waves and so on, so the vibration gets quieter over time. Multiplying the values by a feedback value less than 1 simulates this loss of energy by making the values smaller on every pass through the buffer.
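Since each sample meets the feedback multiplier once per pass, and the buffer is played about `frequency` times per second, you can estimate how long a note takes to fade. FeedbackDecaySeconds is a hypothetical helper that counts only the feedback loss (the averaging filter makes things decay faster still), and it uses the common 60 dB drop as the definition of “faded out”.

```cpp
#include <cassert>
#include <cmath>

// Seconds until the feedback alone scales a note's amplitude down by `decibels`.
// The feedback is applied about `frequency` times per second (once per pass through
// the buffer), so after t seconds the amplitude is feedback^(t * frequency).
// This ignores the extra loss from the averaging filter.
double FeedbackDecaySeconds(double feedback, double frequency, double decibels = 60.0)
{
    assert(feedback > 0.0 && feedback < 1.0 && frequency > 0.0);
    double targetGain = std::pow(10.0, -decibels / 20.0); // 60 dB -> 0.001
    double passes = std::log(targetGain) / std::log(feedback);
    return passes / frequency;
}
```

With a feedback of 0.996, a 220.5hz note takes about 7.8 seconds to drop 60 dB and a 441hz note takes half as long, so in this model higher notes fade faster for the same feedback value.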

Tuning The Note

This wasn’t intuitive for me at first, but the frequency that the note plays at is determined ENTIRELY by the size of the circular buffer.

If your audio has a sample rate of 44100hz (44100 samples played a second), and you use this algorithm with a buffer size of 200 samples, that means that the note synthesized will be 220.5hz. This is because 44100/200 = 220.5.

Thinking about the math from another direction, we can figure out what our buffer size needs to be for a specific frequency. If our sample rate is 44100hz and we want to play a note at 440hz, that means we need a buffer size of 100.23 samples. This is because 44100/440 = 100.23. Since we can’t have a fractional number of samples, we can just round to 100.
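Both directions of the tuning math fit in two small functions. These are hypothetical helpers for illustration. FrequencyToBufferSize rounds to the nearest sample as suggested above, whereas the example code at the end of the post truncates with uint32(sampleRate / frequency).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Frequency (in hz) that a buffer of bufferSize samples plays at: sampleRate / bufferSize.
double BufferSizeToFrequency(double sampleRate, size_t bufferSize)
{
    assert(bufferSize > 0);
    return sampleRate / double(bufferSize);
}

// Buffer size needed for a target frequency, rounded to the nearest whole sample.
size_t FrequencyToBufferSize(double sampleRate, double frequency)
{
    assert(frequency > 0.0);
    return size_t(std::lround(sampleRate / frequency));
}
```

Rounding 100.23 down to 100 means the 440hz note actually comes out at 44100/100 = 441hz by this rule, about 4 cents sharp.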

You can actually deal with the fractional buffer size by stepping through the ring buffer in non-integer steps and using the fractional part to interpolate between audio samples, but I’ll leave that as an exercise for you if you want a perfectly tuned note. IMO leaving it slightly off could actually be a good thing. What guitar is ever perfectly in tune, right?! Being slightly out of tune is likely to give more realistic sounds and interactions when paired with other instruments.
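Here is one possible approach to fractional tuning, sketched under our own assumptions rather than the author's. Instead of stepping through the buffer in fractional steps, it keeps a short history of output samples and reads back exactly one period (which may be fractional, like 100.23) using linear interpolation. CFractionalPluck and its members are hypothetical names. Linear interpolation adds a little low pass filtering of its own and only approximates a fractional delay, so treat the tuning as close rather than exact.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical fractionally tuned pluck: the loop delay is sampleRate / frequency,
// which does not need to be a whole number of samples.
class CFractionalPluck
{
public:
    CFractionalPluck(float frequency, float sampleRate, float feedback, unsigned seed = 0)
        : m_period(sampleRate / frequency), m_feedback(feedback)
    {
        assert(m_period >= 2.0f);
        // history must reach back period + 0.5 samples, plus one more for interpolation
        m_history.resize(size_t(m_period) + 2);
        std::mt19937 rng(seed);
        std::uniform_real_distribution<float> noise(-1.0f, 1.0f);
        for (float& s : m_history)
            s = noise(rng); // the initial static stands in for past output
    }

    float GenerateSample()
    {
        // average two points centered one period in the past, so the average itself
        // adds no half-sample shift and the loop delay is (approximately) m_period
        float a = ReadDelayed(m_period - 0.5f);
        float b = ReadDelayed(m_period + 0.5f);
        float out = 0.5f * (a + b) * m_feedback;
        m_history[m_write] = out;
        m_write = (m_write + 1) % m_history.size();
        return out;
    }

private:
    // the output from `delay` samples ago (delay >= 1), linearly interpolated
    float ReadDelayed(float delay) const
    {
        size_t size = m_history.size();
        size_t whole = size_t(delay);
        float frac = delay - float(whole);
        // m_write is where the current sample will go, so "k samples ago" is m_write - k
        float newer = m_history[(m_write + size - whole) % size];
        float older = m_history[(m_write + size - whole - 1) % size];
        return newer + frac * (older - newer);
    }

    float m_period;
    float m_feedback;
    std::vector<float> m_history;
    size_t m_write = 0;
};
```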

You are probably wondering like I was, why the buffer size affects the frequency of the note. The reason for this is actually pretty simple and intuitive after all.

It’s because frequency is defined as how many times a waveform repeats per second. The waveform could be a sine wave, a square wave, a triangle wave, or something more complex, but frequency is always the number of repetitions per second. If you think of our ring buffer as a waveform, you can see that with a buffer size of 200 samples and a sample rate of 44100hz, playing that buffer continually will repeat it 220.5 times every second, which means it plays with a frequency of 220.5hz!

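Sure, we modify the buffer (and waveform) as we play it, but the modifications are small, so the waveform is similar from one play to the next. One subtlety: averaging each sample with the next one also shifts the waveform forward by half a sample on every pass, so with that update the period is really N - 0.5 samples for a buffer of N samples. A 200 sample buffer therefore plays at about 44100/199.5 = 221.05hz rather than exactly 220.5hz.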
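Writing the update from the example code as a recurrence on the output samples makes the pitch easy to work out. Let y[n] be the sample played at time n, N the buffer size, g the feedback value and f_s the sample rate. The value written at time n is played again N samples later, and the “next” sample it is averaged with is the one played at time n+1, so:

$$y[n+N] = \frac{g}{2}\big(y[n] + y[n+1]\big)$$

The averaging step on its own has frequency response

$$H(\omega) = \frac{g}{2}\left(1 + e^{i\omega}\right) = g\cos\left(\frac{\omega}{2}\right)e^{i\omega/2}$$

so each trip around the loop scales a component at angular frequency ω (radians per sample) by g cos(ω/2) and moves it half a sample earlier. One round trip therefore takes N - 1/2 samples rather than N, and the harmonics sit at

$$f_k = \frac{k\,f_s}{N - \tfrac{1}{2}}, \qquad k = 1, 2, 3, \dots$$

For N = 200 and f_s = 44100hz, the fundamental is 44100 / 199.5 ≈ 221.05hz.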

Some More Details

I’ve found that this algorithm doesn’t work as well with low frequency notes as it does with high frequency notes.

You can reportedly prime the buffer with a sawtooth wave (or other waveforms) instead of static (noise). While that still “kind of works”, it didn’t work out that well in my experimentation.
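For reference, priming with a sawtooth just means filling the buffer with one ramp instead of static. This SawtoothPrime helper is a hypothetical sketch. It centers each step so the ramp is symmetric around zero, unlike the commented-out saw line in the example code, which ramps from 0 up to nearly 1 and so adds a DC offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One period of a sawtooth, from just above -1 to just below 1.
// Sampling at the center of each step (i + 0.5) makes the values average to zero.
std::vector<float> SawtoothPrime(size_t bufferSize)
{
    assert(bufferSize > 0);
    std::vector<float> buffer(bufferSize);
    for (size_t i = 0; i < bufferSize; ++i)
        buffer[i] = 2.0f * (float(i) + 0.5f) / float(bufferSize) - 1.0f;
    return buffer;
}
```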

You could try using other low pass filters to see if that affects the quality of the generated note. The simple averaging method works so well that I didn’t explore alternatives very much.

Kmm on Hacker News commented that averaging the current sample with the previous and next samples, instead of just the next one, means the waveform doesn’t move forward half a step on each pass, and that there is an audible difference between the techniques. I gave it a try and sure enough, there is an audible difference: the sound is less harsh on the ears. I believe this is because averaging 3 samples instead of 2 is a stronger low pass filter, so it gets rid of higher frequencies faster.
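One way to implement the three-sample version is sketched below as a hypothetical CThreePointPluck class. Equal 1/3 weights are an assumption, since the comment only describes averaging with the last and next samples. The subtle part is that the buffer slot before the current one has already been overwritten on this pass, so the sketch remembers the previous sample's value from before it was replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical pluck that averages each sample with its neighbors on both sides.
class CThreePointPluck
{
public:
    CThreePointPluck(size_t bufferSize, float feedback, unsigned seed = 0)
        : m_buffer(bufferSize), m_feedback(feedback)
    {
        assert(bufferSize >= 3);
        std::mt19937 rng(seed);
        std::uniform_real_distribution<float> noise(-1.0f, 1.0f);
        for (float& s : m_buffer)
            s = noise(rng);
        // the sample "before" the first one is the last sample in the buffer
        m_previous = m_buffer.back();
    }

    float GenerateSample()
    {
        size_t next = (m_index + 1) % m_buffer.size();
        float current = m_buffer[m_index];

        // m_previous is the value played last, saved before it was overwritten,
        // so previous, current and next all come from the same pass
        m_buffer[m_index] = (m_previous + current + m_buffer[next]) / 3.0f * m_feedback;

        m_previous = current;
        m_index = next;
        return current;
    }

private:
    std::vector<float> m_buffer;
    float m_feedback;
    float m_previous = 0.0f;
    size_t m_index = 0;
};
```

Because the weights are symmetric around the current sample, the filter does not shift the waveform, so one pass through the buffer takes exactly N samples and there is no half-step drift.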

Example Code

Here is the C++ code that generated the sound at the top of the post. Now that you can generate plucked string sounds, you can add some distortion, flange, reverb, etc. and make some sweet (synthesized) metal without having to learn to play guitar and build up finger calluses 😛

#include <stdio.h>
#include <string.h>    // memcpy
#include <stdlib.h>    // rand, RAND_MAX
#include <math.h>      // pow
#include <inttypes.h>
#include <limits>      // std::numeric_limits
#include <vector>

// constants
const float c_pi = 3.14159265359f;
const float c_twoPi = 2.0f * c_pi;

// typedefs
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef int16_t int16;
typedef int32_t int32;

// this struct is the minimal required header data for a wav file
struct SMinimalWaveFileHeader
{
    // the main chunk
    unsigned char m_chunkID[4];
    uint32 m_chunkSize;
    unsigned char m_format[4];

    // sub chunk 1 "fmt "
    unsigned char m_subChunk1ID[4];
    uint32 m_subChunk1Size;
    uint16 m_audioFormat;
    uint16 m_numChannels;
    uint32 m_sampleRate;
    uint32 m_byteRate;
    uint16 m_blockAlign;
    uint16 m_bitsPerSample;

    // sub chunk 2 "data"
    unsigned char m_subChunk2ID[4];
    uint32 m_subChunk2Size;

    // then comes the data!
};

// this writes integer PCM samples to a wave file
template <typename T>
bool WriteWaveFile(const char *fileName, const std::vector<T>& data, int16 numChannels, int32 sampleRate)
{
    int32 dataSize = int32(data.size() * sizeof(T));
    int32 bitsPerSample = sizeof(T) * 8;

    // open the file if we can (fopen is portable; fopen_s is MSVC-only)
    FILE *File = fopen(fileName, "wb");
    if (!File)
        return false;

    SMinimalWaveFileHeader waveHeader;

    // fill out the main chunk
    memcpy(waveHeader.m_chunkID, "RIFF", 4);
    waveHeader.m_chunkSize = dataSize + 36;
    memcpy(waveHeader.m_format, "WAVE", 4);

    // fill out sub chunk 1 "fmt "
    memcpy(waveHeader.m_subChunk1ID, "fmt ", 4);
    waveHeader.m_subChunk1Size = 16;
    waveHeader.m_audioFormat = 1;
    waveHeader.m_numChannels = numChannels;
    waveHeader.m_sampleRate = sampleRate;
    waveHeader.m_byteRate = sampleRate * numChannels * bitsPerSample / 8;
    waveHeader.m_blockAlign = numChannels * bitsPerSample / 8;
    waveHeader.m_bitsPerSample = bitsPerSample;

    // fill out sub chunk 2 "data"
    memcpy(waveHeader.m_subChunk2ID, "data", 4);
    waveHeader.m_subChunk2Size = dataSize;

    // write the header, then the wave data itself
    // (WAV is little-endian, so this assumes a little-endian machine such as x86)
    fwrite(&waveHeader, sizeof(SMinimalWaveFileHeader), 1, File);
    fwrite(data.data(), dataSize, 1, File);

    // close the file and return success
    fclose(File);
    return true;
}

// convert floats in [-1, 1] to integer samples, clamping so loud mixes can't overflow
template <typename T>
void ConvertFloatSamples(const std::vector<float>& in, std::vector<T>& out)
{
    // make our out samples the right size
    out.resize(in.size());

    // convert in format to out format!
    for (size_t i = 0, c = in.size(); i < c; ++i)
    {
        double v = in[i];
        if (v > 1.0) v = 1.0;
        if (v < -1.0) v = -1.0;
        if (v < 0.0)
            v *= -double(std::numeric_limits<T>::lowest());
        else
            v *= double(std::numeric_limits<T>::max());
        out[i] = T(v);
    }
}

// calculate the frequency of the specified note; fractional notes allowed!
// frequency = 440 * 2^(n/12), where n = 0 is A4, n = 1 is A#4, etc.
// notes go like so: 0 = A, 1 = A#, 2 = B, 3 = C, 4 = C#, 5 = D,
// 6 = D#, 7 = E, 8 = F, 9 = F#, 10 = G, 11 = G#
float CalcFrequency(float octave, float note)
{
    return (float)(440 * pow(2.0, ((double)((octave - 4) * 12 + note)) / 12.0));
}

class CKarplusStrongStringPluck
{
public:
    CKarplusStrongStringPluck(float frequency, float sampleRate, float feedback)
    {
        // the buffer size sets the pitch
        m_buffer.resize(uint32(sampleRate / frequency));
        for (size_t i = 0, c = m_buffer.size(); i < c; ++i)
        {
            m_buffer[i] = ((float)rand()) / ((float)RAND_MAX) * 2.0f - 1.0f; // noise
            //m_buffer[i] = float(i) / float(c); // saw wave
        }
        m_index = 0;
        m_feedback = feedback;
    }

    float GenerateSample()
    {
        // get our sample to return
        float ret = m_buffer[m_index];

        // low pass filter (average) this sample with the next one, and apply feedback
        float value = (m_buffer[m_index] + m_buffer[(m_index + 1) % m_buffer.size()]) * 0.5f * m_feedback;
        m_buffer[m_index] = value;

        // move to the next sample
        m_index = (m_index + 1) % m_buffer.size();

        // return the sample from the buffer
        return ret;
    }

private:
    std::vector<float> m_buffer;
    size_t m_index;
    float m_feedback;
};

void GenerateSamples(std::vector<float>& samples, int sampleRate)
{
    std::vector<CKarplusStrongStringPluck> notes;

    enum ESongMode
    {
        e_twinkleTwinkle,
        e_strum
    };

    int timeBegin = 0;
    ESongMode mode = e_twinkleTwinkle;
    for (int index = 0, numSamples = int(samples.size()); index < numSamples; ++index)
    {
        switch (mode)
        {
            case e_twinkleTwinkle:
            {
                const int c_noteTime = sampleRate / 2;
                int time = index - timeBegin;
                // if we should start a new note
                if (time % c_noteTime == 0)
                {
                    int note = time / c_noteTime;
                    switch (note)
                    {
                        case 0:
                        case 1: notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 0), float(sampleRate), 0.996f)); break;
                        case 2:
                        case 3: notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 7), float(sampleRate), 0.996f)); break;
                        case 4:
                        case 5: notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 9), float(sampleRate), 0.996f)); break;
                        case 6: notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 7), float(sampleRate), 0.996f)); break;
                        case 7: mode = e_strum; timeBegin = index + 1; break;
                    }
                }
                break;
            }
            case e_strum:
            {
                const int c_noteTime = sampleRate / 32;
                int time = index - timeBegin - sampleRate;
                // if we should start a new note
                if (time % c_noteTime == 0)
                {
                    int note = time / c_noteTime;
                    switch (note)
                    {
                        case 0: notes.push_back(CKarplusStrongStringPluck(55.0f, float(sampleRate), 0.996f)); break;
                        case 1: notes.push_back(CKarplusStrongStringPluck(55.0f + 110.0f, float(sampleRate), 0.996f)); break;
                        case 2: notes.push_back(CKarplusStrongStringPluck(55.0f + 220.0f, float(sampleRate), 0.996f)); break;
                        case 3: notes.push_back(CKarplusStrongStringPluck(55.0f + 330.0f, float(sampleRate), 0.996f)); break;
                        case 4: mode = e_strum; timeBegin = index + 1; break;
                    }
                }
                break;
            }
        }

        // generate and mix our samples from our notes
        samples[index] = 0;
        for (CKarplusStrongStringPluck& note : notes)
            samples[index] += note.GenerateSample();

        // scale down to reduce clipping (ConvertFloatSamples clamps anything left over)
        samples[index] *= 0.5f;
    }
}

// the entry point of our application
int main(int argc, char **argv)
{
    // sound format parameters
    const int c_sampleRate = 44100;
    const int c_numSeconds = 9;
    const int c_numChannels = 1;
    const int c_numSamples = c_sampleRate * c_numChannels * c_numSeconds;

    // make space for our samples
    std::vector<float> samples;
    samples.resize(c_numSamples);

    // generate samples
    GenerateSamples(samples, c_sampleRate);

    // convert from float to the final format
    std::vector<int32> samplesInt;
    ConvertFloatSamples(samples, samplesInt);

    // write our samples to a wave file
    return WriteWaveFile("out.wav", samplesInt, c_numChannels, c_sampleRate) ? 0 : 1;
}

Links

Hacker News Discussion (This got up to topic #7, woo!)

Wikipedia: Karplus-Strong String Synthesis

Princeton COS 126: Plucking a Guitar String

Shadertoy: Karplus-Strong Variation (Audio) – I tried to make a bufferless Karplus-Strong implementation on shadertoy. It didn’t quite work out but is still a bit interesting.