$\begingroup$

The first thing you have to understand is that notes are not uniquely defined. Everything depends on what tuning you use. I'll assume we're talking about equal temperament here. In equal temperament, a half-step is the same as a frequency ratio of $\sqrt[12]{2}$; that way, twelve half-steps make up an octave. Why twelve?

At the end of the day, what we want out of our musical frequencies are nice ratios of small integers. For example, a perfect fifth is supposed to correspond to a frequency ratio of $3 : 2$, or $1.5 : 1$, but in equal temperament it doesn't; instead, it corresponds to a ratio of $2^{ \frac{7}{12} } : 1 \approx 1.498 : 1$. As you can see, this is not a perfect fifth; however, it is quite close.

Similarly, a perfect fourth is supposed to correspond to a frequency ratio of $4 : 3$, or $1.333... : 1$, but in equal temperament it corresponds to a ratio of $2^{ \frac{5}{12} } : 1 \approx 1.335 : 1$. Again, this is not a perfect fourth, but is quite close.

And so on. What's going on here is a massively convenient mathematical coincidence: several of the powers of $\sqrt[12]{2}$ happen to be good approximations to ratios of small integers, and there are enough of these to play Western music.
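To make the comparison concrete, here is a small Python sketch that tabulates the intervals mentioned in this answer. The names `INTERVALS`, `tempered_ratio`, and `error_in_cents` are my own, and measuring the error in cents (1200 per octave, so 100 per equal-tempered half-step) is an assumption I'm adding for convenience.

```python
import math
from fractions import Fraction

# Small-integer target ratios and the number of equal-tempered
# half-steps used to approximate each one.
INTERVALS = {
    "major third": (Fraction(5, 4), 4),
    "perfect fourth": (Fraction(4, 3), 5),
    "perfect fifth": (Fraction(3, 2), 7),
}


def tempered_ratio(half_steps):
    """Frequency ratio spanned by `half_steps` equal-tempered half-steps."""
    return 2 ** (half_steps / 12)


def error_in_cents(just, half_steps):
    """Signed error of the tempered interval relative to the target ratio.

    Positive means the tempered interval is wider than the target.
    """
    return 100 * half_steps - 1200 * math.log2(float(just))


for name, (just, steps) in INTERVALS.items():
    print(
        f"{name:15s} target={float(just):.4f} "
        f"tempered={tempered_ratio(steps):.4f} "
        f"error={error_in_cents(just, steps):+.2f} cents"
    )
```

The fifth and fourth come out within about two cents of their targets, while the major third is noticeably further off.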

Here's how this coincidence works. You get the white keys from $C$ using (part of) the circle of fifths. Start with $C$ and go up a fifth to get $G$, then $D$, then $A$, then $E$, then $B$. Then go down a fifth from $C$ to get $F$. These are the "neighbors" of $C$ in the circle of fifths. You get the black keys from here using the rest of the circle of fifths. After you've gone up a "perfect" perfect fifth twelve times, you get a frequency ratio of $3^{12} : 2^{12} \approx 129.7 : 1$. This happens to be rather close to $2^7 : 1$, or seven octaves! And if we replace $3 : 2$ by $2^{ \frac{7}{12} } : 1$, then we get exactly seven octaves. In other words, the reason you can afford to identify these intervals is that $3^{12}$ happens to be rather close to $2^{19}$. Said another way,

$$\log_2 3 \approx \frac{19}{12}$$

happens to be a good rational approximation, and this is the main basis of equal temperament. (The other main coincidence here is that $\log_2 \frac{5}{4} \approx \frac{4}{12}$; this is what allows us to squeeze major thirds into equal temperament as well.)
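The arithmetic behind the circle of fifths can be checked exactly with rational numbers. This is just a sketch; the variable names are mine, and the leftover ratio $3^{12} : 2^{19}$ is what is usually called the Pythagorean comma (a name I'm adding here).

```python
import math
from fractions import Fraction

# Twelve "perfect" perfect fifths stacked on top of each other.
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7

# How far the stack of fifths overshoots seven octaves: 3**12 / 2**19.
comma = twelve_fifths / seven_octaves

# Twelve equal-tempered fifths, by contrast, give seven octaves
# (up to floating-point rounding).
twelve_tempered_fifths = (2 ** (7 / 12)) ** 12

print("twelve pure fifths:    ", float(twelve_fifths))
print("seven octaves:         ", float(seven_octaves))
print("overshoot (comma):     ", comma, "=", float(comma))
print("twelve tempered fifths:", twelve_tempered_fifths)
print("log2(3) vs 19/12:      ", math.log2(3), 19 / 12)
```

The overshoot is a bit more than one percent in frequency, which equal temperament spreads evenly across all twelve fifths.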

It is a basic fact that $\log_2 3$ is irrational: if it were equal to $\frac{p}{q}$ for positive integers $p, q$, we would have $2^p = 3^q$, an even number equal to an odd one. So it is impossible for any kind of equal temperament to have "perfect" perfect fifths, regardless of how many notes you use. However, you can write down good rational approximations by looking at the continued fraction of $\log_2 3$ and writing down convergents, and these will correspond to equal-tempered scales with more notes.
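Here is a rough sketch of how one might compute those convergents. The function name `convergents` is my own, and the calculation is done in floating point, so only the first several terms should be trusted; an exact computation would need arbitrary-precision arithmetic.

```python
import math
from fractions import Fraction


def convergents(x, count):
    """Return up to `count` continued-fraction convergents of x.

    Uses the standard recurrence h_n = a_n * h_(n-1) + h_(n-2), and the
    same for the denominators k_n. Floating-point error grows with each
    step, so keep `count` small.
    """
    result = []
    h_prev2, h_prev1 = 0, 1
    k_prev2, k_prev1 = 1, 0
    for _ in range(count):
        a = math.floor(x)
        h = a * h_prev1 + h_prev2
        k = a * k_prev1 + k_prev2
        result.append(Fraction(h, k))
        h_prev2, h_prev1 = h_prev1, h
        k_prev2, k_prev1 = k_prev1, k
        remainder = x - a
        if remainder == 0:
            break
        x = 1 / remainder
    return result


target = math.log2(3)
for c in convergents(target, 7):
    # A convergent p/q suggests q equal steps per octave, with the fifth
    # taking up p - q of those steps.
    p, q = c.numerator, c.denominator
    print(f"{p}/{q}: {q} notes per octave, fifth = {p - q} steps, "
          f"error in log2(3) = {float(c) - target:+.6f}")
```

The convergent $\frac{19}{12}$ shows up in this list, and the next two have denominators $41$ and $53$, which would correspond to equal-tempered scales with $41$ and $53$ notes per octave.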

Of course, you can use other types of temperament, such as well temperament; if you stick to $12$ notes (which not everybody does!), you will be forced to make some intervals sound better and some intervals sound worse. In particular, if you don't use equal temperament then different keys sound different. This is a major reason many Western composers composed in different keys; during their time, this actually made a difference. As a result, when you're playing certain sufficiently old pieces you aren't actually playing them as they were intended to be heard: you're using the wrong tuning.

Edit: I suppose it is also good to say something about why we care about frequency ratios which are ratios of small integers. This has to do with the physics of sound, and I'm not particularly knowledgeable here, but this is my understanding of the situation.

You probably know that sound is a wave. More precisely, sound is a longitudinal wave carried by air molecules. You might think that there is a simple equation for the sound created by a single note, perhaps $\sin 2\pi f t$ if the corresponding tone has frequency $f$. Actually this only occurs for tones which are produced electronically; any tone you produce in nature carries with it overtones and has a Fourier series

$$\sum \left( a_n \sin 2 \pi n f t + b_n \cos 2 \pi n f t \right)$$

where the coefficients $a_n, b_n$ determine the timbre of the sound; this is why different instruments sound different even when they play the same notes, and has to do with the physics of vibration, which I don't understand too well. So any tone which you hear at frequency $f$ almost certainly also has components at frequency $2f, 3f, 4f, ...$.

If you play two notes of frequencies $f, f'$ together, then the resulting sound corresponds to what you get when you add their Fourier series. Now it's not hard to see that if $\frac{f}{f'}$ is a ratio of small integers, then many (but not all) of the overtones will match in frequency with each other; the result sounds like a single, more complex note with certain overtones. Otherwise, you get dissonance: you hear both sets of overtones simultaneously, and some of their frequencies will be close to each other, but not close enough to blend.
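As a toy illustration of the matching overtones (not a model of how hearing actually works), the sketch below lists which of the first few overtones two tones share. The function name `shared_overtones`, the cutoff of twelve overtones, and the example ratio $45 : 32$ are all my own choices.

```python
from fractions import Fraction


def shared_overtones(ratio, n_harmonics=12):
    """Overtones common to two tones whose frequencies are in `ratio`.

    Frequencies are measured in units of the lower tone's fundamental,
    and only the first `n_harmonics` overtones of each tone are checked.
    """
    lower = {Fraction(m) for m in range(1, n_harmonics + 1)}
    upper = {ratio * n for n in range(1, n_harmonics + 1)}
    return sorted(lower & upper)


for ratio in (Fraction(3, 2), Fraction(4, 3), Fraction(45, 32)):
    shared = [str(x) for x in shared_overtones(ratio)]
    print(f"{ratio}: shared overtones at {shared}")
```

With exact ratios, $3 : 2$ and $4 : 3$ share several overtones within this range, while a ratio with larger integers such as $45 : 32$ shares none.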

Edit: You should probably check out David Benson's "Music: A Mathematical Offering", the book Rahul Narain recommended in the comments for the full story. There was a lot I didn't know, and I'm only in the introduction!