$\begingroup$

After much investigation, simulation and a deep literature search, I've figured out the true answer.

You perceive a chirp because you are being hit with the echos of the sharp noise that generated the sound. The times between the arrival of those echos is decreasing inversely with time, so it sounds as if it were a tone with a fundamental frequency increasing linearly in time, hence the chirp.

To get a feel for the phenomenon, consider a simulation:

Above you see a slowed down version of the simulated pressure wave inside a 2D racquetball court. I threw up the generated sound on soundcloud.

If you watch the simulation, pick a particular point and watch the reflected sounds go by, you'll notice the different instances of the multiple echos arrive faster and faster as time goes on.

You can clearly hear the chirps in the generated sound, and if you listen closely you can hear secondary chirps as well. These are also visible in the spectrogram:

This phenomenon was studied and published recently by Kenji Kiyohara, Ken'ichi Furuya, and Yutaka Kaneda: "Sweeping echoes perceived in a regularly shaped reverberation room ," J. Acoust. Soc. Am. Vol.111, No.2, 925-930 (2002). more info

In particular, they explain not only the main sweep, but the appearance of the secondary sweeps using some number theory. Worth reading in full. This suggests that for the best sweep one should both stand and listen in the center of the room, though they should be generic at any location.

Simple geometric argument

Following the paper, we can give a simple geometric argument. If you imagine standing in the middle of a standard racquetball court, which is twice as long as it is tall or wide, and clap, your clap will start propagating and reflecting off the walls. A simple way to study the arrival times is with the method of images, so you imagine other claps generated by reflecting your clap across the walls, and then reflections of those claps and so on. This will generate a whole set of "image" claps, located at positions $$ ( m, l, 2k) L $$ where $m,l,k$ are integers and $L$ is 20 feet for a racquetball court, the time for any particular clap to reach you is $t = d/c$ and so we have $$ t = \sqrt{m^2 + l^2 + 4k^2} \frac{L}{c} $$ for our arrival times. If we look at how these distribute in time:

It becomes clear why we perceive a chirp. The various sets of missing bars, which themselves are spaced like a chirp, give rise to our perceived subchirps.

Details of the 2D Simulation

For the simulation, I numerically solved the wave equation: $$ \frac{\partial^2 p}{dt^2} = c^2

abla^2 p $$ and used impedance boundary conditions on the walls $$

abla p \cdot \hat n = -c \eta \frac{\partial p}{\partial t} $$ I used a collocation method spatially, with a Chebyshev basis of order 64 in the short axis and 128 on the long axis. and used RK4 for the time integration.

I modeled the room as 20 feet by 40 feet and started it of with a gaussian pressure pulse in one corner of the room. I listened near the back wall towards the top corner.

I put up an ipython notebook of my code, with the embedded audio and video. I recommend playing with it yourself. On my desktop it takes about minute to do a full simulation of the sound.

Effect of listening location

I've updated the code to generate sound at multiple locations, and generate their sounds. I can't seem to embed audio on stackexchange, but if you click through to the IPython notebook view, you can listen to all of the generated sounds. But what I can do here is show the spectrograms:

These are laid out in roughly their locations inside of the room. Here the noise was generated in the lower left, but the chirps should be generic for any listening and generation location.