When initializing the weight matrix (let's assume there is only one) in an RNN (recurrent neural network), it is said (e.g. by Ilya Sutskever in his PhD thesis) that you want the spectral radius (the largest absolute value among the eigenvalues) to be slightly less than $1$.

In practice, you pick some variance for the entries and then play with it until it works.

In this post we'll do a bit of exploratory mathematics to probe how the variance and size of a matrix $W$ affect its spectral radius $\rho(W)$.
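For concreteness, here is a minimal sketch of how the spectral radius can be computed with NumPy, and of why a value below $1$ matters. The helper name `spectral_radius`, the example matrix, and the purely linear recurrence $h_t = W h_{t-1}$ (no inputs, no nonlinearity) are illustrative assumptions, not something taken from an actual RNN implementation.

```python
import numpy as np

def spectral_radius(W):
    """Largest absolute value among the eigenvalues of W."""
    return np.max(np.abs(np.linalg.eigvals(W)))

# Hypothetical example matrix. It is upper triangular, so its eigenvalues
# are the diagonal entries 0.5 and 0.9.
W = np.array([[0.5, 1.0],
              [0.0, 0.9]])
rho = spectral_radius(W)

# Simplified linear recurrence h_t = W h_{t-1} (an assumption for
# illustration): with rho < 1 the state shrinks towards zero.
h = np.ones(2)
for _ in range(200):
    h = W @ h

print(f"spectral radius = {rho:.3f}, |h_200| = {np.linalg.norm(h):.2e}")
```

With a spectral radius above $1$ the same loop would blow up instead, which is the usual motivation for aiming slightly below $1$.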

The distribution of the spectral radius

Since the spectral radius is somewhat difficult to work with theoretically, we'll take an experimental approach instead.

Let $W_{n,v}$ be the random $n \times n$ matrix with independent zero-mean Gaussian entries of variance $v$. The spectral radius is a function of these entries, and hence itself a random variable. How does the distribution of the spectral radius look for, say, $W_{10,2}$? Sampling $10{,}000$ matrices from this distribution gives the following result.

It looks somewhat Poisson distributed! We could stop here, conjecture that it indeed is, and then try to prove it, but let's move on.
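The sampling experiment for $W_{10,2}$ can be sketched roughly as follows. This is a minimal version under stated assumptions: the function name `sample_spectral_radii`, the seed, and the number of histogram bins are made up for illustration, and $v$ is taken to be the variance in the strict sense, so the standard deviation handed to NumPy is $\sqrt{v}$.

```python
import numpy as np

def sample_spectral_radii(n, v, num_samples, seed=0):
    """Spectral radii of num_samples random n x n matrices with N(0, v) entries."""
    rng = np.random.default_rng(seed)
    # Assumption: v is the variance, so the `scale` argument is sqrt(v).
    W = rng.normal(loc=0.0, scale=np.sqrt(v), size=(num_samples, n, n))
    # eigvals works on stacks of matrices: the result has shape (num_samples, n).
    eigenvalues = np.linalg.eigvals(W)
    return np.abs(eigenvalues).max(axis=1)

radii = sample_spectral_radii(n=10, v=2.0, num_samples=10_000)

# Histogram counts approximate the distribution of the spectral radius.
counts, bin_edges = np.histogram(radii, bins=50)
print(f"mean = {radii.mean():.3f}, std = {radii.std():.3f}")
```

Passing `radii` to a plotting library's histogram function would give a picture of the distribution.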

Fixing the matrix size

Typically when training RNNs, the number of hidden units is decided upon first, and then you go about mucking with the variance. Below I've fixed the matrix size to $10$. I then varied the variance between $0.1$ and $10$ and looked at the expected spectral radius (since all we really care about is that $E\{ \rho(W)\} \approx 1$).

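It looks linear! That's nice. The coefficient here is about $3$. Thus, for the case of $n=10$, we know that if our variance is, say, $v=0.1$, then the expected spectral radius will be about $E\{\rho(W_{10,0.1})\} \approx 0.3$. One caveat: since $\rho(cW) = |c|\,\rho(W)$, the expected spectral radius is exactly linear in the standard deviation of the entries, so a straight line is what you get when $v$ is passed as the scale (standard deviation) of the Gaussian. If $v$ is the variance in the strict sense, the curve goes as $\sqrt{v}$ instead.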
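A sketch of this fixed-size experiment is given below. It is not the original code: the helper `mean_spectral_radius`, the grid of $25$ values, and the sample count are assumptions, and the sketch deliberately parameterizes the entries by their standard deviation (NumPy's `scale` argument), since that is the quantity in which a straight line is expected.

```python
import numpy as np

def mean_spectral_radius(n, std, num_samples, rng):
    """Monte Carlo estimate of E{rho(W)} for n x n matrices with N(0, std**2) entries."""
    W = rng.normal(loc=0.0, scale=std, size=(num_samples, n, n))
    return np.abs(np.linalg.eigvals(W)).max(axis=1).mean()

rng = np.random.default_rng(0)
n = 10
stds = np.linspace(0.1, 10.0, 25)
means = np.array([mean_spectral_radius(n, s, 1000, rng) for s in stds])

# rho(c * W) = |c| * rho(W), so the expected radius is linear in std;
# a least-squares line through the estimates recovers the coefficient.
slope, intercept = np.polyfit(stds, means, deg=1)
print(f"n = {n}: slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Replacing `scale=std` with `scale=np.sqrt(v)` and sweeping `v` instead would show the square-root shape.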

Varying the matrix size

What if we train the network, and then decide that we'd really like more hidden units? Can we be sure that the spectral radius will stay the same (assuming we don't change the variance)?

Above I'm varying the size of the matrix while looking at the ratio between the expected spectral radius and the variance of the entries. It's not constant!

In other words, be aware that if you increase the size of the matrix while keeping the variance of its entries fixed, its spectral radius will also increase.
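The growth with matrix size can be checked with a sketch like the one below. The list of sizes and the sample count are arbitrary choices, and the entries are given unit standard deviation so that the printed number is the ratio itself. As a yardstick, the circular law from random matrix theory says that for large $n$ the eigenvalues of such a matrix fill a disk of radius roughly $\sqrt{n}$ times the standard deviation of the entries, so $\sqrt{n}$ is printed alongside for comparison. For small $n$ this is only a rough guide.

```python
import numpy as np

def mean_spectral_radius(n, std, num_samples, rng):
    """Monte Carlo estimate of E{rho(W)} for n x n matrices with N(0, std**2) entries."""
    W = rng.normal(loc=0.0, scale=std, size=(num_samples, n, n))
    return np.abs(np.linalg.eigvals(W)).max(axis=1).mean()

rng = np.random.default_rng(0)
sizes = [5, 10, 20, 40, 80]  # hypothetical numbers of hidden units

# Unit standard deviation, so each estimate is the ratio E{rho(W)} / std.
ratios = [mean_spectral_radius(n, 1.0, 200, rng) for n in sizes]

for n, ratio in zip(sizes, ratios):
    print(f"n = {n:3d}: E[rho] / std = {ratio:.2f}, sqrt(n) = {np.sqrt(n):.2f}")
```

Under these assumptions, keeping the spectral radius roughly fixed when adding hidden units means shrinking the standard deviation of the entries by about the same factor that $\sqrt{n}$ grows.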

Conclusion

The conclusion of this post is basically just that if you have something like this in your code,