$\begingroup$

Are there publications which mention numerical problems in neural network optimization?

(Blog posts, articles, workshop notes, lecture notes, books - anything?)

Background of the question

I recently observed a strange phenomenon: when I trained a convolutional network on the GTSRB dataset with a given script on my machine, it achieved state-of-the-art results (99.9% test accuracy) in 10 out of 10 runs, with no outliers. When I used the same script on another machine, I got much worse results (roughly 80% test accuracy, again over 10 runs with no outliers). I thought that I probably hadn't used the same scripts, or that I had made a mistake on one of the machines (e.g. using differently pre-processed data), but I couldn't find out where the mistake happened. As the dataset was not important for my publication, I just removed all results for it.

Now a friend wrote to me that he has a network, a training script, and a dataset for which training converges on machine A but does not converge on machine B, with exactly the same setup (a fully connected network trained as an autoencoder).

I have only one guess as to what might be happening: the machines have different hardware. It might be that TensorFlow uses different algorithms for matrix multiplication / gradient calculation depending on the hardware, and that their numerical properties differ. Those differences might allow one machine to optimize the network while the other can't.
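To illustrate what "different numerical properties" can mean, here is a minimal sketch in plain Python (not TensorFlow). It only shows that floating-point addition is not associative, so two implementations that accumulate the same products in a different order can return different results. The helper `sum_in_order` and the example values are my own illustrative choices; I am not claiming this is what TensorFlow does on either machine.

```python
import math


def sum_in_order(values):
    """Accumulate the values left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative values: one small term between two large terms that cancel.
values = [1e16, 1.0, -1e16]

# Order 1: the 1.0 is absorbed by 1e16 (it is below the rounding
# granularity at that magnitude) and is lost before the cancellation.
left_to_right = sum_in_order(values)

# Order 2: the large terms cancel first, so the 1.0 survives.
reordered = sum_in_order([values[0], values[2], values[1]])

# math.fsum tracks the rounding error and returns the correctly rounded sum.
exact = math.fsum(values)

print(left_to_right, reordered, exact)
```

The same three numbers sum to either 0.0 or 1.0 depending only on the order of accumulation. In a real matrix multiplication the discrepancies are usually far smaller than in this contrived case, so whether they can explain a failure to converge is exactly the open question.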

Of course, this needs further investigation. But no matter what is happening in those two cases, I think this question is interesting. Intuitively, I would say that numerical issues should not matter much, as sharp minima are not desired anyway, and a difference in a single multiplication is less important than the update in the next step.