In machine learning, an “epoch” is one complete pass over all of your input data, with the machine examining every item and learning from it. More epochs = more learning. But also, unfortunately, more data = slower epochs. Now, at the time of the last post one epoch was some 25,000 Mel images. With this much data, my poor little laptop was struggling to do 50 epochs. And yet clearly, after even 100 epochs (as evidenced in the last post) the base images were not acceptable in any way. Even if they were, the resolution would be too small. So the time came to invest: I went and bought a chunky new desktop, complete with fancy graphics cards (a must in serious machine learning) to give me a speed boost. My goal? 2,000 epochs or bust.
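To put a rough number on what 2,000 epochs means, here is a minimal sketch of the arithmetic. The image count and epoch target come from this post; the batch size of 64 is purely an assumption for illustration, since the real value isn't stated anywhere above.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to show every sample to the model once."""
    return math.ceil(num_samples / batch_size)


def total_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total number of batch updates over the whole training run."""
    return steps_per_epoch(num_samples, batch_size) * epochs


NUM_IMAGES = 25_000     # from the post: roughly 25,000 Mel images
BATCH_SIZE = 64         # assumption: the post does not state a batch size
TARGET_EPOCHS = 2_000   # from the post

if __name__ == "__main__":
    print("batches per epoch:", steps_per_epoch(NUM_IMAGES, BATCH_SIZE))
    print("batches in total: ", total_steps(NUM_IMAGES, BATCH_SIZE, TARGET_EPOCHS))
```

Under that assumed batch size, every epoch is a few hundred weight updates, and the full run is that many times 2,000, which is why the speed of a single epoch matters so much.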

Now that sounds great, but I then discovered that getting the correct graphics drivers set up was like completing the trials of Hercules, and I’m a paid IT professional. It took 3 weekends of arduous trial and error until finally it was all done and set up. But it was worth it, because when I ran my first test, the one with the images of Jerry Garcia, instead of taking 9 hours it took 5 minutes! A staggering 100x faster. Now I can really forge ahead, I thought! So, how do 2,000 epochs look? Like this:

Being Better Just Brings Bigger Problems

It was here that the real problems began. The first thing I noticed was trivial but important: due to the way that my data was structured and loaded, half the machine’s memory was being wasted. This caused major slowdowns, as data had to be read from disk. The other problem was more important, though: quite often, my GANs would stop learning after a small number of generations.

It seems that this is because the discriminator was getting too good: it was learning so fast that the generator (the “creator” half of the GAN) could not keep up. Whether this happened was random as well, so it took a load of runs to get to 2,000 epochs. In a way, this is a good result, because it is a common problem of the technique I’m using; this likely indicates I’m at least partially on the right track. All said and done though, I thought the final result wasn’t bad this early in the experiments.
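For the curious, here is a minimal sketch of one heuristic people sometimes try when the discriminator runs away: skip its update whenever its recent accuracy is too high, so the generator gets a chance to catch up. This is not what I ran for the results above. The class name, the 0.8 threshold and the window of 20 steps are all made up for illustration, and results with this kind of balancing vary a lot.

```python
from collections import deque


class DiscriminatorThrottle:
    """Skip discriminator updates while it is winning too easily.

    Hypothetical helper: the threshold and window size are illustrative
    assumptions, not tuned values.
    """

    def __init__(self, max_accuracy: float = 0.8, window: int = 20):
        self.max_accuracy = max_accuracy
        # Rolling history of the discriminator's recent batch accuracies.
        self.recent = deque(maxlen=window)

    def record(self, accuracy: float) -> None:
        """Store the discriminator's accuracy on the latest batch (0.0 to 1.0)."""
        self.recent.append(accuracy)

    def should_train_discriminator(self) -> bool:
        """Return False while the rolling mean accuracy is above the threshold."""
        if not self.recent:
            return True  # no history yet, so train as normal
        mean_accuracy = sum(self.recent) / len(self.recent)
        return mean_accuracy <= self.max_accuracy
```

In a training loop, the generator would be updated on every batch, while the discriminator would only be updated when should_train_discriminator() returns True, with record() called after each batch using the discriminator's accuracy on real and fake images.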

Beyond the problems of low resolution, discriminators learning too quickly and managing all the data on the local machine, there is a much larger problem: I have no method for turning the spectrograms back into audio. Since that is the ultimate showstopper when the aim is to produce audio, this is the next issue we will solve. Stay tuned for updates!