This is the implementation that we have from the last article.

```clojure
;; Requires needed for this snippet to work on its own;
;; they were introduced in the earlier articles of this series.
(require '[uncomplicate.commons.core :refer [Releaseable release]]
         '[uncomplicate.neanderthal.core :refer [axpy! mv! rk! mm!]])
(import 'clojure.lang.IFn)

(defprotocol Parameters
  (weights [this])
  (bias [this]))

(deftype FullyConnectedInference [w b h activ-fn]
  Releaseable
  (release [_]
    (release w)
    (release b)
    (release h))
  Parameters
  (weights [this] w)
  (bias [this] b)
  IFn
  (invoke [_ x]
    (activ-fn (axpy! -1.0 b (mv! w x h))))
  (invoke [_ x ones a]
    (activ-fn (rk! -1.0 b ones (mm! 1.0 w x 0.0 a)))))
```
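
For completeness, here is a minimal sketch of a constructor for this type. The exact signature is my assumption, not necessarily what the earlier article used; it relies on Neanderthal's dge and dv for allocating the weight matrix, bias, and output vector, and on let-release from uncomplicate.commons.core so that already-allocated buffers get released if a later allocation throws.

```clojure
(require '[uncomplicate.commons.core :refer [let-release]]
         '[uncomplicate.neanderthal.native :refer [dge dv]])

;; Hypothetical constructor: allocates weights w, bias b, and the
;; preallocated output vector h, then wires them into the type above.
(defn fully-connected [activ-fn in-dim out-dim]
  (let-release [w (dge out-dim in-dim)
                b (dv out-dim)
                h (dv out-dim)]
    (->FullyConnectedInference w b h activ-fn)))
```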


The current structure supports both single vectors and batches of vectors, stored as columns of a matrix, as its input and output.
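
To make both call shapes concrete, here is a hedged usage sketch. It assumes the hypothetical fully-connected constructor above, Neanderthal's tanh! as the activation function, and small, made-up dimensions and weights; the batched path additionally needs a vector of ones and a preallocated output matrix a.

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal.core :refer [entry! transfer!]]
         '[uncomplicate.neanderthal.native :refer [dge dv]]
         '[uncomplicate.neanderthal.vect-math :refer [tanh!]])

(with-release [layer (fully-connected tanh! 2 3) ;; 2 inputs, 3 neurons
               x (dv 0.3 0.9)                    ;; a single input vector
               x-batch (dge 2 4)                 ;; 4 inputs as columns
               ones (entry! (dv 4) 1.0)          ;; needed by the batched path
               a (dge 3 4)]                      ;; preallocated batched output
  (transfer! [0.3 0.1 0.9 0.0 0.6 2.0] (weights layer)) ;; made-up weights
  (layer x)              ;; single input: the result lands in the internal h
  (layer x-batch ones a)) ;; batched input: the result lands in a
```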

One thing bothers me, though: the memory that the network requires to operate. The weights and biases of each layer consume memory that is a fixed cost we cannot avoid. On the other hand, the output of each layer is relevant only during the propagation. Once the signal passes to the next layer, that memory becomes irrelevant; it is needed again only when the inference is invoked with another input.

It can be argued that, when the network processes a single input, the wasted memory is small compared to the memory that we use for weights and biases. For example, if the input contains 200 entries and the output 1000, the weight matrix (1000×200 entries) needs 200 times more space than the output vector.

However, with batched input processing, the space used by the output matrix a becomes much more significant. In the same example, with an output size of 1000 and a batch size of 1000, the output matrix a (1000×1000 entries) now uses 5 times more space than the weight matrix (1000×200 entries)!

Let's say that the layer consists of 100,000 neurons, and we want to process 1000 inputs in a batch. At 4 bytes per single-precision entry, the layer now uses 400 megabytes of memory for the output alone. Having 10 such layers wastes 4 GB of memory, the total available on mid-range GPUs. One solution is to buy a more expensive GPU, but that does not take us far. With such a relaxed approach, we would soon exhaust the limits of top-of-the-line consumer offerings, and would have to look at distributed solutions, which are much more expensive, and much slower.
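
As a quick sanity check on these numbers, here is a tiny helper (the name is made up) that computes the footprint, assuming 4-byte single-precision entries and decimal megabytes:

```clojure
;; Hypothetical helper: memory occupied by one layer's batched output,
;; assuming 4-byte single-precision entries and decimal units.
(defn output-mb [neurons batch-size]
  (/ (* neurons batch-size 4) 1e6))

(output-mb 100000 1000)        ;; => 400.0, megabytes for one layer's output
(* 10 (output-mb 100000 1000)) ;; => 4000.0, i.e. 4 GB for 10 such layers
```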