Implementing inference is simpler than implementing training. Following the separation between the inference and training types, we'll start with a stand-alone NeuralNetworkInference. Looking at the usage example, I can imagine it holding a sequence of layers and implementing the invoke method of the IFn interface.

Since a network will typically contain several layers, it would be good if it reused the instances of all throw-away objects, such as the vector of ones or an additional shared matrix for inputs and outputs. However, because we want to support arbitrary batch sizes, these temporaries cannot be allocated once and cached in the type; they have to be created and released on each invocation, unless the caller supplies them.
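To make this reuse concrete, here is a minimal sketch, assuming Neanderthal's core and native namespaces and the commons release machinery, of how one large vector can back matrices of different shapes through view-ge; this is exactly the trick the implementation below relies on.

(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal.core :refer [view-ge]]
         '[uncomplicate.neanderthal.native :refer [dv]])

;; One shared work vector of 12 doubles can be viewed as matrices
;; of any shape that fits, without allocating new memory.
(with-release [work (dv 12)]
  (let [a (view-ge work 3 4)   ; seen as a 3x4 matrix
        b (view-ge work 2 5)]  ; seen as a 2x5 matrix over the first 10 entries
    ;; a and b alias work's memory; no extra allocation happened
    [(pr-str a) (pr-str b)]))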

The following invoke implementations might seem too dense at first, but they are nothing more than an automation of the code we have been writing by hand in the test examples, where we assembled the network and invoked the inference manually.

(deftype NeuralNetworkInference [layers ^long max-width-1 ^long max-width-2]
  Releaseable
  (release [_]
    (doseq [l layers]
      (release l)))
  IFn
  (invoke [_ x ones-vctr temp-1! temp-2!]
    (let [batch (dim ones-vctr)]
      (loop [x x v1 temp-1! v2 temp-2! layers layers]
        (if layers
          (recur (let [layer (first layers)]
                   (layer x ones-vctr (view-ge v1 (mrows (weights layer)) batch)))
                 v2 v1 (next layers))
          x))))
  (invoke [this x a!]
    (let [cnt (count layers)]
      (if (= 0 cnt)
        (copy! x a!)
        (with-release [ones-vctr (entry! (vctr x (ncols x)) 1.0)]
          (if (= 1 cnt)
            ((layers 0) x ones-vctr a!)
            (with-release [temp-1 (vctr x (* max-width-1 (dim ones-vctr)))]
              (if (= 2 cnt)
                (this x ones-vctr temp-1 a!)
                (with-release [temp-2 (vctr x (* max-width-2 (dim ones-vctr)))]
                  (copy! (this x ones-vctr temp-1 temp-2) a!)))))))))
  (invoke [this x]
    (let-release [a (ge x (mrows (weights (peek layers))) (ncols x))]
      (this x a))))

The first invoke implementation is at the lowest level. It does not create any temporary work objects, and expects the user to provide the vector of ones and the temp-1! and temp-2! instances, each of sufficient capacity. This gives the user the opportunity to manage the life-cycle of these structures optimally. The function simply iterates through all layers and evaluates them (recall that they are functions themselves), alternating temp-1! and temp-2! appropriately.
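For illustration, here is a hedged sketch of driving this lowest-level arity directly, with the caller managing all work memory; inference, x-batch, and the widths 128 and 64 are assumptions made up for the example.

(require '[uncomplicate.commons.core :refer [with-release let-release]]
         '[uncomplicate.neanderthal.core :refer [ncols entry! copy]]
         '[uncomplicate.neanderthal.native :refer [dv]])

(let [batch (ncols x-batch)]
  (with-release [ones   (entry! (dv batch) 1.0)
                 temp-1 (dv (* 128 batch))  ; 128: assumed max odd-layer width
                 temp-2 (dv (* 64 batch))]  ; 64: assumed max even-layer width
    ;; the result is only a view into temp-1 or temp-2, so copy it
    ;; out before the with-release scope releases the work memory
    (let-release [out (copy (inference x-batch ones temp-1 temp-2))]
      out)))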

The second invoke implementation goes one level above. It requires only the input x and a matrix a!, which gets overwritten with the result of the evaluation. Depending on the number of layers, it calls the first variant of invoke in the most efficient way:

1) if the network does not have any layers, it simply copies the input.

2) if there is a single layer, it is called without initializing any temporary work memory.

3) if there are two layers, only one temporary object is needed.

4) for more than two layers, the two alternating work objects are used, and the result of the evaluation is copied to a! at the end.

Of course, all temporary work objects are released at the end of evaluation.
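As a quick usage sketch of this destructive arity (x-batch, batch-size, and the output width 8 are assumptions; dge comes from uncomplicate.neanderthal.native, let-release from the requires above):

;; a! must have one row per output neuron of the last layer
;; and one column per sample in the batch; it is overwritten
(let-release [out! (dge 8 batch-size)]
  (inference x-batch out!)
  out!)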

The third invoke is a pure function. It asks only for the input, x, and returns the output in a new instance, a. All temporary objects and mutations are encapsulated and invisible to the caller.
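A sketch of the pure arity follows; the caller still owns the returned matrix and should release it when done (inference and x-batch are assumed, mrows and ncols come from uncomplicate.neanderthal.core):

(with-release [out (inference x-batch)]
  ;; out has one row per output of the last layer, one column per sample
  [(mrows out) (ncols out)])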

With these three variants, we have covered different trade-offs. We might pick the pure variant if we have enough resources and are concerned with code simplicity, but we can also opt for one of the destructive variants if we want, or have, to be frugal with resources.

Since our network can automatically create temporary work objects, it needs to know their sizes, which are calculated during construction. The first temporary vector needs to be big enough to hold the largest output matrix among the odd layers, while the second does the same for the even layers.
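Those sizes might be computed by a small constructor function along these lines; inference-network is a hypothetical name, and the exact constructor may differ. It relies on weights and mrows from the surrounding code, and on the positional factory that deftype generates.

(defn inference-network
  "A sketch: builds a NeuralNetworkInference, computing the maximal
  widths of the two alternating work buffers from the layers' weights."
  [layers]
  (let [widths      (map (comp mrows weights) layers)
        odd-widths  (take-nth 2 widths)          ; outputs of layers 1, 3, 5, ...
        even-widths (take-nth 2 (rest widths))]  ; outputs of layers 2, 4, 6, ...
    (->NeuralNetworkInference (vec layers)
                              (long (apply max 0 odd-widths))
                              (long (apply max 0 even-widths)))))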