In the previous article, we implemented sigmoid-prim!.

(defn sigmoid-prim!
  ([x!]
   (with-release [x-raw (raw x!)]
     (sigmoid-prim! x! x-raw)))
  ([x! prim!]
   (let [x (sigmoid! x!)]
     (mul! (linear-frac! -1.0 x 1.0 prim!) x))))

Now, we just improve sigmoid! a bit.

(defn sigmoid!
  ([] sigmoid-prim!) ;; Only this, and that's it? Yes!
  ([x]
   (linear-frac! 0.5 (tanh! (scal! 0.5 x)) 0.5))
  ([x y]
   (linear-frac! 0.5 (tanh! (scal! 0.5 (copy! x y))) 0.5)))

(def activation-prim (sigmoid!))

#'user/activation-prim

Yep, works.

During the backward pass, we will call this function with z, and it will return the derivative. Then we will multiply that derivative by whatever comes from the next layer (computed previously while going backward), getting a matrix that we'll use when updating the weights.
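As a sketch of that backward step (the names z, prim, and delta-next are placeholders I'm introducing here, not part of the library), it might look something like this:

```clojure
;; Hypothetical sketch, assuming:
;; z          - this layer's pre-activation matrix
;; prim       - a preallocated matrix to receive the derivative
;; delta-next - the signal propagated back from the next layer
(let [deriv (activation-prim z prim)] ;; derivative written into prim
  (mul! deriv delta-next))            ;; elementwise product, mutates deriv in place
```

Since mul! mutates its first argument, the result ends up in the same memory that holds the derivative, with no extra allocation.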

But, wait a minute. The one-argument sigmoid-prim! creates a new instance of the whole resulting matrix. I'm afraid that will slow us down a bit (or a lot) and take a memory toll. Luckily, we have the two-argument variant, so we should provide memory that is ready to accept the result.

Now, the key thing is that, after this calculation, I won't need \(z^l\) data any more. What I would really like to do is to just overwrite it with the value of the derivative! Should I just call (activation-prim z z)? That is technically possible, but there lies a trap! Recall from the last article that, in the particular case of the sigmoid activation implementation, x! and prim! have to be different.
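To make the aliasing trap concrete, here is a walk through the two-argument sigmoid-prim! when x! and prim! share memory (z is a placeholder name for illustration):

```clojure
;; Inside (sigmoid-prim! x! prim!):
;;   (sigmoid! x!)                       overwrites x! with s = sigmoid(x!)
;;   (linear-frac! -1.0 x 1.0 prim!)     writes (1 - s) into prim!
;;   (mul! ... x)                        multiplies prim! by s, in place
;;
;; If we called (activation-prim z z), then x! and prim! would be the
;; SAME matrix: linear-frac! would clobber s with (1 - s) before mul!
;; ever sees it, producing (1 - s) * (1 - s) instead of s * (1 - s).
(activation-prim z z)    ;; trap: silently wrong results

;; Safe: give it a distinct, preallocated destination.
(def prim (raw z))       ;; allocated once, reused on every backward pass
(activation-prim z prim) ;; prim gets the derivative; z now holds sigmoid(z)
```

So overwriting z in place is out; we need some other matrix of the same dimensions to receive the derivative.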

Can we reuse a? Well, we have already reserved it for the signal coming from the next layer in the backward pass. Sorry! Hmm… We have a-1 sitting around idly. We won't need it until we use it to pass the signal to the previous layer. Nice, but its dimensions are different from a's.