I was recently thinking about the memory cost of (a) training a CNN and (b) inference with a CNN. Please note that I am not talking about the storage cost (which is simply determined by the number of parameters).

How much memory does a given CNN (e.g. VGG-16 D) need for

(a) Training (with ADAM)

(b) Inference on a single image?

My thoughts

Basically, I want to make sure that I didn't forget anything in this accounting. If you know of other sources that explain this kind of reasoning, please share them with me.

(a) Training

For training with ADAM, I will assume a mini-batch size of $B \in \mathbb{N}$ and that $w \in \mathbb{N}$ is the number of parameters of the CNN. Then the memory footprint (the maximum amount of memory I need at any point during training) for a single training pass is:

- $2w$: keep the weights and the weight updates in memory
- $B \cdot{}$ size of all generated feature maps: activations of the forward pass
- $w$: gradients for each weight (backpropagation)
- $w$: learning rates for each weight (ADAM)

(b) Inference

During inference, it is not necessary to keep the feature maps of layer $i-1$ once the feature maps of layer $i$ have been calculated. So the memory footprint during inference is:
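This idea can be sketched as follows: since only the input and output feature maps of the layer currently being evaluated must coexist, the peak activation memory is the largest sum over consecutive layer pairs, on top of the weights. The feature-map sizes below are made-up example values:

```python
BYTES_PER_FLOAT = 4  # assuming float32

def inference_memory_bytes(w, fmap_values_per_layer):
    """Peak inference memory: all weights, plus the largest pair of
    consecutive feature maps (input and output of one layer)."""
    peak_pair = max(
        fmap_values_per_layer[i] + fmap_values_per_layer[i + 1]
        for i in range(len(fmap_values_per_layer) - 1)
    )
    return (w + peak_pair) * BYTES_PER_FLOAT

# Toy example: number of feature-map values per layer (assumed).
fmaps = [150_528, 3_211_264, 3_211_264, 802_816, 1_605_632]
print(inference_memory_bytes(138_000_000, fmaps) / 2**20, "MiB")
```

Note that this is dominated by the weights for large networks, so inference on a single image needs far less memory than training.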