I was reading *Learning to learn by gradient descent by gradient descent*, where they outline how they use an LSTM that is applied coordinatewise. For each coordinate, the same neural network outputs an update as follows:

$$\theta^{<t>}_i, h^{<t>}_i, c^{<t>}_i = NN(\nabla^{<t>}_i, h^{<t-1>}_i)$$

I am trying to understand the advantages and disadvantages of doing this. It struck me as closely related to how a fully connected layer acts on a batch of 1-D data, with the batch size equal to the number of parameters in this case, so I asked: What is the intuition behind what Neural Networks do to data that is 1 dimensional? This is what I've thought so far:
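To make the coordinatewise setup concrete, here is a minimal PyTorch sketch of my reading of it (not the paper's exact architecture: `lstm_cell`, `out_proj`, and the sizes are placeholders). The key point is that the same small cell is applied to every coordinate, with the number of parameters $P$ playing the role of the batch dimension:

```python
import torch
import torch.nn as nn

# Minimal sketch, assuming a single shared nn.LSTMCell stands in for the
# paper's optimizer network; hidden_size and out_proj are placeholders.
hidden_size = 20
lstm_cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
out_proj = nn.Linear(hidden_size, 1)  # hidden state -> scalar update

def coordinatewise_step(theta, grad, h, c):
    """One update; theta and grad are flat tensors of shape (P,)."""
    g = grad.view(-1, 1)             # (P,) -> (P, 1): P coordinates = batch
    h, c = lstm_cell(g, (h, c))      # identical weights for every coordinate
    update = out_proj(h).view(-1)    # per-coordinate update, shape (P,)
    return theta + update, h, c

# Usage: each coordinate carries its own (h, c) state; the weights are shared.
P = 1000
theta, grad = torch.randn(P), torch.randn(P)   # grad: stand-in gradient
h = torch.zeros(P, hidden_size)
c = torch.zeros(P, hidden_size)
theta, h, c = coordinatewise_step(theta, grad, h, c)
```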

Advantages

The major advantage is that the NN is small: none of its weights scales with the number of parameters being optimized, which is what makes the approach feasible at all. It is also invariant to the ordering of the coordinates (though, as noted below, ordering might actually matter).
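A quick way to see the size point (a sketch under the same placeholder sizes as above): the optimizer network's parameter count depends only on its own hidden size, never on $P$.

```python
import torch.nn as nn

# The optimizer's parameter count is fixed by its hidden size (20 here,
# a placeholder), independent of how many coordinates it is applied to.
cell = nn.LSTMCell(input_size=1, hidden_size=20)
print(sum(p.numel() for p in cell.parameters()))  # 1840, whether P is 10 or 10^7
```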

Disadvantages

Their major advantage seems to also be their major disadvantage from a representational perspective: treating the coordinates as a batch of 1-D data makes an implicit independence assumption, when in fact the coordinates are not really independent (especially for image data, where I'd assume the convolutional filters detect edges etc., so translation-symmetry and compositionality assumptions are baked into the parameters). To take advantage of that structure they would need something like a CNN over the coordinates that aggregates information across all of them; on the other hand, the coordinatewise design means they never have to worry about the model's size, since it always outputs an update of the right shape. Also, the invariance to parameter ordering might not be as good as I said, since related parameters may sit near each other and the distance between them might matter (though CNN filters are usually small, e.g. 1x1, 3x3, 5x5, so I am not sure this matters in practice). A small check of the independence point is sketched below.
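To illustrate that independence point, here is a hedged check (same placeholder `LSTMCell` as the sketches above): because the cell never mixes coordinates, permuting the inputs just permutes the outputs, so the update for coordinate $i$ cannot depend on any other coordinate's gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
P, H = 8, 20                        # placeholder sizes
cell = nn.LSTMCell(1, H)
proj = nn.Linear(H, 1)

def step(grad, h, c):
    h, c = cell(grad.view(-1, 1), (h, c))
    return proj(h).view(-1), h, c

grad = torch.randn(P)
h, c = torch.zeros(P, H), torch.zeros(P, H)
upd, _, _ = step(grad, h, c)

# Permuting the coordinates permutes the updates and changes nothing else:
perm = torch.randperm(P)
upd_perm, _, _ = step(grad[perm], h[perm], c[perm])
print(torch.allclose(upd[perm], upd_perm))  # True: no cross-coordinate interaction
```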

My gut just tells me there is some major disadvantage here, because treating the parameters as 1-D data seems odd, and I am trying to articulate what the issue might be. Does anyone know how to express it?