Embeddings, huh…?

Much like everything in the data science world, embeddings are delightfully simple in a complicated fashion. The Keras documentation is a little sparse on the topic, alluding only to the fact that embeddings turn positive integers into fixed-size dense vectors. What does that even mean, and what does it tell us about the usefulness of embeddings?

Embeddings are the best-kept secret for neural networks with multiple, varying inputs. Put simply, embedding layers behave much like their cousins, the dense hidden layers: the dense vectors mentioned above are weights, learned in the same way as the weights of a dense layer, but their output is not passed through an activation function. The embedding weights are randomly initialised, looked up during the forward pass, and updated only during backpropagation. The idea is that the model learns to place similar inputs close together based on how they affect the outcome of the forward pass.
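The lookup-then-update behaviour described above can be sketched in plain NumPy. This is a minimal illustration, not Keras internals: the sizes, the dummy gradient, and the learning rate are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim = 10, 4

# An embedding layer is essentially a randomly initialised weight matrix;
# each row is the dense vector for one positive-integer input.
weights = rng.normal(size=(vocab_size, embed_dim))

# Forward pass: no activation function, just an index lookup.
token_ids = np.array([3, 7])
vectors = weights[token_ids]          # shape (2, 4)

# Backward pass: only the rows that were looked up receive a gradient
# update (the gradient and learning rate here are illustrative).
grad = np.ones_like(vectors)
lr = 0.1
np.subtract.at(weights, token_ids, lr * grad)
```

Rows 3 and 7 move a little on every step they are used, which over many steps is what nudges similar inputs toward similar vectors.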

Keep in mind this is only an illustration, but notice how positive integers become index pointers into an array of x-dimensional weight vectors.

In the above graphic, we can see that inputs are normalized to positive integer values, which lets us extract the embedding weights stored at each index. This is why we are required to specify the embedding size before the model is trained: the layer's dimensionality is fixed at n inputs by x-dimensional vectors. So to put it simply, we have a multidimensional vector representation for every input into the model.
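In Keras itself, this means declaring both sizes up front when constructing the layer. A small sketch, with the vocabulary size and embedding dimension chosen arbitrarily for illustration:

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes: 1000 distinct input values, 8-dimensional vectors.
vocab_size, embed_dim = 1000, 8

layer = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

ids = np.array([[5, 42, 7]])          # a batch of positive-integer inputs
out = layer(ids)                      # shape (1, 3, 8)

# The outputs are exactly the rows of the weight table at those indices.
table = layer.get_weights()[0]        # shape (1000, 8)
```

Feeding an integer outside `[0, vocab_size)` would index past the table, which is why the inputs must be normalized to that range first.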

Now you may be thinking: that’s great, we have some values from our inputs, but what does that mean? Well, in a nutshell, it means that neural networks are not the mystical “black boxes” that everyone claims them to be, at least not entirely. Embeddings give us insight into the inner workings of our network, and with a little help from dimensionality reduction techniques such as Principal Component Analysis (PCA) or t-distributed Stochastic Neighbour Embedding (t-SNE), we can have an inside look at how the network is grouping various inputs. Essentially we get a little peek into what our model is thinking and doing, an insider’s advantage if you will.
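To make that peek concrete, here is a hand-rolled PCA (centring followed by an SVD projection) that squashes an embedding table down to two dimensions for plotting. The weights here are random stand-ins for a trained table, and the sizes are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are trained embedding weights: 50 inputs, 16 dims each.
weights = rng.normal(size=(50, 16))

# PCA by hand: centre the rows, then project onto the top two
# right-singular vectors (the principal components).
centred = weights - weights.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
coords_2d = centred @ vt[:2].T        # shape (50, 2), ready to scatter-plot
```

On a trained model, inputs the network treats as similar land near each other in the resulting scatter plot; t-SNE works the same way from the caller's point of view, just with a non-linear projection.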