I recently started my first real project in Tensorflow. The larger effort, Attalos, was aimed at exploring vector spaces containing multiple types (modalities) of data. This post is about work I did on projecting image features into a word vector space, with the goal of making image search more effective.

This article isn’t about getting started with Tensorflow, and it’s not directly about the projection I was attempting. Instead, it’s a walk down the path I took in implementing a regression with neural networks in Tensorflow. Specifically, it’s about finding, in tensorflow.contrib, some of the higher-level abstractions I loved in Keras (and yes, I know Keras can run on Tensorflow).

The Task:

I mentioned above that I was trying to project image features into a word vector space. More specifically, I had extracted 2048-dimensional feature vectors from an Inception v3 model. The task was to regress those features into a 300-dimensional word vector space based on GloVe.
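Concretely, the data for this task is a pair of aligned arrays: Inception v3 features and their target GloVe vectors. A small sketch with random stand-in data (the array names and tiny example count are mine, not from the original code):

```python
import numpy as np

rng = np.random.RandomState(0)
n_examples = 8  # tiny stand-in for the real dataset

# 2048-dimensional image features extracted from an Inception v3 model
image_features = rng.normal(size=(n_examples, 2048)).astype(np.float32)

# 300-dimensional GloVe word vectors the regression should produce
word_vectors = rng.normal(size=(n_examples, 300)).astype(np.float32)
```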

I ended up doing the same regression three different ways (at different abstraction levels):

1. Using only traditional Tensorflow operators
2. Using the “layers” library in Tensorflow
3. Using the “learn” library in Tensorflow

Try 1: Naïve Approach (aka raw Tensorflow)

The first approach had its origins in Theano (another deep learning library Lab41 has previously used). In Theano (and Tensorflow) the user is responsible for everything: you define a computation graph that is merely a thin layer on top of a matrix math library.

To represent our regression in Tensorflow (or Theano), it’s best to first write out the calculation mathematically:
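The original equation isn’t reproduced here. For the two-layer perceptron described below, the forward computation can be written as follows (the hidden width h is not specified in the text):

```latex
\hat{y} = \sigma(x W_1 + b_1)\, W_2 + b_2,
\qquad x \in \mathbb{R}^{2048},\;
W_1 \in \mathbb{R}^{2048 \times h},\;
W_2 \in \mathbb{R}^{h \times 300},\;
\hat{y} \in \mathbb{R}^{300}
```

where σ is the sigmoid nonlinearity applied elementwise.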

For the loss function (what we are trying to optimize) I used mean squared error:
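The original equation isn’t reproduced here; with y_i the target GloVe vector and ŷ_i the network output for the i-th of n examples, mean squared error is:

```latex
L = \frac{1}{n} \sum_{i=1}^{n} \left\lVert y_i - \hat{y}_i \right\rVert_2^2
```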

In raw Tensorflow the graph setup looks like the code snippet below. We start by defining our variables (input image features, target word vectors, and the weights). We then create our computation graph (in this case a two-layer perceptron). Finally, we set up our optimization by defining a loss function and specifying an optimizer for that loss. All together that creates:

Try 2: A slightly “better” approach

The above approach worked, but I was left thinking that I surely can’t be the only person interested in a dense, fully-connected neural network, and it seemed silly (and error-prone) to start from scratch. (Maybe I was just bitter because I forgot to initialize my network properly and learned that a vector of all zeros is actually a local minimum for my problem.)

After a little reading I learned about tensorflow.contrib. This is a place where new ideas can be tested before they get integrated into the core library. Try 2 borrows from the “layers” portion of contrib, which gives us a Keras-like abstraction for layers.

Our network from above now looks like this:

As someone who is not a deep learning expert, I can look at this and reason about what’s happening (vs. the code from the first approach, where I find myself counting sigmoids to figure out the depth of the network).

Try 3: But wait! There’s more.

The above approach gives you almost all of the flexibility of the first approach while avoiding some of its pitfalls, and it ends up a little easier to approach. Try 3 borrows from another part of the contrib section, “learn”, which is based on the SkFlow work.

Specifically I can define the architecture for a neural network based regression:

For inference I just call:

If you’ve used scikit-learn before, this interface will look familiar: you define your model, call a fit function, and then a predict function. Having a scikit-learn-like interface to deep learning primitives makes deep learning much more accessible, at the expense of some flexibility. It also means it is easy to replace calls to scikit-learn with calls to “learn” functions.
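To make the parallel concrete, here is the same define/fit/predict rhythm in scikit-learn itself, with MLPRegressor as a stand-in network and small random data (all sizes are illustrative, not the post’s):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(64, 32))  # stand-in for the 2048-d image features
Y = rng.normal(size=(64, 5))   # stand-in for the 300-d word vectors

# Define the model, fit it, then predict -- the interface "learn" mirrors
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=50, random_state=0)
model.fit(X, Y)
predictions = model.predict(X)
```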

Lessons Learned:

Going through this process, a few things struck me:

1. Use higher-level abstractions till you can’t. If you look at most raw Tensorflow deep learning code, you’ll find that people define functions to abstract away the mundane details of creating each layer. If you use pre-created ones, you reduce duplication and reduce the chances of error.
2. Don’t be afraid to dive in. Look at what that layer looks like. Figure out what happens when you call model.fit(). At a minimum you need to understand the defaults for the library you are using (e.g. that the TensorFlowDNNRegressor uses ReLU by default). Just because you’re using a higher level of abstraction doesn’t mean you get a pass on understanding the network; it might be even more important, since most abstraction levels make some assumptions.
3. Think of others. Most of us aren’t an army of one. Other people have to be able to read the code and understand the network you’ve created. People should be spending their time thinking about why you made the choices you did, not trying to understand what it is you made.

Which is best?

The three approaches represent different abstraction levels, and which is best really depends on your interest and what you are trying to do. In general, the “layers” library provides the abstraction level that most closely matches the way I think about neural networks. It gives me the flexibility to connect things the way I want without too much boilerplate code. The “learn” library is great if what you’re interested in is DNN classification/regression, or if you are already using scikit-learn and want to maintain that interface.