Here is a simple HowTo to understand the concept of shapes in TensorFlow and hopefully avoid losing hours debugging them.

What is a tensor?

Very briefly, a tensor is an N-dimensional array containing the same type of data (int32, bool, etc.). All you need to fully describe a tensor is its data type and the size of each of its N dimensions.

That’s why we describe a tensor with what we call a shape: it is a list, tuple or TensorShape of numbers containing the size of each dimension of our tensor, for example:

For a tensor of n dimensions: (D0, D1, …, Dn-1)

For a tensor of size W x H (usually called a matrix): (W, H)

For a tensor of size W (usually called a vector): (W,)

For a simple scalar: () or (1,) (the two are often treated as equivalent, though strictly (1,) is the shape of a one-element vector)

Note: D0, …, Dn-1, W and H are integers.

Note on the vector (1-D tensor): it is impossible to determine whether a vector is a row or a column vector by looking at its shape in TensorFlow, and in fact, it doesn’t matter. For more information, please look at this Stack Overflow answer about NumPy notation (which is roughly the same as TensorFlow notation): http://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r

A tensor looks like this in TensorFlow:
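For instance, printing a constant of shape (6, 3, 7) shows all three pieces of information at once. This sketch uses graph mode through tf.compat.v1 so it also runs on TensorFlow 2.x; in the TF 1.x this article was written against, the compat prefix is unnecessary:

```python
import tensorflow as tf

# Graph mode: needed on TF 2.x to get Tensor objects instead of eager values
tf.compat.v1.disable_eager_execution()

# A constant tensor of shape (6, 3, 7) filled with zeros
my_tensor = tf.constant(0.0, shape=[6, 3, 7])

print(my_tensor)
# Prints something like: Tensor("Const:0", shape=(6, 3, 7), dtype=float32)
```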

We can see we have a Tensor object:

It has a name, used in a key-value store to retrieve it later: Const:0

It has a shape describing the size of each dimension: (6, 3, 7)

It has a type: float32

That’s it!

Now, here is the most important piece of this article: Tensors in TensorFlow have 2 shapes: The static shape AND the dynamic shape!


The static shape

The static shape is the shape you provided when creating a tensor OR the shape inferred by TensorFlow when you define an operation resulting in a new tensor. It is a tuple or a list.

TensorFlow will do its best to guess the shape of your different tensors (between your different operations), but it won’t always be able to. Especially if you start doing operations with placeholders defined with unknown dimensions (like when you want to use a dynamic batch size).


To access or change the static shape in your code, you use functions which are attached to the Tensor itself and have an underscore in their names: get_shape() and set_shape().
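A minimal sketch of both calls, assuming a placeholder with an unknown first dimension (written with tf.compat.v1 so it runs in graph mode on TensorFlow 2.x):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# First dimension left unknown (None)
my_tensor = tf.compat.v1.placeholder(tf.int32, shape=[None, 2])

# get_shape() reads the static shape: a TensorShape, available without running the graph
print(my_tensor.get_shape())  # the first dimension shows as unknown ('?' in TF 1.x)

# set_shape() refines the static shape in place, when you know more than TensorFlow does
my_tensor.set_shape([8, 2])
print(my_tensor.get_shape().as_list())  # [8, 2]
```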

Note: The static shape is very useful to debug your code with print so you can check your tensors have the right shapes.

The dynamic shape

The dynamic shape is the actual one used when you run your graph. It is itself a tensor describing the shape of the original tensor.

If you defined a placeholder with undefined dimensions (with None as a dimension), those None dimensions will only have a real value when you feed an input to your placeholder, and so will any tensor depending on this placeholder.


To access or change the dynamic shape in your code, you use functions which are attached to the main tf scope and don’t have an underscore in their names: tf.shape() and tf.reshape().
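A sketch of both, again via tf.compat.v1 for graph mode. Note that tf.shape() returns a Tensor, so its value only exists once the graph runs with a concrete input:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

my_tensor = tf.compat.v1.placeholder(tf.int32, shape=[None, 2])

# tf.shape() gives the dynamic shape: itself a 1-D int32 Tensor, evaluated at run time
dynamic_shape = tf.shape(my_tensor)

# tf.reshape() changes the dynamic shape (here: flatten to 1-D)
flat = tf.reshape(my_tensor, [-1])

with tf.compat.v1.Session() as sess:
    shape_value, flat_value = sess.run(
        [dynamic_shape, flat],
        feed_dict={my_tensor: [[1, 2], [3, 4], [5, 6]]})
    print(shape_value)  # [3 2]
    print(flat_value)   # [1 2 3 4 5 6]
```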

The dynamic shape is very handy for dealing with dimensions that you want to keep dynamic.

A real use case: the RNN

We like dynamic inputs because we want to build a dynamic RNN which should be able to handle inputs of different lengths.

In the training phase we will define a placeholder with a dynamic batch_size, and then we will use the TensorFlow API to create an LSTM. You will end up with something like this:
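A hypothetical sketch of that setup (the placeholder name words_in, the embedding size 50 and the 64 hidden units are illustrative assumptions, not from the article; tf.compat.v1 keeps the TF 1.x RNN API runnable on TensorFlow 2.x):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Batch size and sequence length both left dynamic (None); 50 is the feature size
words_in = tf.compat.v1.placeholder(tf.float32, shape=[None, None, 50])

# An LSTM cell from the TF 1.x RNN API
cell = tf.compat.v1.nn.rnn_cell.LSTMCell(num_units=64)
```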

And now you need to initialize the init_state with init_state = cell.zero_state(batch_size, tf.float32) ...

But what should the “batch_size” argument be equal to? Remember, you want it to be dynamic, so what are our options? TensorFlow allows different types here; if you read the source code, you will find:

Args:

batch_size: int, float, or unit Tensor representing the batch size.

int and float can’t be used because when you define your graph, you actually don’t know what the batch_size will be (that’s the point).

The interesting piece is the last type: “unit Tensor representing the batch size”. If you dig up the doc from there, you will find that a unit Tensor is a “0-d Tensor”, which is just a scalar. So how do you get that scalar Tensor anyway?

If you try with the static shape:
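A sketch of the static attempt, reusing the hypothetical words_in placeholder from above (in TF 1.x the result prints as Dimension(None), shown as ‘?’; on TF 2.x the same index is simply None):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

words_in = tf.compat.v1.placeholder(tf.float32, shape=[None, None, 50])

# Static shape: the first dimension is unknown, not a Tensor you can compute with
batch_size = words_in.get_shape()[0]
print(batch_size)  # Dimension(None) in TF 1.x, printed as '?'
```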

batch_size will be of the Dimension(None) type (printed as ‘?’). This type can only be used as a dimension for placeholders.

What you actually want to do is to keep the dynamic batch_size “flow” through the graph, so you must use the dynamic shape:
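A sketch of the dynamic version, with the same illustrative names as above (words_in, 64 hidden units). Here tf.shape(words_in)[0] is a 0-d int32 Tensor, so the batch size is resolved only when the graph runs:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

words_in = tf.compat.v1.placeholder(tf.float32, shape=[None, None, 50])
cell = tf.compat.v1.nn.rnn_cell.LSTMCell(num_units=64)

# Dynamic shape: a scalar Tensor that lets the batch size "flow" through the graph
batch_size = tf.shape(words_in)[0]
init_state = cell.zero_state(batch_size, tf.float32)

with tf.compat.v1.Session() as sess:
    # Feed a batch of 4 sequences of length 10: the zero state adapts to it
    state = sess.run(init_state,
                     feed_dict={words_in: [[[0.0] * 50] * 10] * 4})
    print(state.c.shape)  # (4, 64)
```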

batch_size will be a TensorFlow 0-d Tensor (Scalar Tensor) type describing the batch dimension, hooray!

Conclusion

Use the static shape for debugging

Use the dynamic shape everywhere else, especially when you have undefined dimensions

Remark 1

In the RNN API, TF takes care of the init_state and initialises it to the zero_state, so why would I need to manually define it this way?

You might want to control the initialisation of the init_state when you run your graph. Having access to init_state this way makes that possible: when you run a graph, you can actually use the feed_dict to feed any tensor at hand in your graph!
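A tiny sketch of that trick on a toy graph (the names x, mid and out are illustrative): any intermediate tensor can be overridden through feed_dict, which is exactly how you would feed a custom init_state instead of the zero state.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[])
mid = x * 2.0   # an intermediate tensor of the graph
out = mid + 1.0

with tf.compat.v1.Session() as sess:
    # Usual run: mid is computed from x
    normal = sess.run(out, feed_dict={x: 3.0})         # 7.0
    # Overriding run: feed mid directly, bypassing x entirely
    overridden = sess.run(out, feed_dict={mid: 10.0})  # 11.0
    print(normal, overridden)
```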

And now you can predict words Ad vitam æternam, Cheers! 🍺

References

https://www.tensorflow.org/programmers_guide/faq#tensor_shapes