Currently our N-character RNN looks at N characters and predicts the (N+1)th character. When we ask it to predict the (N+2)th character, it throws away the hidden state from the previous prediction and starts again from scratch. Our RNN acts as a stateless "pure function": its output is fully determined by its current input, independent of anything it has seen before. This stateless behaviour limits the model's memory to the N characters in front of it. We want to be able to remember further back than that!

Instead of throwing away our history, we can preserve and re-use it inside the RNN, which will hopefully lead to a better language model.

We're going to have to make a couple of changes to our code to accommodate this new stateful RNN.

Preserve Hidden State

The RNN will need to preserve its hidden state activations between each training batch, so that it can keep track of what it has seen so far.
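As a rough illustration, here's a minimal sketch of what that might look like in plain PyTorch. The class name, layer names, sizes and the `reset` method are illustrative assumptions rather than the exact code from this post; the key change is that the hidden state lives on the module (`self.h`) and is detached, rather than discarded, at the end of each forward pass:

```python
import torch
import torch.nn as nn

class StatefulCharRNN(nn.Module):
    """Sketch of a character-level RNN that keeps its hidden state between batches."""
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # input -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)     # hidden -> hidden
        self.h_o = nn.Linear(n_hidden, vocab_sz)     # hidden -> output
        self.h = None                                # persistent hidden state

    def forward(self, x):
        bs, seq_len = x.shape
        if self.h is None:
            self.h = torch.zeros(bs, self.h_h.out_features, device=x.device)
        for i in range(seq_len):
            self.h = torch.tanh(self.h_h(self.h) + self.i_h(x[:, i]))
        out = self.h_o(self.h)
        # Keep the activations for the next batch, but cut the gradient
        # history so backprop doesn't stretch over the whole dataset.
        self.h = self.h.detach()
        return out

    def reset(self):
        # Call at the start of each epoch (and before validation) so the
        # first batch doesn't see state left over from unrelated text.
        self.h = None
```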

Custom Data Loader

Previously, each prediction made by our model was independent, so the order of the training data didn't matter. Now that the hidden state is preserved between batches, order matters: each training batch needs to contain the text that immediately follows the text in the previous batch. Our input data currently comes out in this format:

BATCH  ELEMENT 0       ELEMENT 1
0      'preface su'    'pposing that'
1      'truth is a'    ' woman--what'
2      'then? is the'  're not groun'

As you can see, element 1 of batch 0 simply carries on from element 0 of the same batch. Instead of each batch containing one contiguous run of text, we need each element position to read as a contiguous run across successive batches, so that the hidden state carried over from the previous batch actually corresponds to the text that precedes the current input:

BATCH  ELEMENT 0       ELEMENT 1
0      'preface su'    ' woman--what'
1      'pposing that'  'then? is the'
2      'truth is a'    're not groun'

We need to write our own data sampler to ensure that our training data comes out in this element-wise sequential order.
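One way to get that ordering is a custom sampler that, given contiguous text chunks stored in document order, yields indices so that element i of each batch picks up where element i of the previous batch left off. This is only a sketch under that assumption; the class name and the drop-the-ragged-tail behaviour are my own choices, not necessarily what this post ends up using:

```python
from torch.utils.data import DataLoader, Sampler

class ElementwiseSequentialSampler(Sampler):
    """Yield indices so element i of batch k+1 continues element i of batch k.

    Assumes the dataset holds contiguous text chunks in document order.
    """
    def __init__(self, data_source, batch_size):
        self.n = len(data_source)
        self.bs = batch_size

    def __iter__(self):
        n_batches = self.n // self.bs  # any ragged tail is dropped
        for b in range(n_batches):
            for elem in range(self.bs):
                # Element `elem` reads chunk b, then b+1, ... across batches.
                yield elem * n_batches + b

    def __len__(self):
        return (self.n // self.bs) * self.bs

# Usage sketch: disable shuffling and set drop_last=True so batch boundaries
# line up with the sampler's assumptions.
# dl = DataLoader(dataset, batch_size=bs,
#                 sampler=ElementwiseSequentialSampler(dataset, bs),
#                 drop_last=True)
```

Keeping every batch at the full batch size also matters here: the persistent hidden state has a fixed batch dimension, so a smaller final batch would no longer line up with it.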