tl;dr: You can hide/encapsulate the state of arbitrary recurrent networks with a single page of code

In an ideal world, every deep learning paper proposing a new architecture would link to a readily accessible GitHub repository with working code.

In reality, you often have to translate the equations into code yourself, make a bunch of assumptions, and do a lot of debugging before you end up with something that may or may not reflect the authors’ intent.

This process is especially fraught when dealing with recurrent architectures (aka “recurrent neural networks”): computational graphs that are DGs (directed graphs) but not DAGs (directed acyclic graphs). Recurrent architectures are especially good at modeling/generating sequential data — language, music, video, even video games — anything where you care about the order of the data rather than just a pure input/output mapping.

However, because we can’t directly train directed graphs with directed cycles (whew!), we have to implement and train graphs that are transformations of the original graph (going from “cyclic” to “unrolled” versions) and then use backpropagation through time (BPTT) on these shadow models. In essence, we’re mapping connections across time to connections across space:
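To make that mapping concrete, here’s a minimal sketch of unrolling (my own illustration, assuming PyTorch and its `nn.RNNCell`; the names `cell`, `xs`, and `h` are just placeholders): one recurrent cell with one set of weights is applied at every time step, so the directed cycle becomes a chain of copies in the computational graph, and ordinary backprop through that chain is exactly BPTT.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len, batch = 8, 16, 5, 4

cell = nn.RNNCell(input_size, hidden_size)    # one recurrent cell, one set of weights
xs = torch.randn(seq_len, batch, input_size)  # a toy input sequence
h = torch.zeros(batch, hidden_size)           # initial hidden state

# Unroll the cycle across time: each iteration adds another copy of the cell
# to the computational graph, and every copy shares the same parameters.
hidden_states = []
for t in range(seq_len):
    h = cell(xs[t], h)
    hidden_states.append(h)

loss = torch.stack(hidden_states).pow(2).mean()  # stand-in loss over all hidden states
loss.backward()  # backprop walks back through all the unrolled copies: that's BPTT
```

Framework aside, the key point is that all the unrolled copies share one set of weights; that weight sharing is what “connections across time become connections across space” means in practice.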