The idea seems to be quite simple.

The first image can be obtained by taking every frame of an animation, cutting it into vertical stripes, interleaving the stripes and stitching them together.

The second image is essentially a mask obtained using the same approach, starting from an animation with the same number of frames as the original one, but made of a white frame followed by black frames.

By interleaving I mean that, once you have sliced the $N$ frames, you create a first block of stripes by taking stripe $0$ from the first frame, stripe $1$ from the second frame, and so on up to stripe $N-1$ from the last frame; then you start the second block of stripes by taking stripe $N$ from the first frame, stripe $N+1$ from the second frame, and so on. Finally, you stitch the blocks together. In other words, stripe $k$ of the result always comes from frame $k \bmod N$.
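The interleaving described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code from the notebook: the function name `interleave`, the `stripe_width` parameter, and the representation of a frame as a list of rows of pixel values are all assumptions made for the example.

```python
def interleave(frames, stripe_width=1):
    """Build one image whose k-th vertical stripe comes from frame k mod N.

    Each frame is a list of rows, and each row is a list of pixel values.
    All frames are assumed to have the same size (hypothetical representation).
    """
    n = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        # Column x belongs to stripe x // stripe_width, taken from frame (stripe mod n).
        [frames[(x // stripe_width) % n][y][x] for x in range(width)]
        for y in range(height)
    ]


# Three 2x7 frames filled with the values 0, 1 and 2 respectively.
frames = [[[i] * 7 for _ in range(2)] for i in range(3)]
picture = interleave(frames)

# The mask uses the same procedure on one white frame (255) followed by black frames (0).
mask_frames = [[[255 if i == 0 else 0] * 7 for _ in range(2)] for i in range(3)]
mask = interleave(mask_frames)
```

With one-pixel stripes, every row of `picture` is `[0, 1, 2, 0, 1, 2, 0]` and every row of `mask` is `[255, 0, 0, 255, 0, 0, 255]`, so the white stripes of the mask line up with the stripes taken from the first frame.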

The source notebook is available on GitHub (under GPL v3). Feel free to use issues to point out errors, or to fork it to suggest edits.

A simple example

Let's try to make this clearer with an example. Consider an animation with three frames, containing the first capital letters of the alphabet, suitably colored.