Language and images are closely intertwined: We think in pictures and we explain facts as spatial constellations. What if the spoken word could be transformed into dynamic visual worlds in real time? Speech input, machine learning and recurrent neural networks for image generation allow to com- puter generate complex imaginary worlds that follow the narrator and thus create complex animations controlled by linguistic structures.