Current and emerging deep learning architectures call for an expressive high-level programming style with end-to-end differentiation and for a high-performance implementation at the same time. But the current generation of deep learning frameworks either limits expressiveness and ease of use for increased performance (e.g., TensorFlow) or vice versa (e.g., PyTorch). In this paper we demonstrate that a “best of both worlds” approach is possible, based on multi-stage programming and delimited continuations, two orthogonal ideas firmly rooted in programming languages research.