With xtensor, we aim to develop a solution that works for the whole Data Science ecosystem. xtensor is an n-dimensional array container for C++, like NumPy for Python. We follow the NumPy API closely, but in pure, strongly typed C++ (there is even a cheat sheet showing how closely the two APIs match). Furthermore, we offer official bindings to Python, R, and Julia. This means that if you write your numerical algorithm once in C++ with xtensor, you can seamlessly use your routines from each of the big languages of Data Science. xtensor is flexible enough to let the dynamic languages handle memory management: if you allocate a Julia array through xtensor-julia, Julia’s own memory allocator and garbage collector are used. xtensor also transparently handles any memory layout, such as Julia’s and R’s column-major layout and NumPy’s row-major layout. Finally, C++ has made great strides in recent years to modernize itself (with C++14 and C++17), which makes it an incredibly productive language.

In the remainder of this blog post, we will look at the implementation of an algorithm for ray-shading height maps. The challenge first came up in the R community, was then picked up to show how GraalVM can speed up R, and was subsequently contested by Julia and Pythran. The aim of this article is to show how we can implement the algorithm once and for all in C++ with xtensor, and then create bindings for each of those languages while retaining maximum speed.

The challenge

We first ported the original R and Julia implementations (by Tyler Morgan-Wall and Simon Danisch) to Python, and then to xtensor (you can find the Python version here).

Our version lives at https://github.com/wolfv/xtensor-showcase.

You might note that, apart from the use of curly braces here and there, the ported C++ code is basically the same length as the original Python or R implementation. NumPy users may also recognize functions such as linspace. This is because we stick to the NumPy API for most high-level array operations. The big difference is, of course, the <double> template parameter: the strong compile-time typing is necessary for efficient computation and for the generation of fast code by our template expression engine (a great explanation of how this works can be found here).

We can now create bindings for the big three: Python, R, and Julia. For each language, the respective xtensor package exists:

xtensor-python: deals seamlessly with NumPy arrays

xtensor-r: transfers R vectors and matrices back and forth

xtensor-julia: bind Julia’s n-dimensional arrays

These three packages all work similarly. Each uses a language-specific C++ package (pybind11, CxxWrap.jl, and Rcpp) to natively create the data structures in the host language. Using these packages, we define new xtypes in C++: xt::pyarray<T> for a NumPy-backed Python array, xt::rarray<T> for the R version, and xt::jlarray<T> for Julia. Additionally, the packages contain versions with a static number of dimensions (the equivalent of xt::xtensor<T, N>, where N is an integer indicating the dimensionality). This enables a number of optimizations in the C++ compiler and in the template expression engine.

Let’s look at the Python bindings, for example (Julia and R bindings are just as simple):
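The original post embeds the binding code at this point. We have not reproduced the showcase code verbatim; the sketch below only illustrates its shape. The module name rayshader and the elided function body are our assumptions, while rayshade_impl and the xt::pyarray<float> signature follow the surrounding text.

```cpp
#include "pybind11/pybind11.h"
#include "xtensor-python/pyarray.hpp"

// Generic implementation: E can be any type modeling the
// xexpression interface (pyarray, rarray, jlarray, xarray, ...).
template <class E>
auto rayshade_impl(const E& height_map)
{
    // ... the ray-shading algorithm, written once against the
    // xexpression interface ...
}

PYBIND11_MODULE(rayshader, m)  // module name is hypothetical
{
    // pybind11 converts the incoming NumPy array to an
    // xt::pyarray<float> without copying the underlying buffer.
    m.def("rayshade", [](const xt::pyarray<float>& height_map) {
        return rayshade_impl(height_map);
    });
}
```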

As input to the function, we take an xt::pyarray<float>. This type is registered with pybind11 and is converted automatically from a NumPy array, without copying the buffer contents at any step! Since the rayshade_impl function takes a template argument E as input, we can call it with any class that follows the xexpression interface. Of course, pyarray does (as do rarray and jlarray).

Benchmarks

Below are the benchmarks. As you can see, compiled languages like C++ and Julia generate extremely efficient code, and the xtensor bindings do not impact the speed at all! (By the way, the C++ result is a little slower than the Julia one because no dedicated benchmarking package was used and we only timed one iteration on a laptop; the CPU usually needs a few iterations to ramp up to full speed.)

╔══════════════════╦════════════╗
║ Language         ║ Time (s)   ║
╠══════════════════╬════════════╣
║ C++              ║  0.0164812 ║
║ Python / xtensor ║  0.0220982 ║
║ Python / NumPy   ║ 14.224207  ║
║ Julia / xtensor  ║  0.014221  ║
║ Julia            ║  0.014501  ║
║ R / xtensor      ║  0.01328   ║
║ R                ║  9.905     ║
╚══════════════════╩════════════╝

An honorable mention is due here: Brodie Gaslam showed impressively how to make R code performant by utilizing vectorization (the same principle can be used in NumPy, as well as in xtensor). His code performed almost as fast as R with xtensor, clocking in at 0.058 s.

How to get started

The code for this example is uploaded at https://github.com/wolfv/xtensor-showcase. We’ve created extensive documentation for xtensor, as well as cookiecutter project templates to help people get started with plugins for Python and Julia. You can also check out the various xtensor repositories on GitHub under the QuantStack organization. And we’re often online in the Gitter chat room or on Twitter.

Where do we want to go from here

There are many interesting and challenging tasks ahead for xtensor:

More of the NumPy API: we already cover a lot, but some parts are still missing! We’ve even compiled a GitHub project listing some easy-to-tackle missing bits of the NumPy API. If you want to get your feet wet (again) with C++, we’re happy to mentor newcomers through the implementation of some of the missing functionality!

Accelerate some functions further: There are still some bottlenecks in the functions we have implemented so far. It’s time to get rid of all of them.

GPU support: an obvious one. Everybody else has it, and we hope to add it next year. This can massively speed up operations on big data.

More interoperability: we want deep support for Apache Arrow (work in progress) and PyTorch, for example. We would also welcome more language bindings, such as MATLAB/Octave or Java!

Compile NumPy to xtensor: this is a big one. With the help of Pythran, we could compile NumPy code to C++ (Pythran currently has its own NumPy implementation in C++). We already work closely with the author of Pythran, and Pythran uses the same SIMD library as xtensor (called xsimd). The big vision is that one could write numerical code once using NumPy for fast prototyping, and then automatically compile it to C++. Once in C++, we could easily generate binding code for Python (to export a standalone library) as well as for R and Julia! This could make writing cross-language libraries from a high-level algorithm description extremely pleasant.

You can follow the author on Twitter or visit the Gitter chat room at https://gitter.im/QuantStack/Lobby to discuss xtensor and related projects!

Links: