Date Mon 06 July 2015 Tags rust / python / ffi / geo

Introduction

For some of us at CASA (and, presumably, elsewhere), conversion from decimal longitude and latitude coordinates to BNG (British National Grid) coordinates is a daily occurrence. A few years ago, I was looking for a function to accomplish this in Python, and found this serendipitous post by my splendid colleague Dr Hannah Fry, which provided an implementation. This function has subsequently found its way into several projects.

Motivation

I do most of my programming and data analysis in Python, and the speed conferred by e.g. Numpy means that there’s rarely any need to implement more exotic solutions. Where projection conversions are concerned, there are many excellent libraries which wrap PROJ.4, which, after 20 years, is a mature, performant library. As a result, my knowledge of C (let alone C++) is so rudimentary as to be downright dangerous.

Ousterhout’s Dichotomy

And yet. Sometimes curiosity trumps caution. I’ve been keeping an eye on a new programming language called Rust for several years now, awaiting its stabilisation with a 1.0 release. Rust’s inventor, Graydon Hoare, has written at length about something he refers to as “Ousterhout’s Dichotomy”. Briefly, it states that

The cultures of batch and interactive computing are the ends of a spectrum, and most real computing systems embody a mixture of the two. Most batch systems require some degree of interactive reconfiguration; and most interactive systems have some very stable parts of their logic that benefit from going as fast as possible. [this gives rise to] Systems composed of exactly two languages, one “systems” language and one “script” language, with the former coding the inner loops and the latter coding the outer loops, and hooked up to the REPL to talk to the human operator.

That is, the tension between human- and machine-oriented computing is often not mediated by a single language, but two specialized languages with some (kinda gruesome) bridging device between the two that nobody much wants to think about, and everyone wishes would go away.

Rust

Rust is a systems programming language. I’m not aware of a consensus around the term, but Wikipedia does a reasonable job of sketching out some of the features, though it also makes some outlandish claims regarding debuggers. But I digress. Systems programming languages are concerned with speed and efficiency, as a result of which the programmer usually has to concern herself with the management of memory, and be familiar with the distinction between the heap and the stack, and pointers. As a more-or-less direct consequence of this, systems languages offer many more opportunities for errors which can produce catastrophic results due to leaking references, null or dangling pointers, and many other mistakes which occur as a direct result of their design.

Rust aims to change this, citing “safety, speed, and concurrency” as its primary goals.

In order to accomplish this, Rust implements something called “compile-time borrow-checking”, a mechanism composed of three related concepts:

Ownership: variables “own” the resources to which they are bound. When that binding goes out of scope, so does the resource

Borrowing: a variable may “lend” that bound resource to another variable. This resource may be lent many times immutably, or once mutably. This is unfamiliar territory to someone who’s used to programming in Python, or Ruby, or JS , and yet, once grasped, the system is both relatively straightforward to use, and very effective at preventing commonly-seen problems

once mutably. This is unfamiliar territory to someone who’s used to programming in Python, or Ruby, or , and yet, once grasped, the system is both relatively straightforward to use, and very effective at preventing commonly-seen problems Lifetimes: the concept of borrowing introduces additional complexity, if Rust’s primary goal (safety) is to be maintained: we have to ensure that resources aren’t freed while they’re still being borrowed. This involves some work on the part of the compiler, and some work on the part of the programmer.

In addition to these complex, interesting innovations, Rust-the-language provides many modern conveniences such as iterators, enums, and exhaustive case analysis — as well as operations from the functional programming world such as map() and fold() — which make it pleasant to use. In addition, its algebraic data types make it well-suited for the expression of l-systems, which is of particular interest to those undertaking spatial simulation and modelling.

Converting to BNG

Let’s have a look at the conversion algorithm, implemented in Rust 1.x (I’ve split some of the longer lines):

pub fn convert_bng ( longitude : & f32 , latitude : & f32 ) -> ( i32 , i32 ) { // input is restricted to the UK bounding box assert ! ( - 6.379880 <= * longitude && * longitude <= 1.768960 , "Out of bounds! Longitude must be between -6.379880 and 1.768960: {}" , longitude ); assert ! ( 49.871159 <= * latitude && * latitude <= 55.811741 , "Out of bounds! Latitude must be between 49.871159 and 55.811741: {}" , latitude ); // Convert input to degrees let lat_1 : f64 = * latitude as f64 * PI / 180. ; let lon_1 : f64 = * longitude as f64 * PI / 180. ; // The GRS80 semi-major and semi-minor axes used for WGS84 (m) let a_1 = GRS80_SEMI_MAJOR ; let b_1 = GRS80_SEMI_MINOR ; // The eccentricity (squared) of the GRS80 ellipsoid let e2_1 = 1. - ( b_1 . powi ( 2 )) / ( a_1 . powi ( 2 )); // Transverse radius of curvature let nu_1 = a_1 / ( 1. - e2_1 * lat_1 . sin (). powi ( 2 )). sqrt (); // Third spherical coordinate is 0, in this case let H : f64 = 0. ; let x_1 = ( nu_1 + H ) * lat_1 . cos () * lon_1 . cos (); let y_1 = ( nu_1 + H ) * lat_1 . cos () * lon_1 . sin (); let z_1 = (( 1. - e2_1 ) * nu_1 + H ) * lat_1 . sin (); // Perform Helmert transform (to go between Airy 1830 (_1) and GRS80 (_2)) let cst : f64 = 20.4894 ; let s = cst * ( 10 as f64 ). powi ( - 6 ); // The translations along x, y, z axes respectively let tx = TX ; let ty = TY ; let tz = TZ ; // The rotations along x, y, z respectively, in seconds let rxs = RXS ; let rys = RYS ; let rzs = RZS ; // In radians let rx = rxs * PI / ( 180. * 3600. ); let ry = rys * PI / ( 180. * 3600. ); let rz = rzs * PI / ( 180. * 3600. ); let x_2 = tx + ( 1. + s ) * x_1 + - rz * y_1 + ry * z_1 ; let y_2 = ty + rz * x_1 + ( 1. + s ) * y_1 + - rx * z_1 ; let z_2 = tz + - ry * x_1 + rx * y_1 + ( 1. + s ) * z_1 ; // The Airy 1830 semi-major and semi-minor axes used for OSGB36 (m) let a = AIRY_1830_SEMI_MAJOR ; let b = AIRY_1830_SEMI_MINOR ; // The eccentricity of the Airy 1830 ellipsoid let e2 = 1. - b . powi ( 2 ) / a . powi ( 2 ); let p = ( x_2 . powi ( 2 ) + y_2 . powi ( 2 )). sqrt (); // Initial value let mut lat = z_2 . atan2 (( p * ( 1. - e2 ))); let mut latold = 2. * PI ; // this is cheating, but not sure how else to initialise nu let mut nu : f64 = 1. ; // Latitude is obtained by iterative procedure while ( lat - latold ). abs () > ( 10 as f64 ). powi ( - 16 ) { mem :: swap ( & mut lat , & mut latold ); nu = a / ( 1. - e2 * latold . sin (). powi ( 2 )). sqrt (); lat = ( z_2 + e2 * nu * latold . sin ()). atan2 ( p ); } let lon = y_2 . atan2 ( x_2 ); // Scale factor on the central meridian let F0 : f64 = 0.9996012717 ; // Latitude of true origin (radians) let lat0 = 49. * PI / 180. ; // Longitude of true origin and central meridian (radians) let lon0 = - 2. * PI / 180. ; // Northing & easting of true origin (m) let N0 = TRUE_ORIGIN_NORTHING ; let E0 = TRUE_ORIGIN_EASTING ; let n = ( a - b ) / ( a + b ); // Meridional radius of curvature let rho = curvature ( a , F0 , e2 , lat ); let eta2 = nu * F0 / rho - 1. ; let M1 = ( 1. + n + ( 5. / 4. ) * n . powi ( 2 ) + ( 5. / 4. ) * n . powi ( 3 )) * ( lat - lat0 ); let M2 = ( 3. * n + 3. * n . powi ( 2 ) + ( 21. / 8. ) * n . powi ( 3 )) * ( lat - lat0 ). sin () * ( lat + lat0 ). cos (); let M3 = (( 15. / 8. ) * n . powi ( 2 ) + ( 15. / 8. ) * n . powi ( 3 )) * ( 2. * ( lat - lat0 )). sin () * ( 2. * ( lat + lat0 )). cos (); let M4 = ( 35. / 24. ) * n . powi ( 3 ) * ( 3. * ( lat - lat0 )). sin () * ( 3. * ( lat + lat0 )). cos (); let M = b * F0 * ( M1 - M2 + M3 - M4 ); let I = M + N0 ; let II = nu * F0 * lat . sin () * lat . cos () / 2. ; let III = nu * F0 * lat . sin () * lat . cos (). powi ( 3 ) * ( 5. - lat . tan (). powi ( 2 ) + 9. * eta2 ) / 24. ; let IIIA = nu * F0 * lat . sin () * lat . cos (). powi ( 5 ) * ( 61. - 58. * lat . tan (). powi ( 2 ) + lat . tan (). powi ( 4 )) / 720. ; let IV = nu * F0 * lat . cos (); let V = nu * F0 * lat . cos (). powi ( 3 ) * ( nu / rho - lat . tan (). powi ( 2 )) / 6. ; let VI = nu * F0 * lat . cos (). powi ( 5 ) * ( 5. - 18. * lat . tan (). powi ( 2 ) + lat . tan (). powi ( 4 ) + 14. * eta2 - 58. * eta2 * lat . tan (). powi ( 2 )) / 120. ; let N = I + II * ( lon - lon0 ). powi ( 2 ) + III * ( lon - lon0 ). powi ( 4 ) + IIIA * ( lon - lon0 ). powi ( 6 ); let E = E0 + IV * ( lon - lon0 ) + V * ( lon - lon0 ). powi ( 3 ) + VI * ( lon - lon0 ). powi ( 5 ); ( E . round () as i32 , N . round () as i32 ) }

As can be seen from the function signature, it accepts two borrowed 32-bit floating-point numbers (longitude and latitude), and returns a tuple of two 32-bit integers (the rounded BNG Eastings and Northings). Broadly speaking, this looks like any function in any other (modern) programming language; no strange notation or symbols, function names hint at functionality (e.g. powf() raises a floating-point number to a floating-point power).

Only three things stand out: the borrowed function inputs, the distinction between let and let mut , and the type annotations. The latter isn’t always necessary (the compiler can often infer the correct type), but in this case, I’ve been explicit.

So far, so good. We have a working conversion function. But how much faster is it? The Python version runs in about 6.89 µs on my machine. The rust version runs about 15x faster, in 0.45 µs. These numbers are so small (6 millionths of a second) that an order-of-magnitude improvement is essentially meaningless. Let’s use a somewhat more realistic test: converting a million pairs of lon, lat points to BNG Eastings and Northings.

import numpy as np # UK bounding box N = 55.811741 E = 1.768960 S = 49.871159 W = - 6.379880 lon_ls = list ( np . random . uniform ( W , E , [ 1000000 ])) lat_ls = list ( np . random . uniform ( N , S , [ 1000000 ])) % timeit [ bng ( lat , lon ) for lat , lon in zip ( lat_ls , lon_ls )] 1 loops , best of 3 : 8.03 s per loop

This is pretty fast in wall-clock terms, which is to be expected, considering that Python’s core mathematical functions are mostly C-based anyway, but I think rust can do better.

extern crate lonlat_bng ; use lonlat_bng :: { convert_to_bng_threaded , convert_vec_c , Array }; extern crate rand ; use rand :: distributions :: { IndependentSample , Range }; extern crate libc ; fn main () { let num_coords = 1000000 ; let between_lon = Range :: new ( - 6.379880 , 1.768960 ); let between_lat = Range :: new ( 49.871159 , 55.811741 ); let mut rng = rand :: thread_rng (); let lon_vec = vec ! [ between_lon . ind_sample ( & mut rng ) as f32 ; num_coords ]; let lat_vec = vec ! [ between_lat . ind_sample ( & mut rng ) as f32 ; num_coords ]; let lon_arr = Array { data : lon_vec . as_ptr () as * const libc :: c_void , len : lon_vec . len () as libc :: size_t , }; let lat_arr = Array { data : lat_vec . as_ptr () as * const libc :: c_void , len : lat_vec . len () as libc :: size_t , }; convert_vec_c ( lon_arr , lat_arr ); }

Running the above using time runs in 1.18 seconds, or 6.8 times faster. Pretty good, but not quite an order of magnitude. But wait, isn’t there a trade-off? If I use the Python version, I have the convenience of working in a Jupyter (formerly IPython) notebook, with separately-executable cells, interactive visualisation tools, and built-in debugging. Cast your mind back to Graydon’s discussion of Ousterhout’s Dichotomy. Recall him mentioning “some (kinda gruesome) bridging device between the two that nobody much wants to think about, and everyone wishes would go away”. That bridging device is FFI.

FFI

A foreign function interface (FFI) is “a mechanism by which a program written in one programming language can call routines or make use of services written in another”.

The mechanism, in this case, is a library called ctypes. Let’s define a link to our rust library:

class _BNG_FFITuple ( Structure ): _fields_ = [( "a" , c_uint32 ), ( "b" , c_uint32 )] class _BNG_FFIArray ( Structure ): _fields_ = [( "data" , c_void_p ), ( "len" , c_size_t )] # Allow implicit conversions from a sequence of 32-bit unsigned # integers. @classmethod def from_param ( cls , seq ): return seq if isinstance ( seq , cls ) else cls ( seq ) # Wrap sequence of values. You can specify another type besides a # 32-bit unsigned integer. def __init__ ( self , seq , data_type = c_float ): array_type = data_type * len ( seq ) raw_seq = array_type ( * seq ) self . data = cast ( raw_seq , c_void_p ) self . len = len ( seq ) # A conversion function that cleans up the result value to make it # nicer to consume. def _bng_void_array_to_tuple_list ( array , _func , _args ): res = cast ( array . data , POINTER ( _BNG_FFITuple * array . len ))[ 0 ] res_list = [( i . a , i . b ) for i in iter ( res )] drop_bng_array ( array ) return res_list class _LONLAT_FFITuple ( Structure ): _fields_ = [( "a" , c_float ), ( "b" , c_float )] class _LONLAT_FFIArray ( Structure ): _fields_ = [( "data" , c_void_p ), ( "len" , c_size_t )] # Allow implicit conversions from a sequence of 32-bit unsigned # integers. @classmethod def from_param ( cls , seq ): return seq if isinstance ( seq , cls ) else cls ( seq ) # Wrap sequence of values. You can specify another type besides a # 32-bit unsigned integer. def __init__ ( self , seq , data_type = c_uint32 ): array_type = data_type * len ( seq ) raw_seq = array_type ( * seq ) self . data = cast ( raw_seq , c_void_p ) self . len = len ( seq ) # A conversion function that cleans up the result value to make it # nicer to consume. def _lonlat_void_array_to_tuple_list ( array , _func , _args ): res = cast ( array . data , POINTER ( _LONLAT_FFITuple * array . len ))[ 0 ] res_list = [( i . a , i . b ) for i in iter ( res )] drop_ll_array ( array ) return res_list

This code requires a little explanation: because ctypes was originally written to interface with C, it expects and provides C-like data structures. Note the definition of FFITuple and FFIArray . In an ideal world, the module would be aware of rust’s vector and tuple types, but alas, it isn’t. However, all is not lost, since it’s relatively straightforward to implement the corresponding types in our rust library in a safe manner:

#[repr(C)] pub struct IntTuple { a : uint32_t , b : uint32_t , } #[repr(C)] pub struct FloatTuple { a : c_float , b : c_float , } #[repr(C)] pub struct Array { data : * const c_void , len : libc :: size_t , }

Next, we provide some mechanisms for the conversion between these primitive structures and rust’s richer structures:

impl Array { unsafe fn as_f32_slice ( & self ) -> & [ f32 ] { assert ! ( ! self . data . is_null ()); slice :: from_raw_parts ( self . data as * const f32 , self . len as usize ) } unsafe fn as_i32_slice ( & self ) -> & [ i32 ] { assert ! ( ! self . data . is_null ()); slice :: from_raw_parts ( self . data as * const i32 , self . len as usize ) } fn from_vec < T > ( mut vec : Vec < T > ) -> Array { // Important to make length and capacity match // A better solution is to track both length and capacity vec . shrink_to_fit (); let array = Array { data : vec . as_ptr () as * const libc :: c_void , len : vec . len () as libc :: size_t , }; // Whee! Leak the memory, and now the raw pointer (and // eventually Python) is the owner. mem :: forget ( vec ); array } }

Let’s define our wrapper and interface function:

#[no_mangle] pub extern "C" fn convert_vec_c ( longitudes : Array , latitudes : Array ) -> Array { // we're receiving floats let lon = unsafe { longitudes . as_f32_slice () }; let lat = unsafe { latitudes . as_f32_slice () }; // borrow and combine lons and lats let orig = lon . iter () . zip ( lat . iter ()); // carry out the conversion let result = orig . map ( | elem | convert_bng ( elem . 0 , elem . 1 )); // convert back to vector of unsigned integer Tuples let nvec = result . map ( | ints | { IntTuple { a : ints . 0 as u32 , b : ints . 1 as u32 , } }) . collect (); Array :: from_vec ( nvec ) }

Note the no-mangle and extern directives. These ensure that the function can be called using the name defined in the source code, and that it is to be externally accessible.

The function:

Accepts two Array s of longitude and latitude values across the FFI boundary

s of longitude and latitude values across the boundary Converts them to slice s

s Combines the slice s into a vector of lon, lat tuples

s into a of lon, lat tuples Maps our previously-defined convert() function over the vector , returning a vector of integer tuples

function over the , returning a of integer tuples converts that vector back to an Array of Tuple s

of s returns the Array back across the FFI boundary.

But how fast does it run?

%%timeit convertbng ( lon_ls , lat_ls ) 1 loops , best of 3 : 2.55 s per loop

A 3.15x speedup. Pretty good. Bear in mind that ctypes adds some overhead. Can we do better, though? Perhaps.

Threading

It turns out that a side-effect (entirely intentional, according to Graydon) of rust’s safety guarantees is that concurrent and parallel (see this answer for a succint distinction between the two concepts) programs can be written without a great deal of the uncertainty and morbid fear which usually attends such endeavours. The benefits of concurrency and parallelism are beyond the scope of this already-exhausting exposition, and their elucidation is left as an exercise for the reader. Suffice to say that safe, easy, concurrent programming is of great benefit to us. But enough exposition:

#[no_mangle] pub extern "C" fn convert_to_bng_threaded ( longitudes : Array , latitudes : Array ) -> Array { let numthreads = num_cpus :: get () as usize ; let orig : Vec < ( & f32 , & f32 ) > = unsafe { longitudes . as_f32_slice () } . iter () . zip ( unsafe { latitudes . as_f32_slice () }. iter ()) . collect (); let mut result = vec ! [( 1 , 1 ); orig . len ()]; let mut size = orig . len () / numthreads ; if orig . len () % numthreads > 0 { size += 1 ; } size = std :: cmp :: max ( 1 , size ); crossbeam :: scope ( | scope | { for ( res_chunk , orig_chunk ) in result . chunks_mut ( size ). zip ( orig . chunks ( size )) { scope . spawn ( move || { for ( res_elem , orig_elem ) in res_chunk . iter_mut (). zip ( orig_chunk . iter ()) { * res_elem = convert_bng ( orig_elem . 0 , orig_elem . 1 ); } }); } }); Array :: from_vec ( result ) }

This function is extremely similar to convert_vec_c , but note what happens after we’ve created orig :

We define a mutable result vector, which will hold the results of our threaded computations, and fill it with arbitrary numbers

determine the size of the chunks into which orig will be split, according to the number of threads we’d like to spawn

will be split, according to the number of threads we’d like to spawn define a scope, which accepts a closure

split orig and result into immutable and mutable chunks, respectively, and for each pair of these chunks: spawn a thread (inside the closure we passed to scope), and in it: immutably iterate over each orig chunk, and mutably over each result chunk, and: call our conversion function using the orig values assign the output to each element of the result chunk.

and into and chunks, respectively, and for each pair of these chunks:

The crossbeam crate takes care of the parallel processing here, so we don’t need to worry about things like Join Guards, and combining the result at the end. For the most part, it just works – you can read about the specifics here. In short, we can confidently spawn threads and be guaranteed that they terminate before the scope exits by having the parent thread join on the child thread.

How much faster?

On a 2 core, 4-processor MacBook Air, the threaded function converts one million points in 483ms, or around 18.3x faster than Python. Let’s hook it up using FFI.

Our threaded function now runs in around 1.79 s. A 1.4x speedup over the single-threaded version, and a 4.48x speedup over pure Python. The more cores you have, the more speed you’ll gain, up to a point (this is a vast over-simplification, but detailed discussions of thread performance and optimisation are beyond the scope of this article).

That’s pretty good, but consider:

Rust is new. Considerable scope for improvement remains w/r/t optimisations

I have no idea what I’m doing, really

The threading mechanism is itself quite new, and may yet evolve to become more efficient

There are other concurrency strategies such as channels and locks, which may offer a more efficient approach.

Reversing the Operation

Converting from longitude and latitude is useful, but the bare miniumum for a a small library like this is a reversible operation. Again, Ordnance Survey have provided detailed guidance on implementing this transformation, and I’ve implemented it as lonlat_bng::convert_lonlat() , along with a threaded version which is identical to lonlat_bng::convert_bng , except for the types.

Cleaning Up

The main subtlety when using FFI like this is that you have to “leak” memory that the rust process owns, using a raw pointer. This is an inherently unsafe operation, and if you don’t diligently clean up after yourself, you’ll have holes in your program. Luckily, like most things in rust, it isn’t too onerous:

First, we have to define a function that accepts our Array struct back across the FFI boundary from Python. Recall that we converted from a rust vector to this Array struct, then leaked it, in the from_vec() method.

boundary from Python. Recall that we converted from a rust vector to this Array struct, then leaked it, in the method. When we receive the pointer, check that it isn’t null (and just return nothing if it is)

Convert it from a pointer to a vector, which we immediately drop. Rust actually provides a method for this: Vec::from_raw_parts() . It works because our Array struct contains everything we need: the data itself, the length, and the capacity (which should always match)

. It works because our Array struct contains everything we need: the data itself, the length, and the capacity (which should always match) Once we’ve dropped it, rust frees the memory it had allocated, and we’re finished.

We refer to our cleanup function from Python by calling it after we assign the result of our rust program (which is an Array) to a list. Have a look at the drop_bng_array() line in _bng_void_array_to_tuple_list()

Etc

The complete working source, as well as an example notebook, and compilation instructions are available on Github

I’ve created a library, convertbng , which includes pip-installable OS X and Linux binaries. It’s available from PyPI

, which includes pip-installable X and Linux binaries. It’s available from PyPI A comprehensive suite of examples using FFI to interact with rust can be found here.