Calling Python from Elixir: ErlPort vs Thrift


Learning and using Elixir has been one of my most enjoyable programming experiences; it has a great community and many awesome tools. The only issue I have come across so far is that some libraries either do not exist yet or are not fully developed.

I ran into this problem recently when I needed to process some large GeoJSON files that defined lat/long boundaries for blocks and municipal zones. The data was represented as Polygons, and I needed to figure out which blocks fell within commercial zones. This required detecting an intersection between two Polygons, which is a difficult problem to solve by hand.

city blocks in red, commercial zones layered on top in green

I could not find any library to help do this in Elixir, but there was a Python library, Shapely, that would do exactly what I needed. I found two options for integrating it into my project: ErlPort and Apache Thrift.

To show how I used these two, I’ll go through a simple example that works on two very small GeoJSON files, one defining blocks and one defining zones. I’ll assume non-commercial zones are already filtered out, so we will just be trying to find the blocks that intersect with any of the zones. The files have this format:
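As a rough illustration of that format, the Python sketch below shows the general shape of a GeoJSON FeatureCollection of Polygons and where the coordinates live. The sample data, the `id` property, and the helper name `polygon_outer_rings` are assumptions made for this example; the actual files in the repo may differ.

```python
import json

# Hypothetical sample: a single square block.
# GeoJSON positions are written as [longitude, latitude].
SAMPLE_BLOCKS = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"id": "block-1"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[-122.0, 37.0], [-122.0, 37.1], [-121.9, 37.1],
           [-121.9, 37.0], [-122.0, 37.0]]
        ]
      }
    }
  ]
}
"""


def polygon_outer_rings(geojson_text):
    """Return the outer ring of every Polygon feature in a FeatureCollection."""
    collection = json.loads(geojson_text)
    return [
        feature["geometry"]["coordinates"][0]  # ring 0 is the exterior boundary
        for feature in collection["features"]
        if feature["geometry"]["type"] == "Polygon"
    ]
```

Each ring is a list of coordinate pairs whose first and last positions are the same, which is the data both approaches pass across the language boundary.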

For these examples I created a new mix project, PythonCalls. The code and GeoJSON files are available here: https://github.com/chiragtoor/python_calls

Using ErlPort

To start with, we can pull out the block and zone info using Poison. Keep in mind that our example files are very small, so we can read them entirely into memory:

Using these private helper functions we get the lat/long coordinates of the blocks and zones with pattern matching:

Now we need some Python code that will take in these coordinates and use Shapely’s Polygons to detect an intersection:
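A minimal sketch of such a function could look like the following. The function name `intersects` and the argument layout (each argument is one ring, given as a list of coordinate pairs) are assumptions for illustration, not necessarily what the example repo uses.

```python
from shapely.geometry import Polygon


def intersects(block_coords, zone_coords):
    """Return True if the two rings share any point (overlap or touch).

    Each argument is assumed to be a list of [x, y] pairs describing
    the outer ring of one polygon.
    """
    block = Polygon(block_coords)
    zone = Polygon(zone_coords)
    # bool() keeps the return value a plain Python bool
    return bool(block.intersects(zone))
```

For example, `intersects([[0, 0], [2, 0], [2, 2], [0, 2]], [[1, 1], [3, 1], [3, 3], [1, 3]])` reports an intersection because the two squares overlap. Note that Shapely's `intersects` is also true when the polygons only touch along a boundary.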

Connecting these two pieces via ErlPort is very simple. First we need to add it to our deps:

{:erlport, git: "https://github.com/hdima/erlport.git"}

Now we just start up a Python instance and use it to make calls to our Python function. Since our blocks and zones are made up only of lists and floats, ErlPort automatically converts the data we pass into the Python equivalents:

With this, our Elixir and Python pieces are connected, and we can easily tell which blocks fall within commercial zones using Shapely.

Using Apache Thrift

Thanks to Pinterest’s Riffed library, using Thrift is very easy. Here I’ll just show a quick setup; for more details on Thrift, refer to the project webpage.

With Thrift we will set up a separate service in Python that is called through an Elixir client. The getting started guide in the Riffed GitHub repo covers how to set up Riffed very well, so assuming that is done, we need to define a Thrift interface file. This defines the service we will build and the data structures that will be used:

We need to be able to express our blocks and zones in some way, so we have a Point struct for each pair of lat/long coordinates and a Boundary struct for all the Points of a block or zone. The service only needs one function, which takes in a block and a zone and tells us whether they intersect. To build our Python service on top of this interface, we run the Thrift compiler:

thrift -gen py -out python/ ./thrift/geoservice.thrift

This takes our defined interface and outputs several files to a directory called ‘python’ at the root of our mix project. We will use these generated files to implement our service:
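As a rough sketch of the handler side of that service, the code below implements the single intersection call on top of Shapely. The namedtuples are stand-ins for the Point and Boundary classes Thrift would generate, and the field names (`lat`, `lng`, `points`), the method name `intersects`, and the class name `GeoServiceHandler` are all assumptions for illustration.

```python
from collections import namedtuple

from shapely.geometry import Polygon

# Stand-ins for the classes Thrift would generate from the interface file.
# The field names (lat, lng, points) are assumed, not taken from the project.
Point = namedtuple("Point", ["lat", "lng"])
Boundary = namedtuple("Boundary", ["points"])


def to_polygon(boundary):
    """Build a Shapely Polygon from a Boundary, using (x, y) = (lng, lat)."""
    return Polygon([(point.lng, point.lat) for point in boundary.points])


class GeoServiceHandler:
    """Implements the one service call; a Thrift processor would wrap this."""

    def intersects(self, block, zone):
        # bool() keeps the result a plain Python bool for serialization
        return bool(to_polygon(block).intersects(to_polygon(zone)))
```

In a real service the handler would be passed to the generated processor class and served with one of the server classes from `thrift.server.TServer`. That wiring is left out here because it depends on the names of the generated modules.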

Now we need to set up an Elixir client for our Python service using Riffed. This will also give us Elixir structs for the data structures we specified in our Thrift file:

After running our Python service we can use the same setup as earlier and make calls through our client:

We create Boundary structs from the GeoJSON data for each block and zone. The generated Thrift code converts these between Elixir and Python data structures during our service calls.

Comparing the two approaches

Setting up either approach was not too difficult, although I had used Thrift before, so its concepts and usage were already familiar. With the Thrift approach the interface between Elixir and Python is clearly defined, but the ErlPort approach took much less code (especially when you count the files generated by Thrift). For this sort of simple case I would go with ErlPort; Thrift was a bit of overkill.

However, there are some cases in which I would go with the Thrift approach instead:

Passing complicated custom data structures with ErlPort does not seem trivial. You have to define your own encoding and decoding, whereas with Thrift the data structures defined in the interface are handled automatically between languages.

ErlPort Python instances cannot handle multiple calls concurrently, and they use OS processes, so you need to be very careful when spawning multiple instances. With Thrift you can move the service to another machine and even set up threaded server types to handle multiple requests; you get this option out of the box.

Depending on what language you are trying to connect with, Thrift has many options, whereas with ErlPort you only have Python or Ruby.

So for simple cases like this GeoJSON example I would go with ErlPort, but for anything more complicated I think Thrift would work better.

Example project repo: https://github.com/chiragtoor/python_calls
