Interoperability of Elixir

A critical aspect of a programming language is its interoperability with other languages: how well it plays with others. Whether to reuse legacy code or to speed up numerical computations, calling C from Elixir is a common practice [✥]. The two most popular ways to do so are NIFs (Native Implemented Functions) and ports, the latter typically through Porcelain.

NIFs were introduced in Erlang/OTP R13B03 [♠]. NIFs are Erlang/Elixir functions written in C and loaded dynamically as a shared library, whereas ports are separate programs that run outside the BEAM VM and communicate with it via STDIN/STDOUT. NIFs tend to be simpler to write because they do not have to encode and decode standard input and output. In certain scenarios, this also makes them more efficient. However, a segmentation fault in the C code implementing a NIF can crash the whole BEAM VM, which makes ports the safer choice. [◆, ♣]
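To make the encoding and decoding burden concrete, here is a minimal C sketch of the message framing a port program would have to handle on its side. It assumes the port is opened with the {:packet, 2} option, under which every message is preceded by a two-byte big-endian length. The function names read_packet and write_packet are illustrative and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read one length-prefixed frame from `in` into `buf` (capacity `cap`).
 * Returns the payload length, or -1 on EOF, short read, or overflow. */
long read_packet(FILE *in, unsigned char *buf, size_t cap)
{
    unsigned char header[2];
    size_t len;

    assert(in != NULL && buf != NULL);
    if (fread(header, 1, 2, in) != 2)
        return -1;
    len = ((size_t)header[0] << 8) | header[1]; /* big-endian length */
    if (len > cap)
        return -1;
    if (fread(buf, 1, len, in) != len)
        return -1;
    return (long)len;
}

/* Write one frame: two-byte big-endian length, then the payload.
 * Returns 0 on success, -1 on failure. */
int write_packet(FILE *out, const unsigned char *buf, size_t len)
{
    unsigned char header[2];

    assert(out != NULL && buf != NULL);
    if (len > 0xFFFF)
        return -1; /* does not fit in a two-byte header */
    header[0] = (unsigned char)(len >> 8);
    header[1] = (unsigned char)(len & 0xFF);
    if (fwrite(header, 1, 2, out) != 2)
        return -1;
    if (fwrite(buf, 1, len, out) != len)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A real port program would call read_packet on stdin and write_packet on stdout in a loop. None of this plumbing is needed with a NIF, which receives Erlang terms directly.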

In this tutorial we will look at how to implement NIFs for our C library of choice, Libpostal (a C library that parses and normalizes street addresses from around the world). If you want to jump into the code right away, the full project is at https://github.com/SweetIQ/expostal.

Creating NIFs for Libpostal

We can start by creating a new Elixir project using mix.

mix new expostal



Currently, the recommended way of working with C NIFs in Elixir is to write a Makefile that gets invoked by mix compile.

Project Setup

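To make mix compile build the C NIFs, we can add the following module definition to mix.exs. Mix only runs a custom compile task that is listed in the project's compilers option, for example compilers: [:libpostal] ++ Mix.compilers().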

defmodule Mix.Tasks.Compile.Libpostal do
  # Custom Mix compiler that shells out to make to build the NIF libraries.
  def run(_) do
    if match?({:win32, _}, :os.type()) do
      # Libpostal does not run on Windows, so stop early.
      IO.warn("Windows is not supported.")
      exit(1)
    else
      File.mkdir_p("priv")
      {result, _error_code} = System.cmd("make", ["priv/parser.so"], stderr_to_stdout: true)
      IO.binwrite(result)
      {result, _error_code} = System.cmd("make", ["priv/expand.so"], stderr_to_stdout: true)
      IO.binwrite(result)
    end

    :ok
  end
end



Depending on the C library you want to interoperate with and the platforms you develop and deploy on, you might need multiple Makefiles, each targeting a different operating system. In our case, since Libpostal does not run on Windows, we print a warning and exit.

Makefile

Next, we can create our Makefile, which compiles the NIFs defined in src/parser.c and src/expand.c into priv/parser.so and priv/expand.so respectively. For most C libraries, a single dynamic library (i.e. just priv/your_library.so) is enough. In our case, Libpostal's expand function requires loading a machine learning model that the parser function does not need, so it is best to keep the two separate.

MIX = mix
CFLAGS += -g -O3 -ansi -pedantic -Wall -Wextra -Wno-unused-parameter

# Locate the header directory (erl_nif.h) of the local Erlang installation.
ERLANG_PATH = $(shell erl -eval 'io:format("~s", [lists:concat([code:root_dir(), "/erts-", erlang:system_info(version), "/include"])])' -s init stop -noshell)
CFLAGS += -I$(ERLANG_PATH)

# Libpostal is expected to be installed system-wide.
CFLAGS += -I/usr/local/include -I/usr/include -L/usr/local/lib -L/usr/lib
CFLAGS += -lpostal
CFLAGS += -std=gnu99 -Wno-unused-function

ifeq ($(wildcard deps/libpostal),)
    LIBPOSTAL_PATH = ../libpostal
else
    LIBPOSTAL_PATH = deps/libpostal
endif

ifneq ($(OS),Windows_NT)
    CFLAGS += -fPIC

    ifeq ($(shell uname),Darwin)
        LDFLAGS += -dynamiclib -undefined dynamic_lookup
    endif
endif

all: libpostal

libpostal:
	$(MIX) compile

priv/parser.so: src/parser.c
	$(CC) $(CFLAGS) -shared $(LDFLAGS) -o $@ src/parser.c

priv/expand.so: src/expand.c
	$(CC) $(CFLAGS) -shared $(LDFLAGS) -o $@ src/expand.c

clean:
	$(MIX) clean
	$(RM) priv/*



If the C library you are working with is not installed system-wide (i.e. under /usr/local or /usr), or if you'd like to embed the C library within your project, check out how the hoedown project embeds its C dependency.

Deciding whether to embed the C library or to require a system-wide installation is a controversial design decision. [Ω] From the developer of the Node.js binding for Libpostal:

Usually when dynamically linking to a native library, it’s either assumed that the library is installed separately, or that the dependency is included with the binding. Let’s call these the “lean repo” and the “fat repo” approaches respectively. node-postal is an example of a lean repo, whereas a fat repo would be something like node-snappy.

libpostal is a bit trickier than a library like Snappy because it’s not just software - there are also data/model files which need to be downloaded from the web…

I feel that the same argument applies to this Elixir binding, which is why the Makefile above expects Libpostal to be installed system-wide.

Implementing NIFs

Now that the build process is set up, it's time to actually implement those Native Implemented Functions. For the Libpostal parser, our goal is to create an Elixir/Erlang function that calls libpostal_parse_address from the Libpostal C library. The signature of libpostal_parse_address is as follows:

typedef struct libpostal_address_parser_response {
    size_t num_components;
    char **components;
    char **labels;
} libpostal_address_parser_response_t;

typedef struct libpostal_address_parser_options {
    char *language;
    char *country;
} libpostal_address_parser_options_t;

libpostal_address_parser_response_t *libpostal_parse_address(char *address, libpostal_address_parser_options_t options);



Given an address, libpostal_parse_address returns the address components as a libpostal_address_parser_response_t structure. For example, given the address 845 Sherbrooke St W, Montreal, QC H3A 0G4 and the default options, the function returns:

num_components: 5,
components: ["845", "sherbrooke st w", "montreal", "qc", "h3a 0g4"],
labels: ["house_number", "road", "city", "state", "postalcode"]



This is not a very Elixir-esque way of returning values. In Elixir, we can elegantly use a Map type to represent the label-component key-values. We will see how we can do that later.
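To see why a map is the more convenient shape, consider what a C caller has to do to find a single value in the two parallel arrays. The sketch below uses a stand-in struct with the same fields as the response type above, renamed to parser_response_t so that it compiles without Libpostal installed. The helper component_for_label is hypothetical and not part of Libpostal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with the same layout as libpostal_address_parser_response_t. */
typedef struct parser_response {
    size_t num_components;
    char **components;
    char **labels;
} parser_response_t;

/* Return the component whose label equals `label`, or NULL if absent.
 * The i-th label describes the i-th component, so both arrays are
 * scanned in lockstep. */
const char *component_for_label(const parser_response_t *resp, const char *label)
{
    size_t i;

    assert(resp != NULL && label != NULL);
    for (i = 0; i < resp->num_components; i++) {
        if (strcmp(resp->labels[i], label) == 0)
            return resp->components[i];
    }
    return NULL;
}
```

With a map keyed by label, this linear scan becomes a plain key lookup on the Elixir side.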

Loading and unloading

In order for the BEAM VM to interact with C functions, we need to register them with the VM. The src/parser.c file starts with the following structure:

/* Headers needed by the code in this file. */
#include <erl_nif.h>
#include <libpostal/libpostal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The NIF itself; the body is filled in later. */
static ERL_NIF_TERM
parse_address(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]) {}

/* Exported functions: {Elixir name, arity, C function}. */
static ErlNifFunc funcs[] = {
    {"parse_address", 1, parse_address}};

static int
load(ErlNifEnv *env, void **priv, ERL_NIF_TERM info) {}

static int
reload(ErlNifEnv *env, void **priv, ERL_NIF_TERM info) {}

static int
upgrade(ErlNifEnv *env, void **priv, void **old_priv, ERL_NIF_TERM info) {}

static void
unload(ErlNifEnv *env, void *priv) {}

ERL_NIF_INIT(Elixir.Expostal.Parser, funcs, &load, &reload, &upgrade, &unload)



A dynamic library implementing NIFs needs to register itself via the ERL_NIF_INIT macro, providing the name of the module it belongs to, the functions to expose, and a series of callbacks (load, reload, upgrade, unload) that define the life cycle of the NIF library. [δ]

We are particularly interested in the load and unload functions. When the parser NIF library loads, we need to initialize Libpostal so that it loads a machine learning model shared by all threads of the process. We do that by calling the libpostal_setup and libpostal_setup_parser functions.

static int
load(ErlNifEnv *env, void **priv, ERL_NIF_TERM info)
{
    /* A non-zero return value tells the VM that loading failed. */
    if (!libpostal_setup())
    {
        fprintf(stderr, "Error loading libpostal\n");
        return 1;
    }
    if (!libpostal_setup_parser())
    {
        fprintf(stderr, "Error loading libpostal parser\n");
        return 1;
    }

    return 0;
}



Similarly, we want to make sure that resources are properly freed when the Erlang VM decides to unload the module.

static void
unload(ErlNifEnv *env, void *priv)
{
    libpostal_teardown();
    libpostal_teardown_parser();
}



The reload and upgrade functions are implemented as follows. Note that the reload callback is deprecated and is no longer called by the runtime since Erlang/OTP 20.

static int
reload(ErlNifEnv *env, void **priv, ERL_NIF_TERM info)
{
    return 0;
}

static int
upgrade(ErlNifEnv *env, void **priv, void **old_priv, ERL_NIF_TERM info)
{
    return load(env, priv, info);
}



Implementing the parse_address function as a NIF

Next, we can finally implement the parse_address function. As we saw earlier, libpostal_parse_address takes an address string as input and returns a custom struct holding the components and labels. When a user calls parse_address in Elixir, we call libpostal_parse_address under the hood. The input, however, is not a C char* but an Elixir string, so we first need to copy it into a NUL-terminated C string before passing it to libpostal_parse_address. The output of libpostal_parse_address is a C struct, but we want to return an Elixir/Erlang map, so that the user can enjoy the elegance of a modern programming language.

Enough said, here’s a carefully commented implementation:

static ERL_NIF_TERM
parse_address(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    /* Start from Libpostal's default parser options. */
    libpostal_address_parser_options_t options = libpostal_get_address_parser_default_options();

    /* The map returned to Elixir, filled with label => component pairs. */
    ERL_NIF_TERM components = enif_make_new_map(env);

    ErlNifBinary address_bin;

    /* Elixir strings are binaries; reject any term that is not a
       binary or an iolist. */
    if (!enif_inspect_iolist_as_binary(env, argv[0], &address_bin))
    {
        return enif_make_badarg(env);
    }

    /* The binary is not NUL-terminated, so copy it into a C string. */
    char *address = strndup((char *)address_bin.data, address_bin.size);
    if (address == NULL)
    {
        /* Out of memory; reported as a bad argument for simplicity. */
        return enif_make_badarg(env);
    }

    libpostal_address_parser_response_t *response = libpostal_parse_address(address, options);
    if (response == NULL)
    {
        free(address);
        return enif_make_badarg(env);
    }

    const char *component, *label;

    size_t i;
    for (i = 0; i < response->num_components; i++)
    {
        component = response->components[i];
        label = response->labels[i];

        ERL_NIF_TERM component_term;

        /* Allocate an Erlang binary and copy the component into it. */
        size_t component_len = strlen(component);
        unsigned char *component_term_bin = enif_make_new_binary(env, component_len, &component_term);
        memcpy(component_term_bin, component, component_len);

        /* components[label] = component, with the label as an atom key. */
        enif_make_map_put(env, components,
                          enif_make_atom(env, label),
                          component_term,
                          &components);
    }

    /* address_bin is a read-only view and needs no release; only our
       own allocations are freed. */
    libpostal_address_parser_response_destroy(response);
    free(address);
    return components;
}
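The string conversion step is the one most easily gotten wrong, so it is worth trying in isolation. The sketch below assumes a stand-in type, byte_slice_t, that mirrors the two fields of ErlNifBinary used above (size and data), so that it compiles without the Erlang headers. The helper slice_to_cstring is hypothetical. It uses malloc and memcpy rather than strndup, which has the same effect for input without embedded NUL bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for ErlNifBinary: raw bytes plus a length, no terminator. */
typedef struct byte_slice {
    size_t size;
    const unsigned char *data;
} byte_slice_t;

/* Copy a slice into a freshly allocated NUL-terminated string.
 * Returns NULL on allocation failure; the caller frees the result. */
char *slice_to_cstring(const byte_slice_t *slice)
{
    char *out;

    assert(slice != NULL);
    out = malloc(slice->size + 1); /* one extra byte for the terminator */
    if (out == NULL)
        return NULL;
    memcpy(out, slice->data, slice->size);
    out[slice->size] = '\0';
    return out;
}
```

Passing the raw data pointer straight to a function that expects a C string would read past the end of the binary, because nothing guarantees a zero byte after the last character.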



Implementing the parse_address function in Elixir

Now that the NIF is implemented, we need to create its Elixir counterpart. The Elixir module loads the NIF library when the module itself is loaded, and declares the NIF function with a fallback body, as shown below:

defmodule Expostal.Parser do
  @moduledoc """
  Address parsing module for Openvenues' Libpostal, which parses addresses.
  """

  @on_load {:init, 0}

  app = Mix.Project.config()[:app]

  # Load priv/parser.so when this module is loaded.
  def init do
    path = :filename.join(:code.priv_dir(unquote(app)), 'parser')
    :ok = :erlang.load_nif(path, 0)
  end

  @doc """
  Parse given address into a map of address components

  ## Examples

      iex> Expostal.Parser.parse_address("845 Sherbrooke St W, Montreal, QC H3A 0G4")
      %{city: "montreal", house_number: "845",
        road: "sherbrooke st w", state: "qc",
        postalcode: "h3a 0g4"}

  """
  @spec parse_address(address :: String.t()) :: map
  def parse_address(address)

  def parse_address(_) do
    # Fallback body, replaced by the C implementation once the NIF is loaded.
    exit(:nif_library_not_loaded)
  end
end



And there we go, libpostal’s parse_address function can now be invoked inside Elixir:

iex> Expostal.Parser.parse_address("845 Sherbrooke St W, Montreal, QC H3A 0G4")
%{city: "montreal", house_number: "845",
  road: "sherbrooke st w", state: "qc",
  postalcode: "h3a 0g4"}



Summary

This tutorial covered how to create Elixir NIFs from scratch, using Expostal (an Elixir binding for Libpostal) as an example. NIFs serve as a bridge between C code and Elixir code, allowing you to call C functions from Elixir. The steps to create an Elixir NIF are as follows:

1. Create a new project and set up a Mix compile task.
2. Create a Makefile (or several, if multiple operating systems must be supported).
3. Implement the NIFs in C.
4. Implement the Elixir module counterpart (init and function definitions).

The entire experience is not that much different from implementing a binding for Python or for Node.js. But as Elixir/Erlang is a language that supports concurrent programming by design, one must pay more attention to thread safety when implementing NIFs. This is a challenge that Node.js binding implementors do not have to worry about, because of its single-threaded design.

If you wish to download the full source code, it is available on GitHub: https://github.com/SweetIQ/expostal. And star the project while you are at it!