Published on Oct 28, 2016 Elixir

Why IO lists

Nathan Long recently published Elixir and IO Lists, Part 1: Building Output Efficiently where he described using IO lists to make your own code more efficient.

He gives an example of using Enum.map to build an IO list.

users = [%{ name: "Amy" }, %{ name: "Joe" }] response = Enum . map ( users , fn ( user ) -> [ "<li>" , user . name , "</li>" ] end ) IO . puts response

The repeated strings are created only once in memory in this example. Building the same string using interpolation ( "<li>#{user.name}</li>" ) or concatenation ( "<li>" <> user.name <> "</li>" ) would require additional string allocations and copying of data. It would be far less memory efficient, creating unnecessary work for the garbage collector.

Usage

Immediately I thought of a use for IO lists in one of my own open-source Elixir projects. I’m building an EventStore using PostgreSQL for persistence. Building the SQL statement to insert a batch of events uses both string concatenation and interpolation. As there is a lot of repetition in the result string it’s an ideal candidate for using an IO list.

Verifying Postgrex supports querying with IO lists

Postgrex is used as the driver to interface with PostgreSQL for the EventStore. The insert SQL statement is passed to the Postgrex.query/4 function. Documentation for query/4 indicates that the statement argument is expected to be iodata . So it natively supports IO lists.

I verify that querying PostgreSQL using a SQL statement provided as a string or an IO list produces the expected output by running an iex Elixir interactive shell.

$ iex - S mix Erlang / OTP 19 [ erts - 8.1 ] [ source ] [ 64 - bit ] [ smp: 4 : 4 ] [ async - threads: 10 ] [ hipe ] [ kernel - poll: false ] [ dtrace ] Interactive Elixir ( 1.3 . 3 ) - press Ctrl + C to exit ( type h () ENTER for help ) iex ( 1 ) > storage_config = Application . get_env ( :eventstore , EventStore . Storage ) [ username: "postgres" , password: "postgres" , database: "eventstore_dev" , hostname: "localhost" , pool_size: 10 , extensions: [{ Postgrex . Extensions . Calendar , []}]] iex ( 2 ) > { :ok , conn } = Postgrex . start_link ( storage_config ) { :ok , #PID<0.208.0>} iex ( 3 ) > Postgrex . query! ( conn , "select * from events;" , []) % Postgrex . Result { columns: [ "event_id" , "stream_id" , "stream_version" , "event_type" , "correlation_id" , "data" , "metadata" , "created_at" ], command: :select , connection_id: 16993 , num_rows: 0 , rows: []} iex ( 4 ) > Postgrex . query! ( conn , [ "select *" , "from events" , ";" ], []) % Postgrex . Result { columns: [ "event_id" , "stream_id" , "stream_version" , "event_type" , "correlation_id" , "data" , "metadata" , "created_at" ], command: :select , connection_id: 16993 , num_rows: 0 , rows: []}

Implementation

The change to the EventStore was limited to the single EventStore.Sql.Statements.create_events/1 function. Originally the SQL statement was built by mapping the number of events to insert using string interpolation and concatenation, as show below.

def create_events ( number_of_events \\ 1 ) do insert = "INSERT INTO events (event_id, stream_id, stream_version, correlation_id, event_type, data, metadata, created_at) VALUES" params = 1 .. number_of_events |> Enum . map ( fn event_number -> index = ( event_number - 1 ) * 8 "($ #{ index + 1 } , $ #{ index + 2 } , $ #{ index + 3 } , $ #{ index + 4 } , $ #{ index + 5 } , $ #{ index + 6 } , $ #{ index + 7 } , $ #{ index + 8 } )" end ) |> Enum . join ( "," ) insert <> " " <> params <> ";" end

This was modified to use nested IO lists to build the SQL statement. Nesting to append to a list is an O(1) operation and doesn’t require copying any data.

def create_events ( number_of_events \\ 1 ) do insert = [ "INSERT INTO events (event_id, stream_id, stream_version, correlation_id, event_type, data, metadata, created_at) VALUES" ] params = 1 .. number_of_events |> Enum . map ( fn event_number -> index = ( event_number - 1 ) * 8 event_params = [ "($" , Integer . to_string ( index + 1 ), ", $" , Integer . to_string ( index + 2 ), ", $" , Integer . to_string ( index + 3 ), ", $" , Integer . to_string ( index + 4 ), ", $" , Integer . to_string ( index + 5 ), ", $" , Integer . to_string ( index + 6 ), ", $" , Integer . to_string ( index + 7 ), ", $" , Integer . to_string ( index + 8 ), ")" ] if event_number == number_of_events do event_params else [ event_params , "," ] end end ) [ insert , " " , params , ";" ] end

Use IO lists to build insert events SQL statement

Running the suite of unit tests confirmed that the change was good.

Benchmarking

The EventStore has a benchmark test suite that uses the Benchfella microbenchmarking tool for Elixir. Benchfella allows you to define a bench test and compare measurements between multiple runs. It can also plot the results of multiple runs as a graph.

Existing benchmark tests covered the main use cases for the EventStore: appending events to a stream, and reading events from a stream. The benchmark suite included separate tests for a single, 10 or 100 concurrent readers/writers.

An example benchmark test to append 100 events (generated before the test run) to the EventStore is shown below.

defmodule AppendEventsBench do use Benchfella alias EventStore . EventFactory alias EventStore . Storage setup_all do Application . ensure_all_started ( :eventstore ) end before_each_bench ( store ) do { :ok , EventFactory . create_events ( 100 )} end bench "append events, single writer" do EventStore . append_to_stream ( UUID . uuid4 , 0 , bench_context ) end end

I measured the before and after timings of the EventStore benchmark suite to ensure that the change actually improved performance.

Running the benchmark suite

To ensure a clean environment I drop and recreate the PostgreSQL database before running the benchmark tests.

MIX_ENV=bench mix do es.reset, app.start, bench

Benchmark before results

## AppendEventsBench benchmark name iterations average time append events, single writer 100 15787.46 µs/op append events, 10 concurrent writers 10 137400.10 µs/op append events, 100 concurrent writers 1 1495547.00 µs/op ## ReadEventsBench benchmark name iterations average time read events, single reader 1000 2151.71 µs/op read events, 10 concurrent readers 100 20172.47 µs/op read events, 100 concurrent readers 10 202203.20 µs/op

Benchmark after results

## AppendEventsBench benchmark name iterations average time append events, single writer 100 12239.92 µs/op append events, 10 concurrent writers 10 108868.50 µs/op append events, 100 concurrent writers 1 1116299.00 µs/op ## ReadEventsBench benchmark name iterations average time read events, single reader 1000 1928.22 µs/op read events, 10 concurrent readers 100 20102.93 µs/op read events, 100 concurrent readers 10 201094.80 µs/op

Benchmark comparison

Benchfella provides a tool to compare the last two runs. It displays the performance gain, or loss, between the two runs for each benchmark test.

$ MIX_ENV=bench mix bench.cmp bench/snapshots/2016-10-28_12-33-28.snapshot vs bench/snapshots/2016-10-28_12-34-12.snapshot ## AppendEventsBench append events, 100 concurrent writers 0.75 append events, single writer 0.78 append events, 10 concurrent writers 0.79 ## ReadEventsBench read events, single reader 0.9 read events, 100 concurrent readers 0.99 read events, 10 concurrent readers 1.0

This demonstrates that using IO lists was a performance gain. It is between 20 - 25% quicker to append events to the database after the refactoring. Reading events is unaffected.

Comparison graph

Generating a graph of the comparison results is simple with Benchfella.

$ MIX_ENV=bench mix bench.graph Wrote bench/graphs/index.html

Linear scale

Logarithmic scale

Conclusion

Using an IO list for string manipulation can be a performance gain. As Nathan concluded in his Elixir and IO Lists article.