Being from México, I have been wrestling with character encoding issues for a long time, in several languages…

Now, it’s Elixir’s time.

§ The problem

While working my way through The Little Elixir & OTP Guidebook (a highly recommended one), I got stuck at the ID3 parser example program:

    defmodule ID3Parser do
      def parse(file_name) do
        case File.read(file_name) do
          {:ok, mp3} ->
            mp3_byte_size = byte_size(mp3) - 128
            <<_::binary-size(mp3_byte_size), id3_tag::binary>> = mp3

            <<"TAG",
              title::binary-size(30),
              artist::binary-size(30),
              album::binary-size(30),
              year::binary-size(4),
              _rest::binary>> = id3_tag

            IO.puts "#{artist} - #{title} (#{album} #{year})"

          _ ->
            IO.puts "Couldn't open #{file_name}"
        end
      end
    end

Using Clementine, I edited the ID3 tags of a file named some-song.mp3 and set its title to Éso.

I wanted to know whether the program would handle accented characters just fine. It did not.

Everything was all right while the ID3 tags contained only ASCII characters. As soon as I put an accented character in the title, artist or album, I got an error like this:

    iex(1)> ID3Parser.parse "some-song.mp3"
    ** (ArgumentError) argument error
        (stdlib) :io.put_chars(:standard_io, :unicode, [<<89, 111, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 45, 32, 201, 115, 111, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>>, 10])

§ The solution

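After some research here and there, then some error reporting, I found out that ID3v1 tags (the ones the program is trying to parse) should in theory be encoded as ISO-8859-1, also known as Latin 1. That explains the error: byte 201 is É in Latin 1, but followed by a plain ASCII byte it is not valid UTF-8, which is what IO.puts expects.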
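As a quick check of that diagnosis, here is a minimal sketch that takes the three title bytes from the error output and asks Elixir whether they form a valid UTF-8 string. The variable names are only illustrative.

```elixir
# The title bytes from the error output: "Éso" as stored in Latin 1.
latin1_title = <<201, 115, 111>>

# 201 (0xC9) would start a two-byte UTF-8 sequence, but 115 ("s") is not a
# continuation byte, so this binary is not valid UTF-8.
IO.inspect(String.valid?(latin1_title))
# => false

# For comparison, the same text in UTF-8 needs two bytes for "É" (U+00C9).
utf8_title = "\u00C9so"
IO.inspect(utf8_title == <<195, 137, 115, 111>>)
# => true
```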

What I needed was a way to convert those bytes from ISO-8859-1 (Latin 1) to UTF-8 (Unicode), and give IO.puts something it could print without problems.

I found exactly that in this Erlang facility:

    :unicode.characters_to_binary(your_string, :latin1)
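To see the conversion on its own, here is a small sketch that applies it to the same three title bytes from the error message. When only the input encoding is given, the function produces UTF-8.

```elixir
# "Éso" as Latin 1 bytes, taken from the error output.
latin1 = <<201, 115, 111>>

# Second argument is the input encoding; the output defaults to UTF-8.
utf8 = :unicode.characters_to_binary(latin1, :latin1)

# Show the raw bytes: "É" is now the two-byte sequence 195, 137.
IO.inspect(utf8, binaries: :as_binaries)
# => <<195, 137, 115, 111>>

# The result is a valid Elixir string, so IO.puts can print it.
IO.inspect(String.valid?(utf8))
# => true
IO.puts(utf8)
```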

This is the final program, which correctly parses ID3v1 tags in their expected encoding. Careful: the encoding is expected, but in no way guaranteed.

    defmodule ID3Parser do
      def parse(file_name) do
        case File.read(file_name) do
          {:ok, mp3} ->
            mp3_byte_size = byte_size(mp3) - 128
            <<_::binary-size(mp3_byte_size), id3_tag::binary>> = mp3

            <<"TAG",
              title::binary-size(30),
              artist::binary-size(30),
              album::binary-size(30),
              year::binary-size(4),
              _rest::binary>> = id3_tag

            to_convert = [title, artist, album, year]
            [title, artist, album, year] = Enum.map(to_convert, fn tag -> from_latin1(tag) end)

            IO.puts "#{artist} - #{title} (#{album} #{year})"

          _ ->
            IO.puts "Couldn't open #{file_name}"
        end
      end

      defp from_latin1(string) do
        :unicode.characters_to_binary(string, :latin1)
      end
    end
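Since the encoding is not guaranteed, one possible hedge is to leave a field alone when it already happens to be valid UTF-8, and convert from Latin 1 only otherwise. The sketch below is an assumption of mine, not something from the book: the ID3Field module and its decode function are hypothetical names, and the check is only a heuristic, because some Latin 1 byte sequences are also valid UTF-8. It also drops the zero-byte padding visible in the error output.

```elixir
defmodule ID3Field do
  # Hypothetical helper, not part of the book's example.
  def decode(field) when is_binary(field) do
    # Keep only the bytes before the first zero byte (the field padding).
    [text | _padding] = :binary.split(field, <<0>>)

    if String.valid?(text) do
      # Heuristic: already valid UTF-8, so assume the tagger wrote UTF-8.
      text
    else
      # Otherwise fall back to the encoding ID3v1 is expected to use.
      :unicode.characters_to_binary(text, :latin1)
    end
  end
end

# A Latin 1 field and a UTF-8 field, both padded with zero bytes.
IO.inspect(ID3Field.decode(<<201, 115, 111, 0, 0, 0>>))
IO.inspect(ID3Field.decode(<<195, 137, 115, 111, 0, 0>>))
```

Both calls return the same UTF-8 string, Éso. Whether guessing like this is better than always assuming Latin 1 depends on the files at hand.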

Hopefully this will help someone else in the same predicament.