Published on Sep 26, 2014

I wrote a little CLI tool that lists the files in a directory, plus some extra information, in an ASCII table. To lay out the table I need the length of the longest filename, and I ran into trouble when filenames contained UTF-8 multibyte characters: the special characters were counted as two characters.
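The width calculation itself boils down to taking the maximum `String#length` over all filenames. Here is a minimal sketch, assuming the names are already collected in an array. `column_width` is a hypothetical helper, not the tool's actual code. It shows how a decomposed "ö" (a plain "o" followed by a combining mark) makes the column one character too wide:

```ruby
# Hypothetical helper (not from the original tool): the filename column is as
# wide as the longest name. String#length counts code points, not visible glyphs.
def column_width(names)
  names.map(&:length).max || 0 # 0 for an empty directory
end

precomposed = "m\u00F6p"  # "ö" as the single code point U+00F6
decomposed  = "mo\u0308p" # "o" followed by U+0308 COMBINING DIAERESIS

puts column_width([precomposed, "ab"]) # 3
puts column_width([decomposed, "ab"])  # 4, although both names display as "möp"
```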

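The issue here is that OS X uses a slightly different flavor of UTF-8 than you would expect. Its HFS+ filesystem stores filenames in decomposed form, so "ö" is saved as a plain "o" followed by the combining diaeresis U+0308. Look at this: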

```
[1] pry(main)> str = File.basename(Dir["Desktop/*"][2])
=> "möp"
[2] pry(main)> str.length
=> 4
[3] pry(main)> "möp".length
=> 3
[4] pry(main)> str.encoding
=> #<Encoding:UTF-8>
[5] pry(main)> "möp".encoding
=> #<Encoding:UTF-8>
[6] pry(main)> str == "möp"
=> false
```

At this point I was confused. The string looked the same and had the same encoding, yet it was not equal to "möp" (and it was longer). So what's the trick here? Encode the path to UTF-8-MAC and everything is fine: