I have often read that symbols are better hash keys than strings, so I wanted to know why and how large the performance impact is. It is quite easy to create a test scenario to measure it. This post also contains a technical explanation and shows a potential security problem.

My test scenario is simple. Let's create a small hash and look up a key in it, using four different kinds of keys: a short string, a short symbol, a long string and a long symbol. For measuring I use Benchmark, the benchmarking module from Ruby's standard library. Here is the code:

    require "benchmark"

    precomputed_string = "Very long string value" * 1000
    precomputed_symbol = precomputed_string.to_sym

    MAP = {
      "key1" => true,
      :key2 => true,
      precomputed_string => true,
      precomputed_symbol => true
    }

    Benchmark.bm(20) do |x|
      x.report("string") do
        10000000.times { MAP["key1"] }
      end
      x.report("symbol") do
        10000000.times { MAP[:key2] }
      end
      x.report("long string/100") do
        100000.times { MAP[precomputed_string] }
      end
      x.report("long symbol") do
        10000000.times { MAP[precomputed_symbol] }
      end
    end

Please note that for the long string key I'm using 100 times fewer iterations, because the run would otherwise take too long. And here is the result from my machine:

                               user     system      total        real
    string                 4.360000   0.000000   4.360000 (  4.365123)
    symbol                 2.870000   0.000000   2.870000 (  2.868708)
    long string/100        8.460000   0.000000   8.460000 (  8.471581)
    long symbol            2.890000   0.000000   2.890000 (  2.884652)

As you can see, even for a short key it is faster to use a symbol than a string. For the long symbol key the time does not grow, so the speed of a symbol lookup doesn't depend on key length. The situation is different for string keys: the long string needed 8.46 s for only 100,000 lookups, which makes each lookup roughly 190 times slower than with the short string.

Why is that? The reason lies in the hash implementation. A Hash uses a hashing function for the lookup (I agree that it is a little confusing that Ruby calls its map type Hash). A symbol has this value “precomputed”, but for a string it has to be computed again over the whole string. For a symbol the hash value is simply based on its object_id, which never changes. Strings, however, are a different object for each instance (a string is not immutable like in Java), so to find out whether two strings have the same hash you need to compute it from their content. A short demonstration of the object_id difference:

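    "test".object_id   # a new String object is created for this literal
    "test".object_id   # another String object, so a different id
    :test.object_id    # the Symbol :test
    :test.object_id    # the same Symbol, so the same id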
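The difference between identity and content can also be checked directly. The following is a minimal sketch, not part of the original measurement; the helper name `compare_keys` is made up for illustration. It reports whether two keys are the same object and whether their hash values match.

```ruby
# Hypothetical helper: compares two keys by identity and by hash value.
def compare_keys(a, b)
  { same_object: a.equal?(b), same_hash: a.hash == b.hash }
end

# Two separately created strings with the same content:
# different objects, but the same content-based hash.
p compare_keys(String.new("test"), String.new("test"))

# The same symbol obtained in two different ways:
# one object, hence trivially the same hash.
p compare_keys(:test, "test".to_sym)
```

For the strings, `same_object` is false while `same_hash` is true, which is why the hash has to be derived from the characters. For the symbols both values are true.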

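So should you always use symbols? There is one disadvantage. To keep a symbol's value the same for the whole lifetime of a Ruby process, an unused symbol is not removed by the garbage collector. This is the behaviour of the Ruby version I tested; Ruby 2.2 and later can collect symbols that were created dynamically, for example with to_sym. Here's the code that demonstrates it: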

    # for string
    def test val
      map = {}
      1000.times do |i|
        value = val*(i+1)
        map[value] = true
      end
      return nil
    end

    100.times do |i|
      test "test#{i}"
      GC.start
    end
    puts `cat /proc/#{$$}/status | grep 'VmSize:'`

    # for symbol
    def test val
      map = {}
      1000.times do |i|
        value = val*(i+1)
        map[value.to_sym] = true
      end
    end

    100.times do |i|
      test "test#{i}"
      GC.start
    end
    puts `cat /proc/#{$$}/status | grep 'VmSize:'`

My results:

    String: VmSize:  24856 kB
    Symbol: VmSize: 343324 kB
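Reading /proc only works on Linux. As a rough, portable cross-check, the size of the symbol table can be observed with `Symbol.all_symbols`. This is a sketch under assumptions: the helper `symbol_table_growth` and the random key prefix are invented for illustration, and the exact numbers depend on the Ruby version and on what else the process has loaded.

```ruby
# Hypothetical helper: returns how many symbols were added while the block ran,
# together with the block's result (returned so the new objects stay referenced).
def symbol_table_growth
  before = Symbol.all_symbols.size
  result = yield
  after = Symbol.all_symbols.size
  [after - before, result]
end

# A random prefix makes it very unlikely that these names already exist as symbols.
prefix = "demo_#{Process.pid}_#{rand(1_000_000)}_"

string_growth, _strings = symbol_table_growth do
  (1..1000).map { |i| "#{prefix}s#{i}" }
end

symbol_growth, _symbols = symbol_table_growth do
  (1..1000).map { |i| "#{prefix}y#{i}".to_sym }
end

puts "symbols added while building strings: #{string_growth}"
puts "symbols added while building symbols: #{symbol_growth}"
```

Building the strings should leave the symbol table essentially unchanged, while every `to_sym` call on a new name adds an entry.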

So it is a trade-off between memory and speed. For long-running tasks it is very important to have control over what gets turned into symbols. Consider this code snippet from a long-running server:

    # get option value
    VALUE_TO_DB_MAP = {
      :external => 1,
      :internal => 2,
      :both => 3
    }

    def update params
      db_value = VALUE_TO_DB_MAP[params[:option1].to_sym]
    end

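Now consider what happens if an attacker sends long, unfriendly strings as option1. Each distinct value becomes a new symbol that is never freed, so the attacker can easily cause a denial of service (DoS) from a single machine.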
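One possible way to avoid this is to never convert request input to a symbol and to look the value up in a map keyed by strings instead. The sketch below is only an illustration under that assumption; `OPTION_TO_DB_VALUE` and `db_value_for` are hypothetical names, not part of the code above.

```ruby
# Hypothetical variant of the map, keyed by strings so that
# request parameters never have to be turned into symbols.
OPTION_TO_DB_VALUE = {
  "external" => 1,
  "internal" => 2,
  "both"     => 3
}.freeze

# Returns the database value for a known option and nil for anything else.
# to_s also covers a missing parameter (nil becomes "").
def db_value_for(option)
  OPTION_TO_DB_VALUE[option.to_s]
end

puts db_value_for("external").inspect       # known option
puts db_value_for("x" * 100_000).inspect    # unknown input, no symbol is created
```

The lookup for a short string key is somewhat slower than for a symbol, as measured earlier, but unknown input is simply rejected and leaves nothing behind in the symbol table.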

I welcome any questions or suggestions in your comments.
