This blog is part of our Ruby 2.4 series.

Ruby has lstrip and rstrip methods, which remove leading and trailing whitespace, respectively, from a string.

Ruby also has a strip method, which combines lstrip and rstrip and removes both leading and trailing whitespace from a string.

    " Hello World ".lstrip #=> "Hello World "
    " Hello World ".rstrip #=> " Hello World"
    " Hello World ".strip  #=> "Hello World"

Prior to Ruby 2.4, the rstrip method was already optimized for performance, but lstrip and strip were not. In Ruby 2.4, String#lstrip and String#strip have received the same optimization that String#rstrip already had.

Let’s run the following snippet in Ruby 2.3 and Ruby 2.4 to benchmark and compare performance.

    require 'benchmark/ips'

    Benchmark.ips do |bench|
      # 10 million spaces on each side of the text
      str1 = " " * 10_000_000 + "hello world" + " " * 10_000_000
      str2 = str1.dup
      str3 = str1.dup

      bench.report('String#lstrip') do
        str1.lstrip
      end

      bench.report('String#rstrip') do
        str2.rstrip
      end

      bench.report('String#strip') do
        str3.strip
      end
    end

Result for Ruby 2.3

    Warming up --------------------------------------
           String#lstrip     1.000  i/100ms
           String#rstrip     8.000  i/100ms
            String#strip     1.000  i/100ms
    Calculating -------------------------------------
           String#lstrip     10.989  (± 0.0%) i/s -     55.000  in   5.010903s
           String#rstrip     92.514  (± 5.4%) i/s -    464.000  in   5.032208s
            String#strip     10.170  (± 0.0%) i/s -     51.000  in   5.022118s

Result for Ruby 2.4

    Warming up --------------------------------------
           String#lstrip    14.000  i/100ms
           String#rstrip     8.000  i/100ms
            String#strip     6.000  i/100ms
    Calculating -------------------------------------
           String#lstrip    143.424  (± 4.2%) i/s -    728.000  in   5.085311s
           String#rstrip     89.150  (± 5.6%) i/s -    448.000  in   5.041301s
            String#strip     67.834  (± 4.4%) i/s -    342.000  in   5.051584s

From the above results, we can see that in Ruby 2.4, String#lstrip is around 13x faster (143.4 vs. 11.0 i/s), while String#strip is around 6.7x faster (67.8 vs. 10.2 i/s). String#rstrip, as expected, performs about the same because it was already optimized in previous versions.
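The ratios can be checked directly from the iterations-per-second figures reported above. The snippet below is only a small arithmetic sketch; the hash names and the `speedups` helper are made up for illustration and are not part of the benchmark.

```ruby
# Iterations per second copied from the two result listings above.
RUBY_2_3_IPS = { lstrip: 10.989, rstrip: 92.514, strip: 10.170 }.freeze
RUBY_2_4_IPS = { lstrip: 143.424, rstrip: 89.150, strip: 67.834 }.freeze

# Hypothetical helper: ratio of "after" to "before" throughput per method,
# rounded to two decimal places.
def speedups(before, after)
  before.each_with_object({}) do |(name, ips), result|
    result[name] = (after.fetch(name) / ips).round(2)
  end
end

p speedups(RUBY_2_3_IPS, RUBY_2_4_IPS)
# lstrip => 13.05, rstrip => 0.96, strip => 6.67
```

A ratio close to 1.0, as for rstrip, is within the error margins reported by the benchmark.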

Performance remains the same for multi-byte strings

Strings can contain single-byte or multi-byte characters.

For example, "Lé Hello World" is a multi-byte string because it contains é, which is a multi-byte character.

    'e'.bytesize #=> 1
    'é'.bytesize #=> 2

Let’s run the same benchmark with the string "Lé hello world" instead of "hello world".
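The modified snippet is not shown here, so the following is only a sketch of how the input could be built, assuming the padding stays the same as in the earlier benchmark and only the text in the middle changes. The `build_input` helper is a hypothetical name introduced for illustration.

```ruby
# Hypothetical helper: surrounds +text+ with +padding+ spaces on each side,
# mirroring how str1 was built in the earlier benchmark.
def build_input(text, padding = 10_000_000)
  " " * padding + text + " " * padding
end

single_byte = build_input("hello world")
multi_byte  = build_input("Lé hello world")

single_byte.ascii_only?                 # => true
multi_byte.ascii_only?                  # => false
multi_byte.bytesize - multi_byte.length # => 1 (é takes two bytes)
multi_byte.strip                        # => "Lé hello world"
```

Under that assumption, using `multi_byte` in place of `str1` in the benchmark leaves everything else unchanged.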

Result for Ruby 2.3

Warming up -------------------------------------- String #lstrip 1.000 i/100ms String #rstrip 1.000 i/100ms String #strip 1.000 i/100ms Calculating ------------------------------------- String #lstrip 11.147 (± 9.0%) i/s - 56.000 in 5.034363s String #rstrip 8.693 (± 0.0%) i/s - 44.000 in 5.075011s String #strip 5.020 (± 0.0%) i/s - 26.000 in 5.183517s

Result for Ruby 2.4

Warming up -------------------------------------- String #lstrip 1.000 i/100ms String #rstrip 1.000 i/100ms String #strip 1.000 i/100ms Calculating ------------------------------------- String #lstrip 10.691 (± 0.0%) i/s - 54.000 in 5.055101s String #rstrip 9.524 (± 0.0%) i/s - 48.000 in 5.052678s String #strip 4.860 (± 0.0%) i/s - 25.000 in 5.152804s

As we can see, the performance for multi-byte strings is almost the same across Ruby 2.3 and Ruby 2.4.

Explanation

The optimization concerns how a string is scanned for whitespace. Checking for whitespace in a multi-byte string carries additional overhead. So the patch adds an initial check for whether the string is a single-byte string and, if so, processes it on a separate path.
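The branching described above can be illustrated in plain Ruby. This is a rough sketch of the idea only, not the actual patch, which lives in Ruby's C implementation. `WHITESPACE_BYTES`, `leading_whitespace_bytes` and `sketch_lstrip` are hypothetical names, `ascii_only?` is used here as a stand-in for the single-byte check, and null bytes are deliberately left out of the whitespace set.

```ruby
# Bytes treated as whitespace in this sketch: tab, LF, VT, FF, CR and space.
WHITESPACE_BYTES = [9, 10, 11, 12, 13, 32].freeze

# Hypothetical helper: number of leading whitespace bytes in +str+.
def leading_whitespace_bytes(str)
  offset = 0
  if str.ascii_only?
    # Single-byte path: every character is one byte, so compare raw bytes
    # without decoding characters.
    offset += 1 while offset < str.bytesize &&
                      WHITESPACE_BYTES.include?(str.getbyte(offset))
  else
    # Multi-byte path: decode one character at a time and advance by its
    # width in bytes.
    str.each_char do |ch|
      break unless ch.bytesize == 1 && WHITESPACE_BYTES.include?(ch.ord)
      offset += ch.bytesize
    end
  end
  offset
end

# Hypothetical lstrip look-alike built on the helper above.
def sketch_lstrip(str)
  str.byteslice(leading_whitespace_bytes(str)..-1)
end

sketch_lstrip(" Hello World ")   # => "Hello World "
sketch_lstrip("  Lé hello ")     # => "Lé hello "
```

The sketch shows the structure of the check rather than its speed: in pure Ruby both branches are slow compared with the built-in methods.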

In most cases, strings are single-byte, so the performance improvement should be visible and helpful in practice.