After upgrading Rails 3.2 to 4.0 we had an interesting and yet unexpected new error:

ActiveRecord::StatementInvalid: Mysql2::Error: Data too long for column 'target_column'...

It turned out that MySQL strict mode was not enabled in our Rails 3 project. And as a consequence MySQL silently truncated the content of all string or text type columns in case if they didn’t fit in the size of the given data type. In Rails 4 strict mode is enabled by default, which is a very useful option and I definitely recommend to turn it on to avoid nasty bugs to hide.

So as a hotfix I decided to truncate the content to be aligned with the previous functionality.

I started with a simple test:

describe '#truncate_target_column_to_fit_into_max_storage_size' do context 'when the target_column field is blank' do it 'throws no error' do described_class . create ( target_column : nil ) end end context 'when the target_column field is too long' do it 'truncates it' do max_length_of_target_column = 65_535 target_column = 'a' * ( max_length_of_target_column + 1 ) model_instance = described_class . create ( target_column : target_column ) expect ( model_instance . target_column . length ) . to eq ( max_length_of_target_column ) end end end

Then the implementation came along:

class MyModel < ActiveRecord : :Base before_save :truncate_target_column_to_fit_into_max_storage_size private def truncate_target_column_to_fit_into_max_storage_size return if target_column . blank ? max_char_length = 65_535 self . target_column = target_column [ 0 , max_char_length ] end end

As a result the error rate dropped dramatically, but not 100%.

I found an example from the logs, which had less than 65 000 characters. I was one step closer to start tearing my hair out. As my last desperate attempt I tried target_column.bytesize which returned way more than 65 536, which is the storage limit in bytes of the text data type in MySQL. Hmmm, sudden light flooded my brain: it is a spanish product and spanish people like to use chars like ç which is not an ASCII character and needs more space to be stored.

A quick test in IRB proved my wild hypothesis.

'c' . bytesize 'ç' . bytesize

Meaning truncating upon character length is not enough if the content has non-ASCII chars.

Let’s add a test for this new case:

describe '#truncate_target_column_to_fit_into_max_storage_size' do context 'when the target_column field is blank' do it 'throws no error' do described_class . create ( target_column : nil ) end end context 'when the target_column field is too long' do context 'when it has only ascii chars' do it 'truncates it' do max_length_of_target_column = 65_535 target_column = 'a' * ( max_length_of_target_column + 1 ) cre_request = described_class . create ( target_column : target_column ) expect ( cre_request . target_column . length ) . to eq ( max_length_of_target_column ) end end context 'when it starts with two unicode chars' do it 'truncates it and removes two more normal characters to compensate the extra space unicode needs to store chars' do max_length_of_target_column = 65_535 target_column = 'çç' + ( 'a' * max_length_of_target_column ) cre_request = described_class . create ( target_column : target_column ) expect ( cre_request . target_column . length ) . to eq ( max_length_of_target_column - 2 ) end end end end

And don’t forget to pimp up the hotfix:

class MyModel < ActiveRecord : :Base before_save :truncate_target_column_to_fit_into_max_storage_size def truncate_target_column_to_fit_into_max_storage_size return if target_column . blank ? max_field_size_in_bytes = 65_535 self . target_column = target_column [ 0 , max_field_size_in_bytes ] while target_column . bytesize > max_field_size_in_bytes self . target_column = target_column [ 0 . . - 2 ] end end end

And Taa Daa, the errors disappeared into thin air. In the real application these fields we had problems with are not filled by the user, but our automated processes. Meaning, after a fresh Rails 4 deploy this was quicker than talking with other departments and changing the business logic.

Update:

Rails 6 added String#truncate_bytes to truncate strings to a maximum bytesize without breaking multibyte characters or grapheme clusters 🐿