Truncating UTF8 Input For Apple Push Notifications (APNS) in Ruby

When sending push notifications (APNS) to apple devices such iPhone or iPad there is a constraint that makes implementing it a bit challenging:

The maximum size allowed for a notification payload is 256 bytes; Apple Push Notification Service refuses any notification that exceeds this limit

This wouldn’t be a problem itself unless you want to put user input into the notification. This also wouldn’t be that hard unless the input can be international and contain non-ascii character. Which still would not be so hard, but the payload is in JSON and things get a little more complicated sometimes. Who said that formatting push notification is easy?

Desired payload

{ aps: { alert: "'User X' started following you" , } , path: "appnameios://users/123" , }

This is simplified version of our payload. The notifications is about someone who started following you on our fancy social platform that we are writing. The path allows the app to open it on a view related to the user who started following. The things that are going to vary are user name (User X in our example) and user id (123).

Payload template

So let’s extract the template of the payload into a method. This will come handy later:

def payload_template ( user_name , user_id ) { aps: { alert: "' #{ user_name } ' started following you" , }, path: "appnameios://users/ #{ user_id } " , } end

Bytes, bytes everywhere

Remember when I said that we have 256 bytes? We do, but number of useful bytes for our case is even smaller.

payload_template ( "" , "" ). to_json . bytesize # => 73

Even when we don’t substitute data into our payload we are out of 73 bytes. That means we have only…

MAX_APS_BYTES = 256 def payload_arg_max_size MAX_APS_BYTES - payload_without_args_size end def payload_without_args_size payload_template ( "" , "" ). to_json . bytesize end payload_arg_max_size # => 183

… 183 bytes for user input

If your payload (required for the app to properly behave when the notification is clicked) is bigger or your message is longer you are left with even fewer bytes of user input.

Not everything can be truncated

But wait… We can’t truncate user id. If we did we could be misleading about who actually started following the recipient of the notification. So even though its length vary, we can’t truncate it.

We can see that the logic for this is slowly getting more and more complicated. That’s why for every push notification we have a class that encapsulates the logic of formatting it properly according to APNS rules.

class StartedFollowing < Struct . new ( :user_name , :user_id ) def payload # ... end private def payload_template ( user_name ) { aps: { alert: "' #{ user_name } ' started following you" , }, path: "appnameios://users/ #{ user_id } " , } end MAX_APS_BYTES = 256 def payload_arg_max_size MAX_APS_BYTES - payload_without_args_size end def payload_without_args_size payload_template ( "" ). to_json . bytesize end end

Truncating

Ok, we know how many bytes we have so let’s truncate our international string. But remember that we are not truncating up to N chars, we are truncating up to N bytes! We can use String#byteslice for that.

It’s all nice and handy if we happen to truncate exactly between characters.

"łøü" . bytes # => [197, 130, 195, 184, 195, 188] "łøü" . byteslice ( 0 , 4 ) # => "łø"

But sometimes we won’t:

"łøü" . byteslice ( 0 , 3 ) => "ł \xC3 "

We are left we one proper character and one byte which is ugly.

I’ve been looking long time to figure out how to properly fix it and it seems that the right answer is String#scrub . For those of you who are stuck with older ruby version, there is backport of it in form of string-scrub gem.

So if you ever need to truncate user provided utf-8 string and support international characters byteslice + scrub will do the job for you:

"łøü" . byteslice ( 0 , 3 ). scrub ( "" ) => "ł"

Full solution

require 'string-scrub' unless String . instance_methods . include? ( :scrub ) require 'json' class StartedFollowing < Struct . new ( :user_name , :user_id ) InvalidPayloadGenerated = Class . new ( StandardError ) def payload raise PayloadTooBigToGenerate if payload_arg_max_size < 0 payload_template ( truncated_user_name ). tap do | hash | size = hash . to_json . bytesize size <= MAX_APS_BYTES or raise ( InvalidPayloadGenerated . new ( "Payload size was: #{ size } " ) ) end end private def payload_template ( name ) { aps: { alert: "' #{ name } ' started following you" , }, path: "appnameios://users/ #{ user_id } " , } end MAX_APS_BYTES = 256 def payload_arg_max_size MAX_APS_BYTES - payload_without_args_size end def payload_without_args_size payload_template ( "" ). to_json . bytesize end def truncated_user_name user_name . byteslice ( 0 , payload_arg_max_size ). scrub ( "" ) end end notif = StartedFollowing . new ( "łøü" * 100 , 12345 ) notif . payload # => {:aps=>{:alert=>"'łøüłøüłøüłøüłøüłøüłøüłøüłøüłø # üłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüłøüł # øüłøüłøüłø' started following you"}, :path=>"appnameios://users/12345"} notif . payload . to_json . bytesize # => 256

Yay! We used our payload to full extent!

Troubles

I added this line size <= MAX_APS_BYTES or raise InvalidPayloadGenerated.new("Payload size was: #{size}") at the end just to make sure that everything is ok with my approach and catch errors early (and implemented tests as well). Lucky me!

In my case it turned out my json encoder was using numeric escape characters, so they way I calculated the size of my truncated size was wrong because in JSON it turned out to be bigger:

puts "łøü" . to_json # => "łøü" "łøü" . to_json . bytesize # => 8 # 6 bytes for string plus 2 bytes for ""

vs

irb ( main ): 05 9 : 0 > puts "łøü" . to_json # => "\u0142\u00f8\u00fc" "łøü" . to_json . bytesize # => 20

So I extracted the code responsible to truncating one string into a class

class TruncateStringWithMbChars def initialize ( string_with_mb_chars , maxbytes ) @string_with_mb_chars = string_with_mb_chars @maxbytes = maxbytes end def call string_with_mb_chars . mb_chars [ 0 .. last_char_id ]. to_s end private attr_reader :string_with_mb_chars , :maxbytes def last_char_id string_with_mb_chars . each_char . map { | c | c . to_json . bytesize }. each_with_index . inject ( maxbytes ) do | bytesum , ( bytes , i ) | bytesum -= ( bytes - 2 ) ; return i - 1 if bytesum < 0 ; bytesum end return string_with_mb_chars . size end end

This algorithm basically iterates over every char, checks how many bytes it is going to take in our json payload and stops when we don’t have more space for our text. I am not proud of this code. Do you know a better way of how to do it? What’s they right way to check how many bytes a char will take if encoded as numeric escape character? I am sure there must be an easier way to do it.

Warning: It has a bug when maxbytes is not enough for even one character to be left.

Multiple strings to substitute in notifications

The logic gets even more complicated if you want to embed in your payload multiple strings. Good example can be a notification like ‘UserX’ & ‘UserY’ invite you to game ‘Game’. We could use ⅓ of bytes for each substituted string in naive implementation. But I wanted the algorithm to be smart and work well even in case when some names are long and some are short. My algorithm for truncating multiple strings so that they all use no more than N bytes looks like this:

class TruncateMultipleStrings def initialize ( strings , maxjsonbytes ) @strings = strings @maxjsonbytes = maxjsonbytes end def call hash = @strings . inject ({}) do | memo , string | memo [ string . object_id ] = string ; memo end maxjsonbytes = @maxjsonbytes hash . values . sort_by { | s | string_json_bytesize ( s ) }. each_with_index do | string , index | maxjsonbytes_for_string = maxjsonbytes / ( @strings . size - index ) shortened = TruncateStringWithMbChars . new ( string , maxjsonbytes_for_string ). call maxjsonbytes -= string_json_bytesize ( shortened ) hash [ string . object_id ] = shortened end hash . values end private def string_json_bytesize ( string ) string . to_json . bytesize - 2 end end

Be aware that it doesn’t favor any of the String. If they are all very long, then all of them will be allowed to use same amount of bytes. If any of the strings is short, then the unused bytes are split equally amongst the other strings.

TruncateMultipleStrings . new ( [ "short" , "medium medium" , "long " * 30 ], 60 ). call # => [ # "short", # "medium medium", # "long long long long long long long long lo" # ] TruncateMultipleStrings . new ( [ "long " * 30 , "medium medium" , "long " * 30 ], 60 ). call # => [ # "long long long long lon", # "medium medium", # "long long long long long" # ] TruncateMultipleStrings . new ( [ "long " * 30 , "long " * 30 , "long " * 30 ], 60 ). call # => [ # "long long long long ", # "long long long long ", # "long long long long " # ]

Here is an example of class that could be using it

class GameInvited < Struct . new ( :user1 , :user2 , :game_name , :game_id ) InvalidPayloadGenerated = Class . new ( StandardError ) def payload raise PayloadTooBigToGenerate if payload_arg_max_size < 0 payload_template ( * truncated_names ). tap do | hash | size = hash . to_json . bytesize size <= MAX_APS_BYTES or raise ( InvalidPayloadGenerated . new ( "Payload size was: #{ size } " ) end end private def payload_template ( u1 , u2 , g ) { aps: { alert: " #{ u1 } and #{ u2 } invite you to game #{ g } " , }, path: "appnameios://games/ #{ game_id } " , } end MAX_APS_BYTES = 256 def payload_arg_max_size MAX_APS_BYTES - payload_without_args_size end def payload_without_args_size payload_template ( "" , "" , "" ). to_json . bytesize end def truncated_names TruncateMultipleStrings . new ( [ user1 , user2 , game_name ], payload_arg_max_size ). call end end notif = GameInvited . new ( "User1 " * 100 , "User2 " * 100 , "Game " * 100 , 123457890123 ) notif . payload # => {:aps=>{:alert=>"User1 User1 User1 User1 User1 User1 # User1 User1 User1 Use and User2 User2 User2 User2 User2 # User2 User2 User2 User2 Use invite you to game Game Game # Game Game Game Game Game Game Game Game Game G"}, # :path=>"appnameios://games/123457890123"}

Urban Airship

Remember that if you are using Urban Airship you should be in total using even less than 256 bytes so they can provide you with tracking ability.

Quote from their documentation

The maximum message size is 256 bytes. This includes the alert, badge, sound, and any extra key/value pairs in the notification section of the payload. We also recommend leaving as much extra space as possible if you are using our reporting tools, as a portion will be used to help with response tracking if it is available.

Unfortunately I couldn’t find out exactly how many bytes they need for this functionality to work properly. If any of you have the knowledge, please let me know.

Storing notification templates on the phone

If your messages are particularly long (at least in some locales) you can spare some bytes by storing the template in the app and sending only the data.

Quote from APNS documentation

You can display localized alert messages in two ways. The server originating the notification can localize the text; to do this, it must discover the current language preference selected for the device (see “Passing the Provider the Current Language Preference (Remote Notifications)”). Or the client application can store in its bundle the alert-message strings translated for each localization it supports. The provider specifies the loc-key and loc-args properties in the aps dictionary of the notification payload. When the device receives the notification (assuming the application isn’t running), it uses these aps-dictionary properties to find and format the string localized for the current language, which it then displays to the user.

Resources