Figuring out maximum character counts for standard SMS messages is really quite simple. However, the maximum character counts for concatenated SMS messages are a bit more complicated. Throw character encodings into the mix, and everything can become very muddled.

Encodings

Messages written in languages with a Latin-based alphabet (such as English, Spanish, French, etc.) are usually sent from phones that support the GSM character encoding. The GSM character encoding uses 7 bits to represent each character (similar to ASCII). This contrasts with languages that use a non-Latin-based alphabet (such as Chinese, Arabic, Sinhala, Mongolian, etc.), which are usually sent from phones that support Unicode. The specific character encoding used by these phones is usually UTF-16 or UCS-2. UCS-2 uses 16 bits for every character, and UTF-16 uses 16 bits for most characters (characters outside the Basic Multilingual Plane, such as many emoji, take 32 bits); the calculations in this post assume 16 bits per character. For the sake of simplicity, I will refer to the Latin-based alphabet and non-Latin-based alphabet languages in this post as “GSM” and “Unicode” languages respectively.

Standard SMS Messages

Standard SMS messages have a maximum payload of 140 bytes (1120 bits).

Since GSM phones use a 7-bit character encoding, this allows a maximum of 160 characters per standard SMS message:

1120 bits / (7 bits/character) = 160 characters

For Unicode phones, which use a 16-bit character encoding, this allows a maximum of 70 characters per standard SMS message:

1120 bits / (16 bits/character) = 70 characters
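The two divisions above can be checked with a few lines of Python. This is only a minimal sketch; the names `PAYLOAD_BYTES` and `max_chars` are my own and do not come from any SMS library.

```python
PAYLOAD_BYTES = 140  # maximum payload of a standard SMS message


def max_chars(bits_per_char: int, payload_bytes: int = PAYLOAD_BYTES) -> int:
    """Return how many whole characters fit in the payload."""
    return (payload_bytes * 8) // bits_per_char


print(max_chars(7))   # GSM, 7 bits per character: 160
print(max_chars(16))  # Unicode, 16 bits per character: 70
```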

Concatenated SMS Messages

Things get a little bit more complex with concatenated SMS messages. Concatenated SMS messages allow a phone to send messages longer than 160 GSM characters. The sender creates their message as normal, but without the 140-byte limit. Behind the scenes, the phone detects the message length. If the message is less than or equal to 140 bytes, the phone sends a standard SMS message. However, if the message is greater than 140 bytes, the phone automatically divides the longer message into multiple, shorter SMS messages which are then transmitted to the recipient separately.

The recipient’s phone takes these multiple, shorter SMS messages and recombines them into the original message which was sent. Because the individual segments of the complete message need to be recombined in this way, this is referred to as ‘concatenated SMS’. In order to achieve this seamless delivery, additional information is added to each individual concatenated SMS message. This additional information, referred to as the user data header (UDH), provides identification and ordering information. For example, the UDH could relate the three individual concatenated SMS messages to each other, and indicate the order for recombination.
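To make the idea of identification and ordering information concrete, here is a sketch of one common UDH layout for concatenated messages, which is six bytes long. It assumes the 8-bit reference number variant described in 3GPP TS 23.040; the function name `build_concat_udh` is hypothetical, and a real phone or gateway builds this header for you.

```python
def build_concat_udh(ref: int, total: int, seq: int) -> bytes:
    """Build a 6-byte concatenation header (assumed 8-bit reference variant)."""
    if not (0 <= ref <= 255 and 1 <= seq <= total <= 255):
        raise ValueError("need 0 <= ref <= 255 and 1 <= seq <= total <= 255")
    return bytes([
        0x05,   # length of the header bytes that follow
        0x00,   # element identifier: concatenated message, 8-bit reference
        0x03,   # length of the element data that follows
        ref,    # identification: the same value in every part of one message
        total,  # how many parts the complete message has
        seq,    # ordering: position of this part, starting at 1
    ])


# Three parts of one message share a reference and differ only in the last byte.
headers = [build_concat_udh(ref=0x2A, total=3, seq=n) for n in (1, 2, 3)]
print([h.hex() for h in headers])
```

The shared reference byte is what lets the recipient's phone group the parts, and the final byte tells it where each part belongs.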

The UDH takes up 6 bytes (48 bits) of a normal SMS message payload. This reduces the space for actual message data in concatenated SMS messages:

1120 bits - 48 bits = 1072 bits

As a result, each individual concatenated SMS message can only contain 1072 bits of message data. This plays an important role in determining how many individual concatenated SMS messages will be sent based on the actual message data length.

Because GSM phones use a 7-bit character encoding, each individual concatenated SMS message can hold 153 characters:

1072 bits / (7 bits/character) = 153 characters

(Note: 153 characters * 7 bits/character = 1071 bits. However, the extra bit can’t be used to represent a full character, so it is added as padding so that the actual 7-bit encoded data begins on a septet boundary, at the 50th bit.)

Unicode phones use a 16-bit character encoding, so each individual concatenated SMS message can hold 67 characters:

1072 bits / (16 bits/character) = 67 characters
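The per-segment limits can be turned into a rough splitting routine. This is a sketch under simplifying assumptions: it counts each Python character as one GSM or 16-bit unit, and it assumes the message has already been found too long for a standard SMS. The names `chars_per_segment` and `split_message` are illustrative only.

```python
PAYLOAD_BITS = 140 * 8  # 1120 bits in a standard SMS payload
UDH_BITS = 6 * 8        # 48 bits taken by the user data header


def chars_per_segment(bits_per_char: int) -> int:
    """Whole characters that fit beside the UDH in one segment."""
    return (PAYLOAD_BITS - UDH_BITS) // bits_per_char


def split_message(text: str, bits_per_char: int) -> list[str]:
    """Cut text into chunks no longer than one segment's capacity."""
    size = chars_per_segment(bits_per_char)
    return [text[i:i + size] for i in range(0, len(text), size)]


print(chars_per_segment(7), chars_per_segment(16))   # 153 67
print([len(part) for part in split_message("a" * 400, 7)])  # [153, 153, 94]
```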

Character Count Thresholds

The character limits for individual concatenated SMS messages result in thresholds at which an additional concatenated SMS message is required to send a larger overall message:

GSM encoding:

1 standard SMS message = up to 160 characters

2 concatenated SMS messages = up to 306 characters

3 concatenated SMS messages = up to 459 characters

4 concatenated SMS messages = up to 612 characters

5 concatenated SMS messages = up to 765 characters

etc. (153 x number of individual concatenated SMS messages)

UTF-16 encoding:

1 standard SMS message = up to 70 characters

2 concatenated SMS messages = up to 134 characters

3 concatenated SMS messages = up to 201 characters

4 concatenated SMS messages = up to 268 characters

5 concatenated SMS messages = up to 335 characters

etc. (67 x number of individual concatenated SMS messages)
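The thresholds above follow one rule: a message that fits the single-message limit needs one SMS, and anything longer is divided by the smaller per-segment limit and rounded up. Here is a minimal sketch of that rule; the `LIMITS` table and the `segments_needed` function are names I made up, and the character count is assumed to be already known for the chosen encoding.

```python
import math

# (single-message limit, per-segment limit when concatenated)
LIMITS = {"gsm": (160, 153), "unicode": (70, 67)}


def segments_needed(char_count: int, encoding: str) -> int:
    """Return how many SMS messages a text of char_count characters needs."""
    single, per_segment = LIMITS[encoding]
    if char_count <= single:
        return 1
    return math.ceil(char_count / per_segment)


for count in (160, 161, 306, 307):
    print(count, segments_needed(count, "gsm"))
```

Note the jump at 161 GSM characters: the message no longer fits in one SMS, so it is sent as two segments of at most 153 characters each.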

Implications

These thresholds are an important consideration for a number of reasons, including billing and programmatic interfacing with SMS gateways.

Generally, telephone companies count individual concatenated SMS messages separately even though they are recombined at the phone into a single message. This means a GSM-encoded message containing 180 characters could incur a charge for two SMS messages, even if the sender and recipient only see a single message.

When interfacing with a telephone company’s SMS gateway programmatically, there may be limits on the number of individual concatenated SMS messages which can be sent as part of a single message. For example, Clickatell’s documentation states that messages sent through their API should not contain more than 3 concatenated SMS segments. This may require limiting the number of characters that can be entered in a web application or service which sends SMS messages via such an API.
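A web form could derive its input limit from the gateway's segment cap. The sketch below is not Clickatell's API or any gateway's code; `max_input_chars` is a hypothetical helper that only applies the per-segment limits worked out earlier in this post.

```python
# (single-message limit, per-segment limit when concatenated)
LIMITS = {"gsm": (160, 153), "unicode": (70, 67)}


def max_input_chars(max_segments: int, encoding: str) -> int:
    """Longest text that still fits within max_segments SMS messages."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    single, per_segment = LIMITS[encoding]
    # One segment means a standard SMS, which has no UDH overhead.
    return single if max_segments == 1 else per_segment * max_segments


print(max_input_chars(3, "gsm"))      # 459
print(max_input_chars(3, "unicode"))  # 201
```

With a cap of 3 segments, the input field would allow 459 characters for GSM text but only 201 once any Unicode-only character is present.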

While it may seem elementary, it is important to point out that an SMS message is always in one particular encoding, i.e. fully GSM or fully UTF-16. For example, a period character (“.”) takes up 7 bits in a GSM SMS message. The same character may exist in a Unicode SMS message, but there it takes up 16 bits, even though it represents the same character.
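The difference is easy to see by measuring the same text both ways. This is a small sketch: `gsm_bits` simply assumes every character is in the basic GSM alphabet (it does not check), and `utf16_bits` uses Python's built-in `utf-16-be` codec.

```python
def gsm_bits(text: str) -> int:
    """Bits used at 7 bits per character (assumes basic GSM characters only)."""
    return 7 * len(text)


def utf16_bits(text: str) -> int:
    """Bits used when the text is encoded as UTF-16."""
    # "utf-16-be" avoids the byte-order mark that plain "utf-16" would prepend.
    return 8 * len(text.encode("utf-16-be"))


print(gsm_bits("."), utf16_bits("."))  # 7 16
print(utf16_bits("Hello 你好"))         # 128: all 8 characters cost 16 bits
```

In the second example the two Chinese characters force the whole message into the 16-bit encoding, so the Latin letters and the space cost 16 bits each as well.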