Most people think of 160 characters as the length of an SMS. But the payload is actually 140 bytes, but with better encoding 1 character doesn’t require 1 byte.The above paragraph is exactly 160 characters. It would fit into a standard SMS.
By using the GSM 7 bit alphabet, you can cram 16 characters into 140 bytes (octets) of space, which is kind of cool.
But people often need to convey more text than just 160 characters, or if you’re using characters that don’t exist in the GSM 7-bit alphabet, that limit becomes even less than 160 characters (different encodings other than GSM-7 need more data to transfer the same number of characters) so we get into multipart SMS, another feature in the surprisingly complicated world of SMS.
You’d think if you took a 160 character SMS, and concatenated it onto another 160 character SMS, you’d get a total of 320 characters, right (160+160=320)?
Alas it’s not that simple.
In order to achieve the concatenation of messages in a way that’s transparent to the users (rather than a series of SMSes coming through one-after-the-ther) a User-Data Header (TP-User-Data-Header-Indicator aka TP-UHDI) is added to the TP-User Data of the TPDU (the part that actually contains the user message).
This User-Data Header takes up 7 bytes, which with GSM encoding robs us of 6 characters from the message length. (Not a typo, GSM7 encoding does not mean 1 character = 1 byte, hence we can get 160 characters into 140 bytes of space)
So a two SMS concatenated message would only allow 268 characters to be sent (134 characters + 134 characters).
Let’s take a look at this header that’s robbing us of message length, but enabling us to concatenate messages.
For starters, the information about how many parts in the concatenated message, and what part number this one is, is located in the message body, hence robbing us of characters.
But we only know about the presence of this header being in the message body because the SMS-SUBMIT TPDU has the TP-UDHI flag (TP-User-Data-Header-Indicator) set, so we know the User Data is prefixed with the User-Data-Header.
Now if we have a look in the TP-User-Data we can see the User-Data Header, this can actually carry a few different payloads, but in our case, it’s carrying the Concatenated Short Messages IE, which tells us the message identifier (unique per single-but-multi-part message, the number of parts in the message (in this case 2) and the part number this is (part 1 of 2).
Now the phone has indicated this is a multipart message, the length of the data is still 160, but the length of the actual message is now limited to 134 characters with GSM7 encoding.
The encoding isn’t as bad as you might expect:
1st byte indicates the total length of the User Data Headers (After this the actual user data begins),
2nd byte is the IE identifier, for Concatenated Short Messages, this is 00,
3rd byte is the length of the Concatenated Short Messages IE,
4th byte is the message identifier in hex,
5th byte is the number of message parts in hex (So up to 255 message parts)
6th byte is the message part number, to aid in putting it back together in order.
So what we end up with is a header inside our user payload, advising that this is a concatenated SMS, the message identifier, the number of parts in the message, and the part number of this particular message.
The SMSc on receipt of these has to spool them back out to the destination with the same message part number, and same headers in place.
The phone receiving the SMS has to wait for all the parts to come through and then reassemble before rendering to the user.
So that’s how concatenated SMS works. While this may seem convoluted and silly in a world where transfering more than 140 bytes of data is trivial, SMS was introduced in the early 1990s, and in theory at least, a user with a phone that supported SMS purchased when SMS was introduced, should still be able to interwork with phones today.