Tag Archives: TPDU

SMS with Alphanumeric Source

Sending SMS with an alphanumeric String as the Source

If you’ve ever received an SMS from your operator, and the sender was the Operator name for example, you may be left wondering how it’s done.

In IMS you’d think this could be quite simple – You’d set the From header to be the name rather than the MSISDN, but for most SMSoIP deployments, the From header is ignored and instead the c header inside the SMS body is used.

So how do we get it to show text?

Well the TP-Originating address has the “Type of Number” (ToN) field which is typically set to International/National, but value 5 allows for the Digits to instead be alphanumeric characters.

GSM 7 bit encoding on the text in the TP-Originating Address digits and presto, you can send SMS to subscribers where the message shows as From an alphanumeric source.

On Android SMSs received from alphanumeric sources cannot be responded to (“no more “DO NOT REPLY TO THIS MESSAGE” at the end of each text), but on iOS devices you can respond, but if I send an SMS from “Nick” the reply from the subscriber using the iPhone will be sent to MSISDN 6425 (Nick on the telephone keypad).

The Surprisingly Complicated World of SMS: Concatenated / Multipart SMS

Most people think of 160 characters as the length of an SMS. But the payload is actually 140 bytes, but with better encoding 1 character doesn’t require 1 byte.

The above paragraph is exactly 160 characters. It would fit into a standard SMS.

By using the GSM 7 bit alphabet, you can cram 16 characters into 140 bytes (octets) of space, which is kind of cool.

140 bytes of data containing 160 characters of text

But people often need to convey more text than just 160 characters, or if you’re using characters that don’t exist in the GSM 7-bit alphabet, that limit becomes even less than 160 characters (different encodings other than GSM-7 need more data to transfer the same number of characters) so we get into multipart SMS, another feature in the surprisingly complicated world of SMS.

You’d think if you took a 160 character SMS, and concatenated it onto another 160 character SMS, you’d get a total of 320 characters, right (160+160=320)?
Alas it’s not that simple.

In order to achieve the concatenation of messages in a way that’s transparent to the users (rather than a series of SMSes coming through one-after-the-ther) a User-Data Header (TP-User-Data-Header-Indicator aka TP-UHDI) is added to the TP-User Data of the TPDU (the part that actually contains the user message).

This User-Data Header takes up 7 bytes, which with GSM encoding robs us of 6 characters from the message length. (Not a typo, GSM7 encoding does not mean 1 character = 1 byte, hence we can get 160 characters into 140 bytes of space)
So a two SMS concatenated message would only allow 268 characters to be sent (134 characters + 134 characters).

Let’s take a look at this header that’s robbing us of message length, but enabling us to concatenate messages.

For starters, the information about how many parts in the concatenated message, and what part number this one is, is located in the message body, hence robbing us of characters.

But we only know about the presence of this header being in the message body because the SMS-SUBMIT TPDU has the TP-UDHI flag (TP-User-Data-Header-Indicator) set, so we know the User Data is prefixed with the User-Data-Header.

Now if we have a look in the TP-User-Data we can see the User-Data Header, this can actually carry a few different payloads, but in our case, it’s carrying the Concatenated Short Messages IE, which tells us the message identifier (unique per single-but-multi-part message, the number of parts in the message (in this case 2) and the part number this is (part 1 of 2).

First part of a two part SMS

Now the phone has indicated this is a multipart message, the length of the data is still 160, but the length of the actual message is now limited to 134 characters with GSM7 encoding.

The encoding isn’t as bad as you might expect:
1st byte indicates the total length of the User Data Headers (After this the actual user data begins),
2nd byte is the IE identifier, for Concatenated Short Messages, this is 00,
3rd byte is the length of the Concatenated Short Messages IE,
4th byte is the message identifier in hex,
5th byte is the number of message parts in hex (So up to 255 message parts)
6th byte is the message part number, to aid in putting it back together in order.

3GPP TS 23.040 – 9.2.3.24 TP-User Data (TP-UD) – Encoding of User Data Header and generic IE
Concatenated short message IE encoding

So what we end up with is a header inside our user payload, advising that this is a concatenated SMS, the message identifier, the number of parts in the message, and the part number of this particular message.

Last part of two part SMS

The SMSc on receipt of these has to spool them back out to the destination with the same message part number, and same headers in place.

The phone receiving the SMS has to wait for all the parts to come through and then reassemble before rendering to the user.

So that’s how concatenated SMS works. While this may seem convoluted and silly in a world where transfering more than 140 bytes of data is trivial, SMS was introduced in the early 1990s, and in theory at least, a user with a phone that supported SMS purchased when SMS was introduced, should still be able to interwork with phones today.