SMS messages are encoded into segments of 140 bytes each. You are billed per segment, so understanding encoding is key to controlling costs. The encoding determines how many characters fit in each segment:
| Encoding | Bits per char | Single segment | Multi-part segment |
|---|---|---|---|
| GSM 7-bit | 7 | 160 chars | 153 chars |
| ASCII 7-bit | 7 | 160 chars | 153 chars |
| ASCII 8-bit | 8 | 140 chars | 134 chars |
| UTF-16 | 16 | 70 chars | 67 chars |
A single non-GSM-7 character (like an emoji or curly quote) switches the entire message to UTF-16, cutting capacity from 160 to 70 characters per segment. This can more than double your costs.

Segment calculator

Use this interactive tool to check how your message will be encoded and segmented:

How segments work

Every SMS message is transmitted in units of 140 bytes. When a message exceeds one segment, a 6-byte header (User Data Header, or UDH) is added to each segment for reassembly, reducing the usable space.
Single segment:   140 bytes available → 160 GSM-7 chars or 70 UTF-16 chars
Multi-part:       134 bytes per segment → 153 GSM-7 chars or 67 UTF-16 chars
Maximum:          10 segments per message
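Each capacity above is the usable bits divided by the bits per character. This quick Python check uses `capacity`, a hypothetical helper written for this page, and floor division to drop leftover bits that cannot hold a whole character:

```python
SEGMENT_BYTES = 140  # payload of one SMS segment
UDH_BYTES = 6        # User Data Header carried by every part of a multi-part message


def capacity(bits_per_char: int, multipart: bool) -> int:
    """Whole characters that fit in one segment at the given character width."""
    usable_bytes = SEGMENT_BYTES - (UDH_BYTES if multipart else 0)
    return usable_bytes * 8 // bits_per_char  # floor: partial characters don't fit


for name, bits in [("GSM-7", 7), ("ASCII 8-bit", 8), ("UTF-16", 16)]:
    print(f"{name}: single={capacity(bits, False)}, multi-part={capacity(bits, True)}")
```

It prints 160/153 for GSM-7, 140/134 for ASCII 8-bit, and 70/67 for UTF-16, which matches the encoding table. For GSM-7, the 48 UDH bits occupy 7 septets once padded to a septet boundary, which is where the 7-character difference between 160 and 153 comes from.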

Segment calculation formula

To calculate the number of segments for a GSM-7 message (for UTF-16, use 70 and 67 in place of 160 and 153):
Characters ≤ 160  →  1 segment
Characters > 160  →  ⌈characters / 153⌉ segments

Examples:
- 100 chars = 1 segment
- 160 chars = 1 segment
- 161 chars = 2 segments (153 + 8)
- 306 chars = 2 segments (153 + 153)
- 307 chars = 3 segments (153 + 153 + 1)
- 1530 chars = 10 segments (maximum)
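The rule can be written as a small Python function. This is a sketch: `segment_count` is a hypothetical name, its input is the number of GSM-7 character slots (with extended characters already counted twice), and it does not enforce the 10-segment maximum.

```python
def segment_count(slots: int, single: int = 160, multi: int = 153) -> int:
    """Segments needed for a message occupying `slots` character slots.

    The defaults are the GSM-7 limits; pass single=70, multi=67 for UTF-16.
    """
    if slots <= single:
        return 1
    return -(-slots // multi)  # integer ceiling division


for n in (100, 160, 161, 306, 307, 1530):
    print(n, "chars ->", segment_count(n), "segment(s)")
```

Passing the UTF-16 limits gives the other column of the cost example: `segment_count(200, single=70, multi=67)` returns 3, while `segment_count(200)` returns 2.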

Cost impact example

Consider a 200-character message:
| Scenario | Encoding | Segments | Relative cost |
|---|---|---|---|
| All GSM-7 characters | GSM-7 | 2 | 1× |
| Contains one emoji 😀 | UTF-16 | 3 | 1.5× |
| Contains one curly quote “ | UTF-16 | 3 | 1.5× |
| With smart encoding enabled | GSM-7 | 2 | 1× |
Enable smart encoding to automatically replace common Unicode characters (like curly quotes and em dashes) with GSM-7 equivalents, reducing segment counts.

Encoding by sender type

| Sender type | Default encoding | Fallback |
|---|---|---|
| Long Code | GSM 7-bit | UTF-16 |
| Toll-Free | GSM 7-bit | UTF-16 |
| Short Code | ASCII 7-bit | UTF-16 |
| Alphanumeric | GSM 7-bit | UTF-16 |
If your message contains characters outside the default encoding’s character set, the fallback encoding is used automatically for the entire message.
MMS and RCS messages use UTF-8 encoding by default and are not affected by these limits.

GSM 7-bit character set

Telnyx uses a GSM 7-bit encoding optimized for maximum carrier compatibility. Only characters in this set will keep your message in the efficient GSM-7 encoding.
Letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Digits:
0 1 2 3 4 5 6 7 8 9
Symbols and punctuation:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @
Special characters:
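| Character | Description |
|---|---|
| (space) | Space |
| \n | Line feed |
| \r | Carriage return |
| _ | Underscore |
| £ | Pound sign |
| ¥ | Yen sign |
| è | e grave |
| é | e acute |
| ù | u grave |
| ì | i grave |
| ò | o grave |
| Ø | O with stroke |
| ø | o with stroke |
| Å | A with ring |
| å | a with ring |
| Æ | AE ligature |
| æ | ae ligature |
| ß | Sharp s |
| É | E acute |
| ¡ | Inverted exclamation mark |
| Ä | A umlaut |
| Ö | O umlaut |
| Ñ | N tilde |
| Ü | U umlaut |
| § | Section sign |
| ¿ | Inverted question mark |
| ä | a umlaut |
| ö | o umlaut |
| ñ | n tilde |
| ü | u umlaut |
| à | a grave |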
These characters require an escape sequence and count as 2 characters in segment calculations:
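| Character | Description | Character count |
|---|---|---|
| ~ | Tilde | 2 |
| ^ | Circumflex | 2 |
| \| | Pipe / vertical bar | 2 |
| \ | Backslash | 2 |
| { | Left curly bracket | 2 |
| } | Right curly bracket | 2 |
| [ | Left square bracket | 2 |
| ] | Right square bracket | 2 |
| € | Euro sign | 2 |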
Extended characters are easy to overlook when estimating segment counts. A message with 155 standard characters and 3 pipe characters (|) uses 155 + (3 × 2) = 161 character slots, requiring 2 segments instead of 1.

Detecting encoding in your application

Before sending, you can check whether a message will use GSM-7 or UTF-16 encoding to estimate costs. The following Python helper returns the encoding, the character-slot count, and the segment count:
# GSM-7 basic character set, matching the Telnyx character set listed above.
# The full GSM 03.38 alphabet also contains Ç, ¤ and Greek capitals (Δ, Φ, Γ, ...);
# they are omitted here because they are not in the documented set.
GSM7_BASIC = set(
    "@£$¥èéùìòØø\n\rÅå_ÆæßÉ"
    " !\"#%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "ÄÖÑܧ¿abcdefghijklmnopqrstuvwxyz"
    "äöñüà"
)
# Extended characters are sent as an escape sequence and cost 2 slots each
GSM7_EXTENDED = set("^{}\\[~]|€")


def calculate_segments(text: str) -> dict:
    """Calculate encoding and segment count for an SMS message."""
    is_gsm7 = all(c in GSM7_BASIC or c in GSM7_EXTENDED for c in text)

    if is_gsm7:
        # Count extended characters as 2 slots, all others as 1
        char_count = sum(2 if c in GSM7_EXTENDED else 1 for c in text)
        if char_count <= 160:
            segments = 1
        else:
            segments = -(-char_count // 153)  # ceiling division
        return {"encoding": "GSM-7", "char_count": char_count, "segments": segments}

    # UTF-16: characters above U+FFFF (most emojis) are encoded as
    # surrogate pairs and count as 2 code units
    char_count = sum(2 if ord(c) > 0xFFFF else 1 for c in text)
    if char_count <= 70:
        segments = 1
    else:
        segments = -(-char_count // 67)  # ceiling division
    return {"encoding": "UTF-16", "char_count": char_count, "segments": segments}

# Example usage
result = calculate_segments("Hello, world!")
print(f"Encoding: {result['encoding']}, Segments: {result['segments']}")
# Output: Encoding: GSM-7, Segments: 1

result = calculate_segments("Hello 😀")
print(f"Encoding: {result['encoding']}, Segments: {result['segments']}")
# Output: Encoding: UTF-16, Segments: 1

Common encoding issues

Symptom: Your message uses more segments than expected.
Cause: A non-GSM-7 character is present, forcing the entire message to UTF-16. Common culprits:
| Character | Source | GSM-7? |
|---|---|---|
| “ ” (curly quotes) | Word processors, mobile keyboards | ❌ |
| ‘ ’ (curly apostrophes) | Auto-correct, CMS platforms | ❌ |
| U+2014 (em dash) | Word processors | ❌ |
| … (ellipsis) | Mobile keyboards | ❌ |
| € (euro sign) | Manual entry | ✅ (extended, costs 2 chars) |
Fix:
  1. Enable smart encoding to auto-replace these characters
  2. Or manually replace them with GSM-7 equivalents before sending
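If you replace these characters yourself rather than relying on smart encoding, `str.translate` handles the substitution in one pass. This is a minimal sketch: `REPLACEMENTS` and `replace_common_unicode` are hypothetical names, and the table is an illustrative subset, not Telnyx's smart encoding mapping.

```python
# Hypothetical replacement table: common Unicode punctuation -> GSM-7 equivalents.
# Illustrative subset only; this is not Telnyx's smart encoding table.
REPLACEMENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2014": "-",    # em dash
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}
_TABLE = str.maketrans(REPLACEMENTS)


def replace_common_unicode(text: str) -> str:
    """Replace common word-processor punctuation with GSM-7 characters."""
    return text.translate(_TABLE)
```

Note that replacing … with three periods adds two characters, but the message stays in GSM-7, which is almost always the cheaper outcome.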
Symptom: Adding a single emoji doubles or triples the number of segments.
Cause: Emojis force UTF-16 encoding (70 chars/segment instead of 160). Additionally, most emojis are encoded as surrogate pairs and count as 2 UTF-16 characters.
Example:
"Thanks for your order!"        → GSM-7, 1 segment (22 chars)
"Thanks for your order! 🎉"     → UTF-16, 1 segment (25 chars)
"Thanks for your order! ... 🎉" → UTF-16, 2 segments (71+ chars)
Fix: If cost is a concern, avoid emojis in SMS. Use emojis freely in MMS/RCS where encoding isn’t a factor.
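The 22 and 25 counts above can be checked in Python, with one caveat: `len()` counts code points, not UTF-16 code units. Encoding to UTF-16 little-endian (which adds no byte-order mark) and halving the byte count gives the number that matters for segmenting. `utf16_units` is a hypothetical helper name.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units; characters above U+FFFF take a surrogate pair (2 units)."""
    return len(text.encode("utf-16-le")) // 2


print(utf16_units("Thanks for your order!"))      # 22
print(utf16_units("Thanks for your order! 🎉"))   # 25
print(len("Thanks for your order! 🎉"))           # 24 code points
```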
Symptom: A 155-character message that looks like it should fit in one segment actually requires two.
Cause: Characters like [, ], {, }, |, \, ^, ~, and € are in the GSM-7 extended set and count as 2 characters each.
Example:
"Price: $100 [USD]" → 17 visible chars but 19 GSM-7 chars ([ and ] each cost 2)
Fix: Account for extended characters when calculating message length. Use the segment calculator above or the Python helper in this guide.
Symptom: Text that looks like normal ASCII actually contains Unicode characters.
Cause: Word processors often replace straight quotes with curly quotes, hyphens with em dashes, and three periods with a single ellipsis character. These differences are hard to see, but they force UTF-16.
Fix:
  1. Enable smart encoding — this handles the most common substitutions automatically
  2. Sanitize text before sending by replacing known problem characters
  3. Use the encoding parameter set to gsm7 to get a 400 error if non-GSM-7 characters are present (fail-fast approach)
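Before relying on the fail-fast approach, it helps to know exactly which characters would trigger UTF-16 or the 400 error. The sketch below reports each offending character's position, code point, and Unicode name. The character set is transcribed from the GSM 7-bit character set section of this page, and `non_gsm7_characters` is a hypothetical name.

```python
import unicodedata

# GSM-7 characters as documented on this page (basic set plus extended set)
GSM7_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "!\"#$%&'()*+,-./:;<=>?@ \n\r_"
    "£¥èéùìòØøÅåÆæßÉ¡ÄÖÑÜ§¿äöñüà"
    "~^|\\{}[]€"
)


def non_gsm7_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for every character outside GSM-7."""
    return [
        (i, f"U+{ord(c):04X}", unicodedata.name(c, "UNKNOWN"))
        for i, c in enumerate(text)
        if c not in GSM7_CHARS
    ]


print(non_gsm7_characters("Don\u2019t miss it\u2026"))
# [(3, 'U+2019', 'RIGHT SINGLE QUOTATION MARK'), (13, 'U+2026', 'HORIZONTAL ELLIPSIS')]
```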
Symptom: The recipient sees a message split in unexpected places, or parts arrive out of order.
Cause: Multi-part messages are reassembled by the recipient’s device using the UDH (User Data Header). Some older devices or carriers may not support reassembly beyond a certain number of segments.
Fix:
  • Keep messages under 3-4 segments for maximum compatibility
  • Telnyx supports up to 10 segments, but recipient device support varies
  • Consider using MMS for longer content
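You do not normally split messages yourself, but a sketch can make segment boundaries concrete when you are debugging odd breaks. The function below is hypothetical and assumes the text is already GSM-7. Like many SMS encoders, it never places the two slots of an escape-coded extended character in different parts, so a part may hold fewer than 153 slots.

```python
GSM7_EXTENDED = set("^{}\\[~]|€")  # each costs 2 slots (escape + character)


def split_gsm7(text: str, single: int = 160, multi: int = 153) -> list[str]:
    """Split GSM-7 text into segment-sized parts without breaking an escape pair."""
    def cost(c: str) -> int:
        return 2 if c in GSM7_EXTENDED else 1

    if sum(cost(c) for c in text) <= single:
        return [text]  # fits in a single segment, no UDH needed

    parts, current, used = [], [], 0
    for c in text:
        if used + cost(c) > multi:  # this character would overflow the current part
            parts.append("".join(current))
            current, used = [], 0
        current.append(c)
        used += cost(c)
    parts.append("".join(current))
    return parts
```

For example, 152 letters followed by `|` and ten more letters needs 164 slots, so it is split; the `|` cannot fit in the 153-slot first part and moves entirely into the second.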
Symptom: Messages in non-Latin scripts use significantly more segments than English messages of similar visible length.
Cause: Non-Latin characters have no GSM-7 equivalents, so the entire message uses UTF-16 encoding (70 characters per segment). Smart encoding cannot help here.
Fix:
  • This is expected behavior — plan for higher segment counts when messaging in non-Latin scripts
  • Keep messages concise
  • Consider MMS for longer non-Latin content

Best practices

1. Enable smart encoding

Turn on smart encoding on your messaging profile to automatically handle Unicode-to-GSM-7 substitutions. This is the single biggest cost-saving measure.
2. Validate before sending

Use the encoding detection helper above to check segment counts before sending, and flag messages that will be unexpectedly expensive.
3. Sanitize input text

If you accept user-generated content, sanitize it before sending. Strip or replace invisible Unicode characters, curly quotes, and other common problem characters.
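For the invisible characters, one approach is to delete zero-width characters and the byte-order mark and to turn non-breaking spaces into regular spaces. The character list and the `strip_invisible` name are assumptions chosen for illustration; extend the table to match what your input sources actually produce.

```python
# Hypothetical cleanup table (an assumption, not a Telnyx API):
# None deletes the character; a string replaces it.
INVISIBLE = {
    "\u200b": None,  # zero-width space
    "\u200c": None,  # zero-width non-joiner
    "\u200d": None,  # zero-width joiner
    "\u2060": None,  # word joiner
    "\ufeff": None,  # byte-order mark / zero-width no-break space
    "\u00a0": " ",   # non-breaking space -> regular space
}
_TABLE = str.maketrans(INVISIBLE)


def strip_invisible(text: str) -> str:
    """Remove zero-width characters and normalize non-breaking spaces."""
    return text.translate(_TABLE)
```

Removing U+200D also breaks apart joined emoji sequences. That is usually acceptable when the goal is to keep the message in GSM-7.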
4. Keep messages concise

Stay within 160 characters (GSM-7) or 70 characters (UTF-16) to avoid multi-part messages. Once a message is split, every segment carries the 6-byte UDH, which costs 7 GSM-7 characters (160 → 153) or 3 UTF-16 characters (70 → 67) per segment.
5. Use the right channel

For messages that need emojis, rich formatting, or non-Latin scripts, consider MMS or RCS instead of SMS.