| Encoding | Bits per char | Single segment | Multi-part segment |
|---|---|---|---|
| GSM 7-bit | 7 | 160 chars | 153 chars |
| ASCII 7-bit | 7 | 160 chars | 153 chars |
| ASCII 8-bit | 8 | 140 chars | 134 chars |
| UTF-16 | 16 | 70 chars | 67 chars |
Segment calculator
Use the interactive segment calculator to check how your message will be encoded and segmented.
How segments work
Each SMS segment carries at most 140 bytes of user data. When a message needs more than one segment, a 6-byte User Data Header (UDH) is added to every segment so the recipient's device can reassemble the parts. This reduces the space available for text.
Segment calculation formula
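The per-segment limits in the encoding table follow from the 140-byte payload and the 6-byte header. A 7-bit encoding uses 7 bits per character, and UTF-16 uses 2 bytes per code unit:

$$
\begin{aligned}
\text{GSM-7, single segment:}\quad & \lfloor 140 \times 8 / 7 \rfloor = 160 \\
\text{GSM-7, multi-part segment:}\quad & \lfloor (140 - 6) \times 8 / 7 \rfloor = \lfloor 153.14 \rfloor = 153 \\
\text{8-bit, single / multi-part:}\quad & 140, \quad 140 - 6 = 134 \\
\text{UTF-16, single segment:}\quad & 140 / 2 = 70 \\
\text{UTF-16, multi-part segment:}\quad & (140 - 6) / 2 = 67
\end{aligned}
$$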
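To calculate the number of segments for a message:
- GSM-7: up to 160 characters fit in 1 segment, with each extended character counted as 2. A longer message needs its character count divided by 153, rounded up.
- UTF-16: up to 70 characters fit in 1 segment, with each surrogate pair (most emojis) counted as 2. A longer message needs its character count divided by 67, rounded up.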
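The rule above can be written as a small Python function. This is a sketch: `segment_count` is a hypothetical helper name. It assumes the caller has already measured the length in the right units: GSM-7 characters with extended characters counted twice, or UTF-16 code units.

```python
import math

# Per-segment capacities from the encoding table: (single segment, multi-part segment).
CAPACITY = {
    "GSM-7": (160, 153),
    "UTF-16": (70, 67),
}


def segment_count(length: int, encoding: str) -> int:
    """Return the number of SMS segments for a message of `length` units.

    `length` is measured in GSM-7 characters (extended characters count as 2)
    or in UTF-16 code units (surrogate pairs count as 2), matching `encoding`.
    """
    single, multi = CAPACITY[encoding]
    if length <= single:
        return 1
    # Once the message is split, every segment loses room to the UDH,
    # so divide by the smaller multi-part capacity and round up.
    return math.ceil(length / multi)


print(segment_count(200, "GSM-7"))   # 2
print(segment_count(201, "UTF-16"))  # 3
```

The two calls correspond to a 200-character all-GSM-7 message and a 200-character message containing one emoji (201 UTF-16 code units).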
Cost impact example
Consider a 200-character message:
| Scenario | Encoding | Segments | Relative cost |
|---|---|---|---|
| All GSM-7 characters | GSM-7 | 2 | 2× |
| Contains one emoji 😀 | UTF-16 | 3 | 3× |
| Contains one curly quote “ | UTF-16 | 3 | 3× |
| With smart encoding enabled | GSM-7 | 2 | 2× |
Enable smart encoding to automatically replace common Unicode characters (like curly quotes and em dashes) with GSM-7 equivalents, reducing segment counts.
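The Python sketch below illustrates this kind of substitution by replacing a few common Unicode punctuation marks locally. The substitution table and the `smart_replace` name are assumptions for illustration. They are not Telnyx's smart-encoding mapping, which may cover more characters or choose different replacements.

```python
# Hypothetical substitution table approximating smart encoding.
# Telnyx's real mapping may differ; extend this to suit your content.
SUBSTITUTIONS = str.maketrans({
    "\u201c": '"',    # left double quotation mark (U+201C)
    "\u201d": '"',    # right double quotation mark (U+201D)
    "\u2018": "'",    # left single quotation mark (U+2018)
    "\u2019": "'",    # right single quotation mark / apostrophe (U+2019)
    "\u2013": "-",    # en dash (U+2013)
    "\u2014": "-",    # em dash (U+2014)
    "\u2026": "...",  # horizontal ellipsis (U+2026)
})


def smart_replace(text: str) -> str:
    """Replace common Unicode punctuation with GSM-7 equivalents."""
    return text.translate(SUBSTITUTIONS)


print(smart_replace("\u201cHi\u201d \u2014 it\u2019s here\u2026"))
# prints: "Hi" - it's here...
```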
Encoding by sender type
| Sender type | Default encoding | Fallback |
|---|---|---|
| Long Code | GSM 7-bit | UTF-16 |
| Toll-Free | GSM 7-bit | UTF-16 |
| Short Code | ASCII 7-bit | UTF-16 |
| Alphanumeric | GSM 7-bit | UTF-16 |
MMS and RCS messages use UTF-8 encoding by default and are not affected by these limits.
GSM 7-bit character set
Telnyx uses a GSM 7-bit encoding optimized for maximum carrier compatibility. Your message stays in the efficient GSM-7 encoding only if every character belongs to this set.
Standard characters (1 character each)
Letters: A-Z, a-z
Digits: 0-9
Symbols and punctuation:
Special characters:
| Character | Description |
|---|---|
| space | Space |
| \n | Line feed |
| \r | Carriage return |
| _ | Underscore |
| £ | Pound sign |
| ¥ | Yen sign |
| è | e grave |
| é | e acute |
| ù | u grave |
| ì | i grave |
| ò | o grave |
| Ø | O with stroke |
| ø | o with stroke |
| Å | A with ring |
| å | a with ring |
| Æ | AE ligature |
| æ | ae ligature |
| ß | Sharp s |
| É | E acute |
| ¡ | Inverted exclamation |
| Ä | A umlaut |
| Ö | O umlaut |
| Ñ | N tilde |
| Ü | U umlaut |
| § | Section sign |
| ¿ | Inverted question |
| ä | a umlaut |
| ö | o umlaut |
| ñ | n tilde |
| ü | u umlaut |
| à | a grave |
Extended characters (2 characters each)
These characters require an escape sequence and count as 2 characters in segment calculations:
| Character | Description | Character count |
|---|---|---|
| ~ | Tilde | 2 |
| ^ | Circumflex | 2 |
| \| | Pipe / vertical bar | 2 |
| \ | Backslash | 2 |
| { | Left curly bracket | 2 |
| } | Right curly bracket | 2 |
| [ | Left square bracket | 2 |
| ] | Right square bracket | 2 |
| € | Euro sign | 2 |
Detecting encoding in your application
Before sending, you can check whether a message will use GSM-7 or UTF-16 encoding to estimate its cost.
Common encoding issues
Message unexpectedly uses UTF-16 (too many segments)
Symptom: Your message uses more segments than expected.
Cause: A non-GSM-7 character is present, forcing the entire message into UTF-16. Common culprits:
| Character | Source | GSM-7? |
|---|---|---|
| “ ” (curly quotes) | Word processors, mobile keyboards | ❌ |
| ‘ ’ (curly apostrophes) | Auto-correct, CMS platforms | ❌ |
| — (em dash) | Word processors | ❌ |
| … (ellipsis) | Mobile keyboards | ❌ |
| € (euro sign) | Manual entry | ✅ (extended, counts as 2 characters) |
Fix:
- Enable smart encoding to replace these characters automatically
- Or replace them manually with GSM-7 equivalents before sending
Emojis dramatically increase segment count
Symptom: Adding a single emoji doubles or triples the number of segments.
Cause: Emojis force UTF-16 encoding (70 characters per segment instead of 160). In addition, most emojis are encoded as surrogate pairs and count as 2 UTF-16 characters.
Fix: If cost is a concern, avoid emojis in SMS. Use emojis freely in MMS and RCS, where these encoding limits don't apply.
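Counting UTF-16 code units instead of characters shows the surrogate-pair effect. This minimal Python sketch uses a hypothetical `utf16_units` helper. Encoding to UTF-16-LE yields 2 bytes per code unit and adds no byte-order mark.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units, the unit used for UTF-16 segment limits."""
    # UTF-16-LE adds no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


print(len("\U0001F600"))          # 1 (one Python code point)
print(utf16_units("\U0001F600"))  # 2 (a surrogate pair)

# 69 letters plus one emoji: 70 visible characters, but 71 code units.
message = "a" * 69 + "\U0001F600"
print(utf16_units(message))       # 71
```

At 71 units, the message exceeds the 70-unit single-segment limit and needs two segments, even though it shows only 70 characters.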
Extended GSM-7 characters cause unexpected segment splits
Symptom: A 155-character message that looks like it should fit in one segment actually requires two.
Cause: Characters like [, ], {, }, |, \, ^, ~, and € are in the GSM-7 extended set and count as 2 characters each, so a handful of them can push a message past the 160-character limit.
Fix: Account for extended characters when calculating message length. Use the segment calculator above, or count each extended character as 2 when you compute the length yourself.
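The following minimal Python sketch shows how a 155-character message can spill into a second segment. `gsm7_length` is a hypothetical helper that assumes the text has already been checked for GSM-7 compatibility. The extended set matches the table in this guide.

```python
# Extended GSM-7 characters listed in this guide; each is sent as an
# escape character plus the character itself, so it costs 2.
GSM7_EXTENDED = frozenset("[]{}|\\^~\u20ac")


def gsm7_length(text: str) -> int:
    """Length in GSM-7 characters, counting extended characters as 2.

    Assumes `text` is already known to contain only GSM-7 characters.
    """
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


# 149 plain characters plus six extended ones: 155 visible characters.
message = "x" * 149 + "[]{}~^"
print(len(message))          # 155
print(gsm7_length(message))  # 161, over the 160 limit, so 2 segments
```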
Copy-pasted text from Word/Google Docs causes issues
Symptom: Text that looks like plain ASCII actually contains Unicode characters.
Cause: Word processors often replace straight quotes with curly quotes, hyphens with em dashes, and three periods with a single ellipsis character. These substitutions are hard to spot, but any one of them forces UTF-16.
Fix:
- Enable smart encoding; it handles the most common substitutions automatically
- Sanitize text before sending by replacing known problem characters
- Set the `encoding` parameter to `gsm7` to get a `400` error if non-GSM-7 characters are present (fail-fast approach)
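You can find hidden characters in pasted text before sending by listing every character outside the GSM-7 set along with its Unicode name. This Python sketch makes two assumptions. `find_non_gsm7` is a hypothetical helper. The ASCII punctuation in `GSM7_CHARS` comes from the standard GSM 03.38 basic table and may not match Telnyx's published symbol list exactly.

```python
import string
import unicodedata

# Assumed character set: ASCII letters and digits, the special and extended
# characters listed in this guide, and the ASCII punctuation of the GSM 03.38
# basic table. Telnyx's "Symbols and punctuation" list may differ.
GSM7_CHARS = frozenset(
    string.ascii_letters
    + string.digits
    + " \n\r"
    + "@$!\"#%&'()*+,-./:;<=>?"
    + "_£¥èéùìòØøÅåÆæßÉ¡ÄÖÑÜ§¿äöñüà"
    + "[]{}|\\^~€"
)


def find_non_gsm7(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-GSM-7 character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch not in GSM7_CHARS
    ]


pasted = "Meeting at 3pm \u2014 don\u2019t be late"
for index, char, name in find_non_gsm7(pasted):
    print(index, repr(char), name)
```

For the sample string, this reports the em dash at index 15 and the curly apostrophe at index 20.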
Messages truncated or split incorrectly on recipient's phone
Symptom: The recipient sees a message split in unexpected places, or parts arrive out of order.
Cause: Multi-part messages are reassembled by the recipient’s device using the UDH (User Data Header). Some older devices or carriers may not support reassembly beyond a certain number of segments.
Fix:
- Keep messages under 3-4 segments for maximum compatibility
- Telnyx supports up to 10 segments, but recipient device support varies
- Consider using MMS for longer content
Non-Latin scripts (Chinese, Arabic, Cyrillic) use too many segments
Symptom: Messages in non-Latin scripts use significantly more segments than English messages of similar visible length.
Cause: Non-Latin characters have no GSM-7 equivalents, so the entire message uses UTF-16 encoding (70 characters per segment). Smart encoding cannot help here.
Fix:
- This is expected behavior — plan for higher segment counts when messaging in non-Latin scripts
- Keep messages concise
- Consider MMS for longer non-Latin content
Best practices
Enable smart encoding
Turn on smart encoding on your messaging profile to automatically handle Unicode-to-GSM-7 substitutions. This is the single biggest cost-saving measure.
Validate before sending
Check each message's encoding and segment count before sending, and alert your application when a message will be unexpectedly expensive.
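A minimal Python sketch of such a pre-send check follows. It assumes the character tables in this guide plus the ASCII punctuation of the standard GSM 03.38 basic table (Telnyx's own symbol list may differ). It also ignores any substitutions smart encoding would make. `estimate_sms` and `MAX_EXPECTED_SEGMENTS` are hypothetical names.

```python
import math
import string

# Assumed GSM-7 repertoire; the ASCII punctuation comes from the GSM 03.38
# basic table and may not match Telnyx's carrier-optimized list exactly.
GSM7_BASIC = frozenset(
    string.ascii_letters
    + string.digits
    + " \n\r@$!\"#%&'()*+,-./:;<=>?"
    + "_£¥èéùìòØøÅåÆæßÉ¡ÄÖÑÜ§¿äöñüà"
)
GSM7_EXTENDED = frozenset("[]{}|\\^~€")  # each counts as 2 characters


def estimate_sms(text: str) -> tuple[str, int, int]:
    """Return (encoding, length in encoding units, segment count)."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        encoding, single, multi = "GSM-7", 160, 153
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
    else:
        encoding, single, multi = "UTF-16", 70, 67
        units = len(text.encode("utf-16-le")) // 2  # surrogate pairs count as 2
    segments = 1 if units <= single else math.ceil(units / multi)
    return encoding, units, segments


MAX_EXPECTED_SEGMENTS = 2  # hypothetical budget; tune for your application
encoding, units, segments = estimate_sms("Your code is 123456 \U0001F600")
if encoding == "UTF-16" or segments > MAX_EXPECTED_SEGMENTS:
    print(f"warning: {encoding}, {units} units, {segments} segment(s)")
```

For the 200-character scenarios in the cost example, this sketch returns 2 segments for plain GSM-7 text and 3 segments when one emoji or curly quote is present.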
Sanitize input text
If you accept user-generated content, sanitize it before sending. Strip or replace invisible Unicode characters, curly quotes, and other common problem characters.
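One way to strip invisible characters is to drop everything in Unicode's "format" category (Cf) with Python's standard `unicodedata` module. This is a minimal sketch; `strip_invisible` is a hypothetical helper name.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Remove invisible formatting characters and normalize no-break spaces.

    Drops every character in Unicode category Cf ("format"), which includes
    zero-width spaces, zero-width joiners, byte-order marks, and soft hyphens.
    """
    text = text.replace("\u00a0", " ")  # no-break space -> ordinary space
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


print(strip_invisible("Hi\u200b there\ufeff"))  # Hi there
```

Removing zero-width joiners also splits multi-part emoji sequences into their component emojis. This only matters if you intend to keep emojis.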
Keep messages concise
Stay under 160 characters (GSM-7) or 70 characters (UTF-16) to avoid multi-part message overhead. Once a message is split, the UDH takes up 7 GSM-7 characters (or 3 UTF-16 characters) in every segment.
Related resources
Smart Encoding
Automatically replace Unicode characters with GSM-7 equivalents to reduce costs.
Send Your First Message
Get started with the Telnyx Messaging API.
Messages API Reference
API reference for sending messages with encoding options.
Messaging Profiles
Configure smart encoding and other profile settings.