Smart encoding automatically replaces Unicode characters with visually similar GSM-7 characters. This keeps your messages in the more efficient GSM-7 encoding, reducing segment counts and costs.
Why use smart encoding
SMS messages using GSM-7 encoding fit 160 characters per segment. When a message contains even one Unicode character outside GSM-7, the entire message switches to UTF-16 encoding, which only fits 70 characters per segment.
A single smart quote (”) or em dash (—) can double or even triple your messaging costs.
Example:
| Message | Encoding | Segments | Cost impact |
|---|---|---|---|
| Hello, how are you? (150 chars) | GSM-7 | 1 | Base cost |
| Hello, how are you? (150 chars with smart quotes) | UTF-16 | 3 | 3x cost |
| Hello, how are you? (150 chars, smart encoding ON) | GSM-7 | 1 | Base cost |
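The segment counts in the table can be reproduced with a short calculation. This is a minimal sketch, not the platform's billing logic: the function name `segments` is hypothetical, and the per-part limits of 153 and 67 characters assume the usual concatenation header that multi-part SMS messages carry.

```python
import math

# (single-segment limit, per-part limit once the message is concatenated).
# 153 and 67 assume the standard concatenation header; this is an assumption,
# not something stated in the table above.
LIMITS = {"gsm7": (160, 153), "ucs2": (70, 67)}


def segments(length: int, encoding: str) -> int:
    """Estimate how many SMS segments a message of `length` characters needs."""
    single, multi = LIMITS[encoding]
    if length <= single:
        return 1
    return math.ceil(length / multi)


print(segments(150, "gsm7"))  # 1
print(segments(150, "ucs2"))  # 3
```

The same 150-character message costs one segment in GSM-7 and three in UTF-16, which matches the first two rows of the table.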
How it works
When smart encoding is enabled:
- Your message text is scanned for Unicode characters.
- Characters with GSM-7 equivalents are automatically replaced.
- The final encoding type (GSM-7 or UTF-16) is determined after transformation.
- Segment count is recalculated based on the transformed message.
- The API response includes smart encoding metadata.
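The steps above can be sketched client-side as follows. This is an illustration only: `SUBSTITUTIONS` is a tiny hypothetical subset of the substitution tables, `smart_encode` and its return keys are invented for this example, and the 153/67 per-part limits are assumed rather than taken from this page.

```python
import math

# GSM-7 basic alphabet and the extension characters (which cost two units each).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = set("^{}\\[~]|€")

# Hypothetical subset of the substitution tables, for illustration only.
SUBSTITUTIONS = {
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u2014": "-", "\u2013": "-",  # em dash, en dash
    "\u2026": "...",               # horizontal ellipsis
    "\u200b": "",                  # zero width space is removed
}


def smart_encode(text: str) -> dict:
    # Steps 1-2: scan the text and replace characters that have an equivalent.
    transformed = "".join(SUBSTITUTIONS.get(ch, ch) for ch in text)
    # Step 3: decide the encoding only after the transformation.
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in transformed):
        encoding, single, multi = "gsm7", 160, 153
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in transformed)
    else:
        encoding, single, multi = "ucs2", 70, 67
        units = len(transformed.encode("utf-16-be")) // 2  # UTF-16 code units
    # Step 4: recalculate the segment count on the transformed text.
    count = 1 if units <= single else math.ceil(units / multi)
    return {
        "text": transformed,
        "encoding": encoding,
        "segments": count,
        "length_change": len(transformed) - len(text),
    }


result = smart_encode("\u201cHi\u201d \u2014 ok\u2026")
print(result["text"])  # "Hi" - ok...
```

A message that still contains an emoji after substitution falls through to the `ucs2` branch, even though its quotes were replaced.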
The response reports the transformation in a smart_encoding object:
```json
{
  "data": {
    "id": "...",
    "encoding": "GSM-7",
    "parts": 1,
    "smart_encoding": {
      "smart_encoding_applied": true,
      "final_encoding": "gsm7",
      "segment_count": 1,
      "character_count": 155,
      "replaced_character_count": 3,
      "length_change": 2
    }
  }
}
```
| Field | Description |
|---|---|
| smart_encoding_applied | Whether any characters were replaced. |
| final_encoding | The encoding used after transformation (gsm7 or ucs2). |
| segment_count | Number of segments after smart encoding. |
| character_count | Message length after transformation. |
| replaced_character_count | Number of unique characters that were substituted. |
| length_change | Difference in length (positive if message grew, e.g., … → ...). |
The parts field in the API response reflects the segment count after smart encoding is applied, so you see the actual billing impact.
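As a rough illustration of how a client might read this metadata, the sketch below parses a response shaped like the sample above and builds a one-line summary. The `summarize` helper is hypothetical, and the code assumes the response body has already been received as JSON text.

```python
import json

# Response body shaped like the sample on this page.
SAMPLE = """
{
  "data": {
    "id": "...",
    "encoding": "GSM-7",
    "parts": 1,
    "smart_encoding": {
      "smart_encoding_applied": true,
      "final_encoding": "gsm7",
      "segment_count": 1,
      "character_count": 155,
      "replaced_character_count": 3,
      "length_change": 2
    }
  }
}
"""


def summarize(response: dict) -> str:
    """Describe what smart encoding did to a sent message."""
    data = response["data"]
    meta = data.get("smart_encoding") or {}
    if not meta.get("smart_encoding_applied"):
        return f"no substitutions; {data['parts']} segment(s)"
    return (
        f"{meta['replaced_character_count']} replaced, "
        f"{meta['length_change']:+d} chars, "
        f"{meta['segment_count']} segment(s) as {meta['final_encoding']}"
    )


print(summarize(json.loads(SAMPLE)))  # 3 replaced, +2 chars, 1 segment(s) as gsm7
```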
Enable smart encoding
Enable smart encoding on your messaging profile via the API or portal.
```bash
curl -X PATCH https://api.telnyx.com/v2/messaging_profiles/{id} \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "smart_encoding": true
  }'
```
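The same request can be built from Python with only the standard library. This is a sketch under the assumption that the endpoint, headers, and body match the curl command above; `build_enable_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.telnyx.com/v2"


def build_enable_request(profile_id: str, api_key: str) -> urllib.request.Request:
    """Build the PATCH request that turns smart encoding on for a profile."""
    body = json.dumps({"smart_encoding": True}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/messaging_profiles/{profile_id}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_enable_request("YOUR_PROFILE_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```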
To enable it in the portal:
1. Navigate to Messaging > Messaging Profiles.
2. Select your messaging profile.
3. Enable Smart Encoding.
4. Click Save.
Character substitutions
Smart encoding replaces 200+ Unicode characters with GSM-7 equivalents. The tables below show all supported substitutions grouped by category.
Quotation marks
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00AB | « | Left-pointing double angle quotation mark | " |
| U+00BB | » | Right-pointing double angle quotation mark | " |
| U+201C | “ | Left double quotation mark | " |
| U+201D | ” | Right double quotation mark | " |
| U+02BA | ʺ | Modifier letter double prime | " |
| U+02EE | ˮ | Modifier letter double apostrophe | " |
| U+201F | ‟ | Double high-reversed-9 quotation mark | " |
| U+275D | ❝ | Heavy double turned comma quotation mark ornament | " |
| U+275E | ❞ | Heavy double comma quotation mark ornament | " |
| U+301D | 〝 | Reversed double prime quotation mark | " |
| U+301E | 〞 | Double prime quotation mark | " |
| U+FF02 | ＂ | Fullwidth quotation mark | " |
| U+201E | „ | Double low quotation mark | " |
Apostrophes and single quotes
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2018 | ‘ | Left single quotation mark | ' |
| U+2019 | ’ | Right single quotation mark | ' |
| U+02BB | ʻ | Modifier letter turned comma | ' |
| U+02C8 | ˈ | Modifier letter vertical line | ' |
| U+02BC | ʼ | Modifier letter apostrophe | ' |
| U+02BD | ʽ | Modifier letter reversed comma | ' |
| U+02B9 | ʹ | Modifier letter prime | ' |
| U+201B | ‛ | Single high-reversed-9 quotation mark | ' |
| U+FF07 | ＇ | Fullwidth apostrophe | ' |
| U+00B4 | ´ | Acute accent | ' |
| U+02CA | ˊ | Modifier letter acute accent | ' |
| U+0060 | ` | Grave accent | ' |
| U+02CB | ˋ | Modifier letter grave accent | ' |
| U+275B | ❛ | Heavy single turned comma quotation mark ornament | ' |
| U+275C | ❜ | Heavy single comma quotation mark ornament | ' |
| U+0313 | ̓ | Combining comma above | ' |
| U+0314 | ̔ | Combining reversed comma above | ' |
| U+FE10 | ︐ | Presentation form for vertical comma | ' |
| U+FE11 | ︑ | Presentation form for vertical ideographic comma | ' |
Dashes and hyphens
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2014 | — | Em dash | - |
| U+2013 | – | En dash | - |
| U+23BC | ⎼ | Horizontal scan line-7 | - |
| U+23BD | ⎽ | Horizontal scan line-9 | - |
| U+2015 | ― | Horizontal bar | - |
| U+FE63 | ﹣ | Small hyphen-minus | - |
| U+FF0D | － | Fullwidth hyphen-minus | - |
| U+2010 | ‐ | Hyphen | - |
| U+2022 | • | Bullet | - |
| U+2043 | ⁃ | Hyphen bullet | - |
Slashes and division
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00F7 | ÷ | Division sign | / |
| U+00BC | ¼ | Vulgar fraction one quarter | 1/4 |
| U+00BD | ½ | Vulgar fraction one half | 1/2 |
| U+00BE | ¾ | Vulgar fraction three quarters | 3/4 |
| U+29F8 | ⧸ | Big solidus | / |
| U+0337 | ̷ | Combining short solidus overlay | / |
| U+0338 | ̸ | Combining long solidus overlay | / |
| U+2044 | ⁄ | Fraction slash | / |
| U+2215 | ∕ | Division slash | / |
| U+FF0F | ／ | Fullwidth solidus | / |
Backslashes
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+29F9 | ⧹ | Big reverse solidus | \ |
| U+29F5 | ⧵ | Reverse solidus operator | \ |
| U+20E5 | ⃥ | Combining reverse solidus overlay | \ |
| U+FE68 | ﹨ | Small reverse solidus | \ |
| U+FF3C | ＼ | Fullwidth reverse solidus | \ |
Underscores
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+0332 | ̲ | Combining low line | _ |
| U+FF3F | ＿ | Fullwidth low line | _ |
| U+2017 | ‗ | Double low line | _ |
Vertical lines
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+20D2 | ⃒ | Combining long vertical line overlay | \| |
| U+20D3 | ⃓ | Combining short vertical line overlay | \| |
| U+2223 | ∣ | Divides | \| |
| U+FF5C | ｜ | Fullwidth vertical line | \| |
| U+23B8 | ⎸ | Left vertical box line | \| |
| U+23B9 | ⎹ | Right vertical box line | \| |
| U+23D0 | ⏐ | Vertical line extension | \| |
| U+239C | ⎜ | Left parenthesis extension | \| |
| U+239F | ⎟ | Right parenthesis extension | \| |
Symbols and punctuation
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+FE6B | ﹫ | Small commercial at sign | @ |
| U+FF20 | ＠ | Fullwidth commercial at sign | @ |
| U+FE69 | ﹩ | Small dollar sign | $ |
| U+FF04 | ＄ | Fullwidth dollar sign | $ |
| U+01C3 | ǃ | Latin letter retroflex click | ! |
| U+FE15 | ︕ | Presentation form for vertical exclamation mark | ! |
| U+FE57 | ﹗ | Small exclamation mark | ! |
| U+FF01 | ！ | Fullwidth exclamation mark | ! |
| U+203C | ‼ | Double exclamation mark | !! |
| U+FE5F | ﹟ | Small number sign | # |
| U+FF03 | ＃ | Fullwidth number sign | # |
| U+FE6A | ﹪ | Small percent sign | % |
| U+FF05 | ％ | Fullwidth percent sign | % |
| U+FE60 | ﹠ | Small ampersand | & |
| U+FF06 | ＆ | Fullwidth ampersand | & |
| U+2026 | … | Horizontal ellipsis | ... |
Commas
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+201A | ‚ | Single low-9 quotation mark | , |
| U+0326 | ̦ | Combining comma below | , |
| U+FE50 | ﹐ | Small comma | , |
| U+3001 | 、 | Ideographic comma | , |
| U+FE51 | ﹑ | Small ideographic comma | , |
| U+FF0C | ， | Fullwidth comma | , |
| U+FF64 | ､ | Halfwidth ideographic comma | , |
Parentheses
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2768 | ❨ | Medium left parenthesis ornament | ( |
| U+276A | ❪ | Medium flattened left parenthesis ornament | ( |
| U+FE59 | ﹙ | Small left parenthesis | ( |
| U+FF08 | （ | Fullwidth left parenthesis | ( |
| U+27EE | ⟮ | Mathematical left flattened parenthesis | ( |
| U+2985 | ⦅ | Left white parenthesis | ( |
| U+2769 | ❩ | Medium right parenthesis ornament | ) |
| U+276B | ❫ | Medium flattened right parenthesis ornament | ) |
| U+FE5A | ﹚ | Small right parenthesis | ) |
| U+FF09 | ） | Fullwidth right parenthesis | ) |
| U+27EF | ⟯ | Mathematical right flattened parenthesis | ) |
| U+2986 | ⦆ | Right white parenthesis | ) |
Brackets
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2774 | ❴ | Medium left curly bracket ornament | { |
| U+FE5B | ﹛ | Small left curly bracket | { |
| U+FF5B | ｛ | Fullwidth left curly bracket | { |
| U+2775 | ❵ | Medium right curly bracket ornament | } |
| U+FE5C | ﹜ | Small right curly bracket | } |
| U+FF5D | ｝ | Fullwidth right curly bracket | } |
| U+FF3B | ［ | Fullwidth left square bracket | [ |
| U+FF3D | ］ | Fullwidth right square bracket | ] |
Asterisks
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+204E | ⁎ | Low asterisk | * |
| U+2217 | ∗ | Asterisk operator | * |
| U+229B | ⊛ | Circled asterisk operator | * |
| U+2722 | ✢ | Four teardrop-spoked asterisk | * |
| U+2723 | ✣ | Four balloon-spoked asterisk | * |
| U+2724 | ✤ | Heavy four balloon-spoked asterisk | * |
| U+2725 | ✥ | Four club-spoked asterisk | * |
| U+2731 | ✱ | Heavy asterisk | * |
| U+2732 | ✲ | Open center asterisk | * |
| U+2733 | ✳ | Eight spoked asterisk | * |
| U+273A | ✺ | Sixteen pointed asterisk | * |
| U+273B | ✻ | Teardrop-spoked asterisk | * |
| U+273C | ✼ | Open center teardrop-spoked asterisk | * |
| U+273D | ✽ | Heavy teardrop-spoked asterisk | * |
| U+2743 | ❃ | Heavy teardrop-spoked pinwheel asterisk | * |
| U+2749 | ❉ | Balloon-spoked asterisk | * |
| U+274A | ❊ | Eight teardrop-spoked propeller asterisk | * |
| U+274B | ❋ | Heavy eight teardrop-spoked propeller asterisk | * |
| U+29C6 | ⧆ | Squared asterisk | * |
| U+FE61 | ﹡ | Small asterisk | * |
| U+FF0A | ＊ | Fullwidth asterisk | * |
Math and comparison
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+02D6 | ˖ | Modifier letter plus sign | + |
| U+FE62 | ﹢ | Small plus sign | + |
| U+FF0B | ＋ | Fullwidth plus sign | + |
| U+FE64 | ﹤ | Small less-than sign | < |
| U+FF1C | ＜ | Fullwidth less-than sign | < |
| U+0347 | ͇ | Combining equals sign below | = |
| U+A78A | ꞊ | Modifier letter short equals sign | = |
| U+FE66 | ﹦ | Small equals sign | = |
| U+FF1D | ＝ | Fullwidth equals sign | = |
| U+FE65 | ﹥ | Small greater-than sign | > |
| U+FF1E | ＞ | Fullwidth greater-than sign | > |
| U+2039 | ‹ | Single left-pointing angle quotation mark | > |
| U+203A | › | Single right-pointing angle quotation mark | < |
Periods and colons
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+3002 | 。 | Ideographic full stop | . |
| U+FE52 | ﹒ | Small full stop | . |
| U+FF0E | ． | Fullwidth full stop | . |
| U+FF61 | ｡ | Halfwidth ideographic full stop | . |
| U+02D0 | ː | Modifier letter triangular colon | : |
| U+02F8 | ˸ | Modifier letter raised colon | : |
| U+2982 | ⦂ | Z notation type colon | : |
| U+A789 | ꞉ | Modifier letter colon | : |
| U+FE13 | ︓ | Presentation form for vertical colon | : |
| U+FF1A | ： | Fullwidth colon | : |
| U+204F | ⁏ | Reversed semicolon | ; |
| U+FE14 | ︔ | Presentation form for vertical semicolon | ; |
| U+FE54 | ﹔ | Small semicolon | ; |
| U+FF1B | ； | Fullwidth semicolon | ; |
| U+FE16 | ︖ | Presentation form for vertical question mark | ? |
| U+FE56 | ﹖ | Small question mark | ? |
| U+FF1F | ？ | Fullwidth question mark | ? |
Fullwidth digits
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+FF10 | ０ | Fullwidth digit zero | 0 |
| U+FF11 | １ | Fullwidth digit one | 1 |
| U+FF12 | ２ | Fullwidth digit two | 2 |
| U+FF13 | ３ | Fullwidth digit three | 3 |
| U+FF14 | ４ | Fullwidth digit four | 4 |
| U+FF15 | ５ | Fullwidth digit five | 5 |
| U+FF16 | ６ | Fullwidth digit six | 6 |
| U+FF17 | ７ | Fullwidth digit seven | 7 |
| U+FF18 | ８ | Fullwidth digit eight | 8 |
| U+FF19 | ９ | Fullwidth digit nine | 9 |
Fullwidth and small capital letters
Fullwidth uppercase (U+FF21–U+FF3A):
| Unicode | Glyph | Replacement |
|---|---|---|
| U+FF21–U+FF3A | Ａ–Ｚ | A–Z |
Fullwidth lowercase (U+FF41–U+FF5A):
| Unicode | Glyph | Replacement |
|---|---|---|
| U+FF41–U+FF5A | ａ–ｚ | a–z |
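The fullwidth forms sit at a fixed distance from their ASCII counterparts in Unicode: U+FF01 through U+FF5E correspond to U+0021 through U+007E, offset by 0xFEE0. The sketch below uses that offset to fold them in one pass. It is an illustration of the relationship, not a description of how the service implements its tables, and `fold_fullwidth` is a hypothetical name. The range also covers a few fullwidth characters that the tables on this page do not list.

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01 minus U+0021


def fold_fullwidth(text: str) -> str:
    """Map fullwidth ASCII variants (U+FF01..U+FF5E) to plain ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # leave everything else untouched
    return "".join(out)


print(fold_fullwidth("\uff21\uff22\uff23\uff11\uff12\uff13\uff01"))  # ABC123!
```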
Small capital letters:
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+1D00 | ᴀ | Latin letter small capital A | A |
| U+0299 | ʙ | Latin letter small capital B | B |
| U+1D04 | ᴄ | Latin letter small capital C | C |
| U+1D05 | ᴅ | Latin letter small capital D | D |
| U+1D07 | ᴇ | Latin letter small capital E | E |
| U+A730 | ꜰ | Latin letter small capital F | F |
| U+0262 | ɢ | Latin letter small capital G | G |
| U+029C | ʜ | Latin letter small capital H | H |
| U+026A | ɪ | Latin letter small capital I | I |
| U+1D0A | ᴊ | Latin letter small capital J | J |
| U+1D0B | ᴋ | Latin letter small capital K | K |
| U+029F | ʟ | Latin letter small capital L | L |
| U+1D0D | ᴍ | Latin letter small capital M | M |
| U+0274 | ɴ | Latin letter small capital N | N |
| U+1D0F | ᴏ | Latin letter small capital O | O |
| U+1D18 | ᴘ | Latin letter small capital P | P |
| U+0280 | ʀ | Latin letter small capital R | R |
| U+A731 | ꜱ | Latin letter small capital S | S |
| U+1D1B | ᴛ | Latin letter small capital T | T |
| U+1D1C | ᴜ | Latin letter small capital U | U |
| U+1D20 | ᴠ | Latin letter small capital V | V |
| U+1D21 | ᴡ | Latin letter small capital W | W |
| U+028F | ʏ | Latin letter small capital Y | Y |
| U+1D22 | ᴢ | Latin letter small capital Z | Z |
Greek letters
Greek capital letters that visually resemble Latin letters are substituted:
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+0391 | Α | Greek capital letter Alpha | A |
| U+0392 | Β | Greek capital letter Beta | B |
| U+0395 | Ε | Greek capital letter Epsilon | E |
| U+0397 | Η | Greek capital letter Eta | H |
| U+0399 | Ι | Greek capital letter Iota | I |
| U+039A | Κ | Greek capital letter Kappa | K |
| U+039C | Μ | Greek capital letter Mu | M |
| U+039D | Ν | Greek capital letter Nu | N |
| U+039F | Ο | Greek capital letter Omicron | O |
| U+03A1 | Ρ | Greek capital letter Rho | P |
| U+03A4 | Τ | Greek capital letter Tau | T |
| U+03A7 | Χ | Greek capital letter Chi | X |
| U+03A5 | Υ | Greek capital letter Upsilon | Y |
| U+0396 | Ζ | Greek capital letter Zeta | Z |
Special language support
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00C7 | Ç | Latin capital letter C with cedilla | Ç (GSM-7 native) |
Tildes and circumflex
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+02C6 | ˆ | Modifier letter circumflex accent | ^ |
| U+0302 | ̂ | Combining circumflex accent | ^ |
| U+FF3E | ＾ | Fullwidth circumflex accent | ^ |
| U+1DCD | ᷍ | Combining double circumflex above | ^ |
| U+02DC | ˜ | Small tilde | ~ |
| U+02F7 | ˷ | Modifier letter low tilde | ~ |
| U+0303 | ̃ | Combining tilde | ~ |
| U+0330 | ̰ | Combining tilde below | ~ |
| U+0334 | ̴ | Combining tilde overlay | ~ |
| U+223C | ∼ | Tilde operator | ~ |
| U+FF5E | ～ | Fullwidth tilde | ~ |
Whitespace characters
These characters are replaced with a standard space or removed:
| Unicode | Description | Replacement |
|---|---|---|
| U+00A0 | No-break space | (space) |
| U+2000 | En quad | (space) |
| U+2001 | Em quad | (space) |
| U+2002 | En space | (space) |
| U+2003 | Em space | (space) |
| U+2004 | Three-per-em space | (space) |
| U+2005 | Four-per-em space | (space) |
| U+2006 | Six-per-em space | (space) |
| U+2007 | Figure space | (space) |
| U+2008 | Punctuation space | (space) |
| U+2009 | Thin space | (space) |
| U+200A | Hair space | (space) |
| U+200B | Zero width space | (removed) |
| U+202F | Narrow no-break space | (space) |
| U+205F | Medium mathematical space | (space) |
| U+3000 | Ideographic space | (space) |
| U+FEFF | Zero width no-break space | (removed) |
| U+2028 | Line separator | (removed) |
| U+2029 | Paragraph separator | (removed) |
| U+2060 | Word joiner | (removed) |
Control characters
These control characters are removed or transformed:
| Unicode | Description | Replacement |
|---|---|---|
| U+0009 | Tab | 7 spaces |
| U+0000 | Null | (removed) |
| U+0003 | End of text | (removed) |
| U+0004 | End of transmission | (removed) |
| U+0010 | Data link escape | (removed) |
| U+0011 | Device control one | (removed) |
| U+0012 | Device control two | (removed) |
| U+0013 | Device control three | (removed) |
| U+0014 | Device control four | (removed) |
| U+0017 | End of transmission block | (removed) |
| U+0019 | End of medium | (removed) |
| U+0080 | C1 control codes | (removed) |
| U+008D | Reverse line feed | (removed) |
| U+0090 | Device control string | (removed) |
| U+009B | Control sequence introducer | (removed) |
| U+009F | Application program command | (removed) |
Tab characters (U+0009) are converted to 7 spaces, which can significantly increase message length and affect segment count.
Edge cases
Smart encoding handles several edge cases:
Message length increases
Some substitutions increase message length. For example:
- Horizontal ellipsis (…) becomes three periods (...), adding 2 characters.
- Tab (U+0009) becomes 7 spaces, adding 6 characters.
- Vulgar fractions like ½ become 1/2, adding 2 characters.
The segment count is calculated after these replacements, so a message near the 160-character limit may become multi-part after transformation.
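A short worked example makes the boundary case concrete. It is a sketch that assumes a 153-character per-part limit for concatenated GSM-7 messages; `gsm7_segments` is a hypothetical helper.

```python
import math


def gsm7_segments(length: int) -> int:
    """Segments for a GSM-7 message, assuming 160 single / 153 per part."""
    return 1 if length <= 160 else math.ceil(length / 153)


# 159 characters: 158 letters followed by one horizontal ellipsis.
original = "a" * 158 + "\u2026"
transformed = original.replace("\u2026", "...")

print(len(original), len(transformed))      # 159 161
print(gsm7_segments(len(transformed)))      # 2
```

The message fits the 160-character limit before substitution but needs two GSM-7 segments afterwards. That is still fewer than the three segments the untransformed text would need in UTF-16 under the same assumptions.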
Mixed replaceable and non-replaceable characters
If your message contains both replaceable Unicode characters and non-replaceable ones (like emojis), the replaceable characters are still substituted. However, the non-replaceable characters will still cause UTF-16 encoding.
Extended GSM-7 characters
The characters ~^|\{}[] are part of the GSM-7 extended set and count as 2 characters each when calculating segment length. Smart encoding accounts for this when determining final segment count.
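The double counting can be sketched as below. `gsm7_units` is a hypothetical helper, and the set also includes the euro sign, which belongs to the GSM-7 extension table even though it is not listed in the sentence above.

```python
# Extension-table characters: each is sent as an escape plus the character.
GSM7_EXTENDED = set("^{}\\[~]|€")


def gsm7_units(text: str) -> int:
    """Count GSM-7 character slots, charging 2 for extended characters."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


print(gsm7_units("[ok]"))     # 6
print(gsm7_units("hello"))    # 5
print(gsm7_units("[" * 81))   # 162, over the 160 limit despite 81 characters
```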
Zero-width characters
Zero-width characters (like U+200B zero-width space) are removed entirely from the message.
If your message consists entirely of zero-width or control characters that get removed, the API will return an error. Messages cannot be empty after smart encoding transformation.
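To avoid that error, a client could check a message before sending it. The sketch below is one possible pre-check, not part of the API: `would_be_empty` is a hypothetical helper, and `REMOVED` is assembled from the rows marked "(removed)" in the whitespace and control character tables on this page.

```python
# Characters the tables on this page list as "(removed)".
REMOVED = set(
    "\u200b\ufeff\u2028\u2029\u2060"              # zero-width and separators
    "\u0000\u0003\u0004\u0010\u0011\u0012\u0013"  # C0 control characters
    "\u0014\u0017\u0019"
    "\u0080\u008d\u0090\u009b\u009f"              # C1 control characters
)


def would_be_empty(text: str) -> bool:
    """True if nothing would remain after the removals listed above."""
    return all(ch in REMOVED for ch in text)


print(would_be_empty("\u200b\ufeff"))  # True
print(would_be_empty("\u200bhi"))      # False
```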
Limitations
- Smart encoding applies to SMS only. MMS and RCS use UTF-8 encoding by default.
- Not all Unicode characters have GSM-7 equivalents. Emojis and non-Latin scripts will still trigger UTF-16 encoding.
- Substitutions may slightly alter the appearance of your message. Review the character tables above to understand what changes will occur.