Smart encoding applies to SMS only. MMS and RCS messages use UTF-8 encoding by default and are not affected.
Why use smart encoding
SMS messages using GSM-7 encoding fit 160 characters per segment. When a message contains even one Unicode character outside GSM-7, the entire message switches to UTF-16 encoding, which only fits 70 characters per segment. A single smart quote (") or em dash (—) can more than double your messaging costs.
Example:
| Message | Encoding | Segments | Cost impact |
|---|---|---|---|
Hello, how are you? (150 chars) | GSM-7 | 1 | Base cost |
Hello, how are you? (150 chars with "smart quotes") | UTF-16 | 3 | 3× cost |
Hello, how are you? (same, smart encoding ON) | GSM-7 | 1 | Base cost |
How it works
When smart encoding is enabled:- Your message text is scanned for Unicode characters that have GSM-7 equivalents.
- Matching characters are automatically replaced (e.g.,
"→",—→-,…→...). - The final encoding (GSM-7 or UTF-16) is determined after all substitutions.
- The segment count is recalculated based on the transformed text.
- The API response includes metadata about the transformation.
Webhooks return the original text. The
text field in delivery webhooks contains your original message, not the smart-encoded version. This ensures your application’s message tracking stays consistent.Enable smart encoding
You can enable smart encoding at two levels: on a messaging profile (applies to all messages) or on a per-request basis.On a messaging profile
Enable smart encoding as a default for all messages sent through a profile.- API
- Portal
Per-request control
Override the profile setting on individual messages using theencoding parameter:
| Value | Behavior |
|---|---|
auto | Follow the profile’s smart_encoding setting (default). |
gsm7 | Force GSM-7 encoding. Smart encoding is applied. Returns 400 if the message contains characters that cannot be converted to GSM-7 (e.g., emoji). |
ucs2 | Force UCS-2 encoding. Skips smart encoding entirely. |
The request-level
encoding parameter takes precedence over the messaging profile’s smart_encoding setting.Response metadata
When smart encoding is applied, the API response includes detailed metadata:| Field | Description |
|---|---|
smart_encoding_applied | Whether any characters were replaced. |
final_encoding | The encoding used after transformation (gsm7 or ucs2). |
segment_count | Number of segments after smart encoding. |
character_count | Message length after transformation. |
replaced_character_count | Number of unique characters that were substituted. |
length_change | Difference in length (positive means message grew, e.g., … → ...). |
The
parts field in the top-level response reflects the segment count after smart encoding, so you always see the actual billing impact.Checking the response
Precedence rules
Smart encoding behavior is determined by a combination of your messaging profile setting and the per-requestencoding parameter:
Profile smart_encoding | Request encoding | Behavior |
|---|---|---|
true | (not set) | Smart encoding applied |
false | (not set) | Smart encoding not applied |
true | auto | Smart encoding applied |
false | auto | Smart encoding not applied |
true or false | gsm7 | Smart encoding applied, must result in GSM-7 or returns 400 |
true or false | ucs2 | Smart encoding skipped, forced UCS-2 |
The request-level
encoding parameter always takes precedence over the messaging profile setting.Character substitutions
Smart encoding replaces 200+ Unicode characters with GSM-7 equivalents. The tables below show all supported substitutions grouped by category.Quotation marks
Quotation marks
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00AB | « | Left-pointing double angle quotation mark | ” |
| U+00BB | » | Right-pointing double angle quotation mark | ” |
| U+201C | ” | Left double quotation mark | ” |
| U+201D | ” | Right double quotation mark | ” |
| U+02BA | ʺ | Modifier letter double prime | ” |
| U+02EE | ˮ | Modifier letter double apostrophe | ” |
| U+201F | ‟ | Double high-reversed-9 quotation mark | ” |
| U+275D | ❝ | Heavy double turned comma quotation mark ornament | ” |
| U+275E | ❞ | Heavy double comma quotation mark ornament | ” |
| U+301D | 〝 | Reversed double prime quotation mark | ” |
| U+301E | 〞 | Double prime quotation mark | ” |
| U+FF02 | " | Fullwidth quotation mark | ” |
| U+201E | „ | Double low quotation mark | ” |
Apostrophes and single quotes
Apostrophes and single quotes
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2018 | ’ | Left single quotation mark | ’ |
| U+2019 | ’ | Right single quotation mark | ’ |
| U+02BB | ʻ | Modifier letter turned comma | ’ |
| U+02C8 | ˈ | Modifier letter vertical line | ’ |
| U+02BC | ʼ | Modifier letter apostrophe | ’ |
| U+02BD | ʽ | Modifier letter reversed comma | ’ |
| U+02B9 | ʹ | Modifier letter prime | ’ |
| U+201B | ‛ | Single high-reversed-9 quotation mark | ’ |
| U+FF07 | ' | Fullwidth apostrophe | ’ |
| U+00B4 | ´ | Acute accent | ’ |
| U+02CA | ˊ | Modifier letter acute accent | ’ |
| U+0060 | ` | Grave accent | ’ |
| U+02CB | ˋ | Modifier letter grave accent | ’ |
| U+275B | ❛ | Heavy single turned comma quotation mark ornament | ’ |
| U+275C | ❜ | Heavy single comma quotation mark ornament | ’ |
| U+0313 | ̓ | Combining comma above | ’ |
| U+0314 | ̔ | Combining reversed comma above | ’ |
| U+FE10 | ︐ | Presentation form for vertical comma | ’ |
| U+FE11 | ︑ | Presentation form for vertical ideographic comma | ’ |
Dashes and hyphens
Dashes and hyphens
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2014 | — | Em dash | - |
| U+2013 | – | En dash | - |
| U+23BC | ⎼ | Horizontal scan line-7 | - |
| U+23BD | ⎽ | Horizontal scan line-9 | - |
| U+2015 | ― | Horizontal bar | - |
| U+FE63 | ﹣ | Small hyphen-minus | - |
| U+FF0D | - | Fullwidth hyphen-minus | - |
| U+2010 | ‐ | Hyphen | - |
| U+2022 | • | Bullet | - |
| U+2043 | ⁃ | Hyphen bullet | - |
Slashes and division
Slashes and division
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00F7 | ÷ | Division sign | / |
| U+00BC | ¼ | Vulgar fraction one quarter | 1/4 |
| U+00BD | ½ | Vulgar fraction one half | 1/2 |
| U+00BE | ¾ | Vulgar fraction three quarters | 3/4 |
| U+29F8 | ⧸ | Big solidus | / |
| U+0337 | ̷ | Combining short solidus overlay | / |
| U+0338 | ̸ | Combining long solidus overlay | / |
| U+2044 | ⁄ | Fraction slash | / |
| U+2215 | ∕ | Division slash | / |
| U+FF0F | / | Fullwidth solidus | / |
Backslashes
Backslashes
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+29F9 | ⧹ | Big reverse solidus | \ |
| U+29F5 | ⧵ | Reverse solidus operator | \ |
| U+20E5 | Combining reverse solidus overlay | \ | |
| U+FE68 | ﹨ | Small reverse solidus | \ |
| U+FF3C | \ | Fullwidth reverse solidus | \ |
Underscores and vertical lines
Underscores and vertical lines
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+0332 | ̲ | Combining low line | _ |
| U+FF3F | _ | Fullwidth low line | _ |
| U+2017 | ‗ | Double low line | _ |
| U+20D2 | ⃒ | Combining long vertical line overlay | | |
| U+20D3 | ⃓ | Combining short vertical line overlay | | |
| U+2223 | ∣ | Divides | | |
| U+FF5C | | | Fullwidth vertical line | | |
| U+23B8 | ⎸ | Left vertical box line | | |
| U+23B9 | ⎹ | Right vertical box line | | |
| U+23D0 | ⏐ | Vertical line extension | | |
| U+239C | ⎜ | Left parenthesis extension | | |
| U+239F | ⎟ | Right parenthesis extension | | |
Symbols and punctuation
Symbols and punctuation
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+FE6B | ﹫ | Small commercial at sign | @ |
| U+FF20 | @ | Fullwidth commercial at sign | @ |
| U+FE69 | ﹩ | Small dollar sign | $ |
| U+FF04 | $ | Fullwidth dollar sign | $ |
| U+01C3 | ǃ | Latin letter retroflex click | ! |
| U+FE15 | ︕ | Presentation form for vertical exclamation mark | ! |
| U+FE57 | ﹗ | Small exclamation mark | ! |
| U+FF01 | ! | Fullwidth exclamation mark | ! |
| U+203C | ‼ | Double exclamation mark | !! |
| U+FE5F | ﹟ | Small number sign | # |
| U+FF03 | # | Fullwidth number sign | # |
| U+FE6A | ﹪ | Small percent sign | % |
| U+FF05 | % | Fullwidth percent sign | % |
| U+FE60 | ﹠ | Small ampersand | & |
| U+FF06 | & | Fullwidth ampersand | & |
| U+2026 | … | Horizontal ellipsis | … |
Commas
Commas
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+201A | ‚ | Single low-9 quotation mark | , |
| U+0326 | ̦ | Combining comma below | , |
| U+FE50 | ﹐ | Small comma | , |
| U+3001 | 、 | Ideographic comma | , |
| U+FE51 | ﹑ | Small ideographic comma | , |
| U+FF0C | , | Fullwidth comma | , |
| U+FF64 | 、 | Halfwidth ideographic comma | , |
Parentheses and brackets
Parentheses and brackets
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2768 | ❨ | Medium left parenthesis ornament | ( |
| U+276A | ❪ | Medium flattened left parenthesis ornament | ( |
| U+FE59 | ﹙ | Small left parenthesis | ( |
| U+FF08 | ( | Fullwidth left parenthesis | ( |
| U+27EE | ⟮ | Mathematical left flattened parenthesis | ( |
| U+2985 | ⦅ | Left white parenthesis | ( |
| U+2769 | ❩ | Medium right parenthesis ornament | ) |
| U+276B | ❫ | Medium flattened right parenthesis ornament | ) |
| U+FE5A | ﹚ | Small right parenthesis | ) |
| U+FF09 | ) | Fullwidth right parenthesis | ) |
| U+27EF | ⟯ | Mathematical right flattened parenthesis | ) |
| U+2986 | ⦆ | Right white parenthesis | ) |
| U+2774 | ❴ | Medium left curly bracket ornament | { |
| U+FE5B | ﹛ | Small left curly bracket | { |
| U+FF5B | { | Fullwidth left curly bracket | { |
| U+2775 | ❵ | Medium right curly bracket ornament | } |
| U+FE5C | ﹜ | Small right curly bracket | } |
| U+FF5D | } | Fullwidth right curly bracket | } |
| U+FF3B | [ | Fullwidth left square bracket | [ |
| U+FF3D | ] | Fullwidth right square bracket | ] |
Asterisks
Asterisks
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+204E | ⁎ | Low asterisk | * |
| U+2217 | ∗ | Asterisk operator | * |
| U+229B | ⊛ | Circled asterisk operator | * |
| U+2722 | ✢ | Four teardrop-spoked asterisk | * |
| U+2723 | ✣ | Four balloon-spoked asterisk | * |
| U+2724 | ✤ | Heavy four balloon-spoked asterisk | * |
| U+2725 | ✥ | Four club-spoked asterisk | * |
| U+2731 | ✱ | Heavy asterisk | * |
| U+2732 | ✲ | Open center asterisk | * |
| U+2733 | ✳ | Eight spoked asterisk | * |
| U+273A | ✺ | Sixteen pointed asterisk | * |
| U+273B | ✻ | Teardrop-spoked asterisk | * |
| U+273C | ✼ | Open center teardrop-spoked asterisk | * |
| U+273D | ✽ | Heavy teardrop-spoked asterisk | * |
| U+2743 | ❃ | Heavy teardrop-spoked pinwheel asterisk | * |
| U+2749 | ❉ | Balloon-spoked asterisk | * |
| U+274A | ❊ | Eight teardrop-spoked propeller asterisk | * |
| U+274B | ❋ | Heavy eight teardrop-spoked propeller asterisk | * |
| U+29C6 | ⧆ | Squared asterisk | * |
| U+FE61 | ﹡ | Small asterisk | * |
| U+FF0A | * | Fullwidth asterisk | * |
Math, comparison, periods, and colons
Math, comparison, periods, and colons
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+02D6 | ˖ | Modifier letter plus sign | + |
| U+FE62 | ﹢ | Small plus sign | + |
| U+FF0B | + | Fullwidth plus sign | + |
| U+FE64 | ﹤ | Small less-than sign | < |
| U+FF1C | < | Fullwidth less-than sign | < |
| U+0347 | ͇ | Combining equals sign below | = |
| U+A78A | ꞊ | Modifier letter short equals sign | = |
| U+FE66 | ﹦ | Small equals sign | = |
| U+FF1D | = | Fullwidth equals sign | = |
| U+FE65 | ﹥ | Small greater-than sign | > |
| U+FF1E | > | Fullwidth greater-than sign | > |
| U+2039 | ‹ | Single left-pointing angle quotation mark | < |
| U+203A | › | Single right-pointing angle quotation mark | > |
| U+3002 | 。 | Ideographic full stop | . |
| U+FE52 | ﹒ | Small full stop | . |
| U+FF0E | . | Fullwidth full stop | . |
| U+FF61 | 。 | Halfwidth ideographic full stop | . |
| U+02D0 | ː | Modifier letter triangular colon | : |
| U+02F8 | ˸ | Modifier letter raised colon | : |
| U+2982 | ⦂ | Z notation type colon | : |
| U+A789 | ꞉ | Modifier letter colon | : |
| U+FE13 | ︓ | Presentation form for vertical colon | : |
| U+FF1A | : | Fullwidth colon | : |
| U+204F | ⁏ | Reversed semicolon | ; |
| U+FE14 | ︔ | Presentation form for vertical semicolon | ; |
| U+FE54 | ﹔ | Small semicolon | ; |
| U+FF1B | ; | Fullwidth semicolon | ; |
| U+FE16 | ︖ | Presentation form for vertical question mark | ? |
| U+FE56 | ﹖ | Small question mark | ? |
| U+FF1F | ? | Fullwidth question mark | ? |
Fullwidth digits
Fullwidth digits
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+FF10 | 0 | Fullwidth digit zero | 0 |
| U+FF11 | 1 | Fullwidth digit one | 1 |
| U+FF12 | 2 | Fullwidth digit two | 2 |
| U+FF13 | 3 | Fullwidth digit three | 3 |
| U+FF14 | 4 | Fullwidth digit four | 4 |
| U+FF15 | 5 | Fullwidth digit five | 5 |
| U+FF16 | 6 | Fullwidth digit six | 6 |
| U+FF17 | 7 | Fullwidth digit seven | 7 |
| U+FF18 | 8 | Fullwidth digit eight | 8 |
| U+FF19 | 9 | Fullwidth digit nine | 9 |
Fullwidth and small capital letters
Fullwidth and small capital letters
Fullwidth uppercase (U+FF21–U+FF3A) → A–ZFullwidth lowercase (U+FF41–U+FF5A) → a–zSmall capital letters:
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+1D00 | ᴀ | Latin letter small capital A | A |
| U+0299 | ʙ | Latin letter small capital B | B |
| U+1D04 | ᴄ | Latin letter small capital C | C |
| U+1D05 | ᴅ | Latin letter small capital D | D |
| U+1D07 | ᴇ | Latin letter small capital E | E |
| U+A730 | ꜰ | Latin letter small capital F | F |
| U+0262 | ɢ | Latin letter small capital G | G |
| U+029C | ʜ | Latin letter small capital H | H |
| U+026A | ɪ | Latin letter small capital I | I |
| U+1D0A | ᴊ | Latin letter small capital J | J |
| U+1D0B | ᴋ | Latin letter small capital K | K |
| U+029F | ʟ | Latin letter small capital L | L |
| U+1D0D | ᴍ | Latin letter small capital M | M |
| U+0274 | ɴ | Latin letter small capital N | N |
| U+1D0F | ᴏ | Latin letter small capital O | O |
| U+1D18 | ᴘ | Latin letter small capital P | P |
| U+0280 | ʀ | Latin letter small capital R | R |
| U+A731 | ꜱ | Latin letter small capital S | S |
| U+1D1B | ᴛ | Latin letter small capital T | T |
| U+1D1C | ᴜ | Latin letter small capital U | U |
| U+1D20 | ᴠ | Latin letter small capital V | V |
| U+1D21 | ᴡ | Latin letter small capital W | W |
| U+028F | ʏ | Latin letter small capital Y | Y |
| U+1D22 | ᴢ | Latin letter small capital Z | Z |
Greek letters
Greek letters
Greek capital letters that visually resemble Latin letters are substituted:
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+0391 | Α | Greek capital letter Alpha | A |
| U+0392 | Β | Greek capital letter Beta | B |
| U+0395 | Ε | Greek capital letter Epsilon | E |
| U+0397 | Η | Greek capital letter Eta | H |
| U+0399 | Ι | Greek capital letter Iota | I |
| U+039A | Κ | Greek capital letter Kappa | K |
| U+039C | Μ | Greek capital letter Mu | M |
| U+039D | Ν | Greek capital letter Nu | N |
| U+039F | Ο | Greek capital letter Omicron | O |
| U+03A1 | Ρ | Greek capital letter Rho | P |
| U+03A4 | Τ | Greek capital letter Tau | T |
| U+03A7 | Χ | Greek capital letter Chi | X |
| U+03A5 | Υ | Greek capital letter Upsilon | Y |
| U+0396 | Ζ | Greek capital letter Zeta | Z |
Tildes and circumflex
Tildes and circumflex
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+02C6 | ˆ | Modifier letter circumflex accent | ^ |
| U+0302 | ̂ | Combining circumflex accent | ^ |
| U+FF3E | ^ | Fullwidth circumflex accent | ^ |
| U+1DCD | ᷍ | Combining double circumflex above | ^ |
| U+02DC | ˜ | Small tilde | ~ |
| U+02F7 | ˷ | Modifier letter low tilde | ~ |
| U+0303 | ̃ | Combining tilde | ~ |
| U+0330 | ̰ | Combining tilde below | ~ |
| U+0334 | ̴ | Combining tilde overlay | ~ |
| U+223C | ∼ | Tilde operator | ~ |
| U+FF5E | ~ | Fullwidth tilde | ~ |
Whitespace characters
Whitespace characters
These characters are replaced with a standard space or removed:
| Unicode | Description | Replacement |
|---|---|---|
| U+00A0 | No-break space | (space) |
| U+2000 | En quad | (space) |
| U+2001 | Em quad | (space) |
| U+2002 | En space | (space) |
| U+2003 | Em space | (space) |
| U+2004 | Three-per-em space | (space) |
| U+2005 | Four-per-em space | (space) |
| U+2006 | Six-per-em space | (space) |
| U+2007 | Figure space | (space) |
| U+2008 | Punctuation space | (space) |
| U+2009 | Thin space | (space) |
| U+200A | Hair space | (space) |
| U+200B | Zero width space | (removed) |
| U+202F | Narrow no-break space | (space) |
| U+205F | Medium mathematical space | (space) |
| U+3000 | Ideographic space | (space) |
| U+FEFF | Zero width no-break space | (removed) |
| U+2028 | Line separator | (removed) |
| U+2029 | Paragraph separator | (removed) |
| U+2060 | Word joiner | (removed) |
Control characters
Control characters
These control characters are removed or transformed:
| Unicode | Description | Replacement |
|---|---|---|
| U+0009 | Tab | 7 spaces |
| U+0000 | Null | (removed) |
| U+0003 | End of text | (removed) |
| U+0004 | End of transmission | (removed) |
| U+0010 | Escape | (removed) |
| U+0011 | Device control one | (removed) |
| U+0012 | Device control two | (removed) |
| U+0013 | Device control three | (removed) |
| U+0014 | Device control four | (removed) |
| U+0017 | End of transmission block | (removed) |
| U+0019 | End of medium | (removed) |
| U+0080 | C1 control codes | (removed) |
| U+008D | Reverse line feed | (removed) |
| U+0090 | Device control string | (removed) |
| U+009B | Control sequence introducer | (removed) |
| U+009F | Application program command | (removed) |
Edge cases
Message length increases
Message length increases
Some substitutions increase message length. For example:
- Horizontal ellipsis (
…) becomes three periods (...) — adds 2 characters - Tab (U+0009) becomes 7 spaces — adds 6 characters
- Vulgar fractions like
½become1/2— adds 2 characters
Mixed replaceable and non-replaceable characters
Mixed replaceable and non-replaceable characters
If your message contains both replaceable Unicode characters and non-replaceable ones (like emojis), smart encoding still applies all possible substitutions. However, the non-replaceable characters will keep the message in UTF-16 encoding.This is still beneficial — fewer Unicode characters means a shorter UTF-16 message and potentially fewer segments.
Extended GSM-7 characters
Extended GSM-7 characters
The characters
~, ^, |, \, {, }, [, ] are part of the GSM-7 extended set and count as 2 characters each when calculating segment length. Smart encoding accounts for this when determining the final segment count.Zero-width characters and empty messages
Zero-width characters and empty messages
Zero-width characters (like U+200B zero-width space) are removed entirely. If your message consists entirely of zero-width or control characters that all get removed, the API returns a
400 error — messages cannot be empty after transformation.encoding=gsm7 with non-convertible characters
encoding=gsm7 with non-convertible characters
If you set
encoding=gsm7 on a request but the message contains characters that cannot be represented in GSM-7 (e.g., emoji), the API returns a 400 error rather than silently dropping characters.Limitations
- SMS only — MMS and RCS use UTF-8 encoding by default and are not affected by smart encoding.
- Not all characters convert — Emojis and non-Latin scripts (e.g., Chinese, Arabic, Cyrillic) have no GSM-7 equivalents and will still trigger UTF-16 encoding.
- Visual differences — Substitutions may slightly alter the appearance of your message. Review the character tables above to understand what changes will occur.
- Length may increase — Some substitutions produce longer output (e.g.,
…→...). Always check the response metadata for the actual segment count.