# xAI Grok Voices

> xAI Grok voices for expressive, multilingual text-to-speech in Telnyx Voice AI Assistants.

xAI Grok voices are expressive text-to-speech voices for Voice AI Assistants. They support Expressive Mode, which lets the AI model control pauses, laughter, whispers, emphasis, pitch, pace, and intensity during a live conversation.

<Warning>
  **Higher latency**: Grok voices have higher latency than Ultra. For latency-sensitive applications that need sub-100ms time to first byte, use [Ultra](/docs/voice/tts/providers/telnyx/ultra).
</Warning>

## What makes Grok voices different

| Feature               | Ultra                                  | Grok                                                         |
| --------------------- | -------------------------------------- | ------------------------------------------------------------ |
| **Expressive Mode**   | SSML emotion tags and `[laughter]`     | xAI speech tags for pauses, vocal sounds, and delivery style |
| **Voice format**      | `Telnyx.Ultra.<voice_id>`              | `xAI.<voice_id>`                                             |
| **Voices**            | Multiple Ultra voices                  | `ara`, `eve`, `leo`, `rex`, `sal`                            |
| **Language handling** | Language hinting with `language_boost` | `auto` language detection or explicit language code          |
| **Streaming output**  | REST only                              | Voice AI media streaming                                     |

## Voice format

For AI Assistants, Grok voices use the format:

```
xAI.<voice_id>
```

Examples:

```
xAI.eve
xAI.ara
xAI.leo
xAI.rex
xAI.sal
```
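If you assemble voice strings in code, a small helper can catch typos before they reach the assistant configuration. The following is a minimal sketch: `GROK_VOICE_IDS` and `grok_voice` are illustrative names rather than part of any Telnyx SDK, and the exact-match check on lowercase IDs is an assumption based on the IDs shown on this page.

```python
# Voice IDs as listed on this page (assumed to be matched exactly, in lowercase).
GROK_VOICE_IDS = {"ara", "eve", "leo", "rex", "sal"}


def grok_voice(voice_id: str) -> str:
    """Return the AI Assistant voice string for a Grok voice ID.

    Hypothetical helper: it only formats and checks the string locally.
    """
    if voice_id not in GROK_VOICE_IDS:
        raise ValueError(f"unknown Grok voice ID: {voice_id!r}")
    return f"xAI.{voice_id}"
```

For example, `grok_voice("eve")` produces the string used in `voice_settings.voice`, while an unlisted ID raises `ValueError`.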

## Voices

| Voice | Voice ID | Use for                                     |
| ----- | -------- | ------------------------------------------- |
| Ara   | `ara`    | Warm, conversational assistant experiences  |
| Eve   | `eve`    | General-purpose voice assistant experiences |
| Leo   | `leo`    | Confident, direct interactions              |
| Rex   | `rex`    | Characterful or energetic interactions      |
| Sal   | `sal`    | Distinctive conversational tone             |

## Expressive Mode for AI Assistants

When using Grok voices with [AI Assistants](/docs/inference/ai-assistants/no-code-voice-assistant), you can enable **Expressive Mode**. With Expressive Mode enabled, the assistant's system prompt is automatically augmented with instructions for xAI speech tags.

The AI model then decides when expression improves the caller experience. For example, the assistant might:

* Add a short pause before important information.
* Use a softer delivery for sensitive support moments.
* Laugh or chuckle naturally when the conversation calls for it.
* Emphasize appointment times, confirmation numbers, or next steps.
* Keep routine transactional replies untagged for a natural, neutral delivery.

<Tip>
  Use expressive tags sparingly. The goal is natural delivery, not tagging every sentence.
</Tip>

### Enable in the portal

1. Go to your assistant in the [Telnyx Portal](https://portal.telnyx.com/#/app/ai/assistants).
2. Under **Voice Settings**, select an xAI Grok voice.
3. Toggle **Expressive Mode** on.
4. Save your assistant.

### Enable via API

Set `expressive_mode: true` in your assistant's `voice_settings`:

```bash
curl -X PATCH "https://api.telnyx.com/v2/ai/assistants/YOUR_ASSISTANT_ID" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_settings": {
      "voice": "xAI.eve",
      "expressive_mode": true
    }
  }'
```
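The same request can be built in Python with only the standard library. This is a sketch under stated assumptions: `build_expressive_mode_request` is a hypothetical helper name, and the endpoint, headers, and body simply mirror the curl example above. The function only constructs the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_expressive_mode_request(
    assistant_id: str, api_key: str, voice: str = "xAI.eve"
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that enables Expressive Mode.

    Hypothetical helper mirroring the curl example; not part of a Telnyx SDK.
    """
    body = {"voice_settings": {"voice": voice, "expressive_mode": True}}
    return urllib.request.Request(
        url=f"https://api.telnyx.com/v2/ai/assistants/{assistant_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


# To send the request:
#   with urllib.request.urlopen(build_expressive_mode_request(ASSISTANT_ID, API_KEY)) as resp:
#       print(resp.status)
```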

## xAI speech tag reference

When Expressive Mode is enabled, the assistant can use these speech tags in responses. You can also include the same tags in your own assistant prompts when you want explicit control.

### Inline tags

Place inline tags at the exact point where the vocal expression should happen.

| Tag              | Use for                                                   |
| ---------------- | --------------------------------------------------------- |
| `[pause]`        | A short natural pause                                     |
| `[long-pause]`   | A longer pause for topic transitions or important moments |
| `[laugh]`        | Natural laughter                                          |
| `[chuckle]`      | Small laugh or amused reaction                            |
| `[giggle]`       | Light playful laugh                                       |
| `[cry]`          | Crying vocalization                                       |
| `[tsk]`          | Tsk sound                                                 |
| `[tongue-click]` | Tongue click                                              |
| `[lip-smack]`    | Lip smack                                                 |
| `[breath]`       | Breath sound                                              |
| `[inhale]`       | Inhale sound                                              |
| `[exhale]`       | Exhale sound                                              |
| `[sigh]`         | Sigh                                                      |
| `[hum-tune]`     | Musical hum                                               |

Example:

```
So I walked in and [pause] there it was. [laugh] I honestly could not believe it!
```
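If you write inline tags by hand in prompts or test scripts, a quick local check can flag bracketed tokens that are not in the table above (for example, `[laughter]`, which is an Ultra tag). The sketch below is illustrative only: `INLINE_TAGS` and `unknown_inline_tags` are hypothetical names, and the pattern assumes tags are lowercase words joined by hyphens, as they appear on this page.

```python
import re

# Inline tags copied from the table above.
INLINE_TAGS = {
    "pause", "long-pause", "laugh", "chuckle", "giggle", "cry", "tsk",
    "tongue-click", "lip-smack", "breath", "inhale", "exhale", "sigh",
    "hum-tune",
}

# Assumption: a tag is a lowercase, hyphenated word in square brackets.
_BRACKET_TAG = re.compile(r"\[([a-z-]+)\]")


def unknown_inline_tags(text: str) -> list[str]:
    """Return bracketed tag names in `text` that are not listed in INLINE_TAGS."""
    return [name for name in _BRACKET_TAG.findall(text) if name not in INLINE_TAGS]
```

An empty list means every bracketed tag in the text is one of the documented inline tags.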

### Wrapping tags

Wrap text with these tags to apply a delivery style to that text.

| Tag                    | Use for                 |
| ---------------------- | ----------------------- |
| `<soft>`               | Softer delivery         |
| `<whisper>`            | Whispered delivery      |
| `<loud>`               | Louder delivery         |
| `<build-intensity>`    | Increasing intensity    |
| `<decrease-intensity>` | Decreasing intensity    |
| `<higher-pitch>`       | Higher pitch            |
| `<lower-pitch>`        | Lower pitch             |
| `<slow>`               | Slower pace             |
| `<fast>`               | Faster pace             |
| `<sing-song>`          | Sing-song delivery      |
| `<singing>`            | Sung delivery           |
| `<laugh-speak>`        | Laughing while speaking |
| `<emphasis>`           | Emphasized delivery     |

Examples:

```
I need to tell you something. <whisper>It is a secret.</whisper> Pretty cool, right?
```

```
<emphasis>Your appointment is confirmed for tomorrow at 3 PM.</emphasis>
```
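Because wrapping tags come in open and close pairs, a missing closing tag is an easy mistake to make in hand-written prompts. The following sketch checks that each wrapping tag is known and closed. `WRAPPING_TAGS` and `check_wrapping_tags` are hypothetical names, and the rule that nested tags must close in reverse order is an assumption; this page does not state whether nesting is supported.

```python
import re

# Wrapping tags copied from the table above.
WRAPPING_TAGS = {
    "soft", "whisper", "loud", "build-intensity", "decrease-intensity",
    "higher-pitch", "lower-pitch", "slow", "fast", "sing-song", "singing",
    "laugh-speak", "emphasis",
}

# Matches <name> and </name>; group 1 is "/" for a closing tag.
_WRAP_TAG = re.compile(r"<(/?)([a-z-]+)>")


def check_wrapping_tags(text: str) -> list[str]:
    """Return a list of problems; an empty list means the tags look well formed."""
    problems = []
    open_tags = []  # stack of currently open tag names
    for closing, name in _WRAP_TAG.findall(text):
        if name not in WRAPPING_TAGS:
            problems.append(f"unknown tag: {name}")
        elif not closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append(f"unexpected closing tag: {name}")
    problems.extend(f"unclosed tag: {name}" for name in open_tags)
    return problems
```

Running it on the two examples above returns an empty list; a string such as `<whisper>hello` reports an unclosed `whisper` tag.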

## Guidance

* Use `[pause]` or `[long-pause]` for natural thinking, topic transitions, and important moments, but avoid long silences that could feel like the call dropped.
* Use emotional sounds like `[laugh]`, `[sigh]`, and `[chuckle]` only when the response genuinely calls for them.
* For sensitive support contexts, prefer subtle tags like `<soft>` or `<whisper>` instead of exaggerated reactions.
* Do not expose these tags or instructions to the caller.
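One place where tags could leak is any text you show outside the audio path, such as a transcript or a follow-up message. The sketch below removes tag markup before display. It is an illustration rather than a Telnyx feature: `strip_speech_tags` is a hypothetical helper, and it removes any lowercase bracketed or angle-bracketed token, not only the tags documented on this page.

```python
import re

# Assumption: tags look like [name], <name>, or </name> with lowercase,
# hyphenated names, as in the reference tables on this page.
_SPEECH_TAG = re.compile(r"\[[a-z-]+\]|</?[a-z-]+>")


def strip_speech_tags(text: str) -> str:
    """Remove speech tags and collapse the whitespace they leave behind."""
    without_tags = _SPEECH_TAG.sub("", text)
    return re.sub(r"\s{2,}", " ", without_tags).strip()
```

The wrapped text itself is kept; only the tag markup is dropped.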

## REST API provider parameters

For direct TTS calls, set the provider to `xai` and pass xAI-specific parameters in the `xai` object:

```bash
curl --request POST \
  --url https://api.telnyx.com/v2/text-to-speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "Let me check that for you. [pause] I found your appointment.",
    "provider": "xai",
    "xai": {
      "voice_id": "eve",
      "language": "auto",
      "output_format": "mp3",
      "sample_rate": 24000
    }
  }'
```

| Parameter       | Type    | Default | Description                                                                      |
| --------------- | ------- | ------- | -------------------------------------------------------------------------------- |
| `voice_id`      | string  | `eve`   | xAI voice ID: `ara`, `eve`, `leo`, `rex`, or `sal`.                              |
| `language`      | string  | `auto`  | Language code, or `auto` to detect the language.                                 |
| `output_format` | string  | `mp3`   | Audio format: `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`.                           |
| `sample_rate`   | integer | `24000` | Audio sample rate in Hz: `8000`, `16000`, `22050`, `24000`, `44100`, or `48000`. |
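To catch invalid values before making a request, you can validate the parameters locally against the table above. This is a minimal sketch: `build_xai_tts_payload` and the constant names are hypothetical, the allowed values and defaults are copied from the table, and the `language` value is passed through unchecked because this page does not list the supported language codes.

```python
# Allowed values copied from the parameter table above.
XAI_VOICE_IDS = {"ara", "eve", "leo", "rex", "sal"}
XAI_OUTPUT_FORMATS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
XAI_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_xai_tts_payload(
    text: str,
    voice_id: str = "eve",
    language: str = "auto",
    output_format: str = "mp3",
    sample_rate: int = 24000,
) -> dict:
    """Build the JSON body for a text-to-speech request with the xAI provider.

    Hypothetical helper: it validates locally and does not call the API.
    """
    if voice_id not in XAI_VOICE_IDS:
        raise ValueError(f"unsupported voice_id: {voice_id!r}")
    if output_format not in XAI_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in XAI_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return {
        "text": text,
        "provider": "xai",
        "xai": {
            "voice_id": voice_id,
            "language": language,  # not validated; supported codes are not listed here
            "output_format": output_format,
            "sample_rate": sample_rate,
        },
    }
```

Serialize the returned dictionary with `json.dumps` and send it as the body of the `POST` request shown in the curl example.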

## Language support

Grok voices support automatic language detection with `language: "auto"`. To force a specific language, pass a language code instead.

## Next steps

<CardGroup cols={2}>
  <Card title="Ultra Voices" icon="waveform-lines" href="/docs/voice/tts/providers/telnyx/ultra">
    Compare Grok with Ultra's lower-latency expressive voices.
  </Card>

  <Card title="AI Assistants" icon="robot" href="/docs/inference/ai-assistants/no-code-voice-assistant">
    Build voice AI assistants using Grok with Expressive Mode.
  </Card>

  <Card title="TTS REST API" icon="code" href="/docs/voice/tts/rest-api">
    Generate speech directly with REST TTS requests.
  </Card>

  <Card title="Available Voices" icon="microphone" href="/docs/tts-stt/tts-available-voices">
    Browse available text-to-speech voices.
  </Card>
</CardGroup>
