xAI

Voice format: xAI.<VoiceId>

xAI Grok voices are expressive, multilingual text-to-speech voices. They support inline speech tags for pauses, vocal sounds, emphasis, pitch, pace, and intensity.

xAI Grok voices have higher latency than Telnyx Ultra. For latency-sensitive applications that need a sub-100 ms time to first byte, use Ultra.

Voices

Voice	Voice ID	Use for
Ara	`xAI.ara`	Warm, conversational assistant experiences
Eve	`xAI.eve`	General-purpose voice assistant experiences
Leo	`xAI.leo`	Confident, direct interactions
Rex	`xAI.rex`	Characterful or energetic interactions
Sal	`xAI.sal`	Distinctive conversational tone

WebSocket

xAI Grok voices are not available on the public TTS WebSocket API. Use the REST API for direct text-to-speech generation, or use xAI Grok voices with AI Assistants.

REST API

Fields

Field	Type	Default	Description
`language`	string	`auto`	Language code, or `auto` to detect automatically.
`output_format`	string	`mp3`	`mp3`, `wav`, `pcm`, `mulaw`, or `alaw`.
`sample_rate`	integer	`24000`	8000, 16000, 22050, 24000, 44100, or 48000.
`output_type`	string	`binary_output`	`binary_output`, `base64_output`, or `audio_id`.

{
  "text": "Let me check that for you. [pause] I found your appointment.",
  "voice": "xAI.eve",
  "voice_settings": {
    "language": "auto",
    "output_format": "mp3",
    "sample_rate": 24000
  }
}
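As a rough illustration, the request body above could be assembled and checked against the Fields table before sending. This is a minimal sketch, not an official client: `build_tts_request` is a hypothetical helper name, the `xAI.` prefix check is an assumption based on the stated voice format, and the endpoint URL and authentication are left out because this page does not state them. `output_type` is also omitted, since the example above does not show where it goes in the body.

```python
import json

# Allowed values copied from the Fields table above.
OUTPUT_FORMATS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_tts_request(text, voice="xAI.eve", language="auto",
                      output_format="mp3", sample_rate=24000):
    """Return a request body shaped like the JSON example above.

    Hypothetical helper: it only validates values locally and
    does not contact any API.
    """
    if not voice.startswith("xAI."):
        raise ValueError(f"expected an xAI.<VoiceId> voice, got {voice!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return {
        "text": text,
        "voice": voice,
        "voice_settings": {
            "language": language,
            "output_format": output_format,
            "sample_rate": sample_rate,
        },
    }


body = build_tts_request(
    "Let me check that for you. [pause] I found your appointment."
)
print(json.dumps(body, indent=2))
```

Validating locally catches an unsupported format or sample rate before a request is made; the resulting dictionary can be serialized and posted with any HTTP client.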

Response

output_type	Response
`binary_output` (default)	Chunked audio bytes.
`base64_output`	JSON with base64-encoded audio.
`audio_id`	JSON with an `audio_url` for deferred retrieval.
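The three response shapes described above could be handled along these lines. This is a sketch under stated assumptions: `read_tts_response` is a hypothetical helper, and the JSON key that carries the base64 audio is not named on this page, so `audio_field="audio"` is a placeholder to replace with the key seen in a real response. Only `audio_url` comes from the text above.

```python
import base64
import json

OUTPUT_TYPES = {"binary_output", "base64_output", "audio_id"}


def read_tts_response(output_type, body, audio_field="audio"):
    """Interpret a raw response body (bytes) according to output_type.

    audio_field is an assumed key name for the base64 payload;
    it is not documented on this page.
    """
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {output_type!r}")
    if output_type == "binary_output":
        # The body already is the audio; chunks are assumed to be joined.
        return {"audio": body}
    payload = json.loads(body)
    if output_type == "base64_output":
        return {"audio": base64.b64decode(payload[audio_field])}
    # audio_id: return the URL so the caller can fetch the audio later.
    return {"audio_url": payload["audio_url"]}


# Usage with a locally constructed stand-in for a base64 response.
fake = json.dumps({"audio": base64.b64encode(b"abc").decode()}).encode()
print(read_tts_response("base64_output", fake))
```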

Expressive speech tags

Use speech tags inline in text when you want more expressive delivery.

Tag	Use for
`[pause]`	A short natural pause
`[long-pause]`	A longer pause for topic transitions or important moments
`[laugh]`, `[chuckle]`, `[giggle]`	Natural laughter or amused reactions
`[sigh]`, `[breath]`, `[inhale]`, `[exhale]`	Breath and sigh sounds
`<whisper>`	Whispered delivery
`<soft>`	Softer delivery
`<loud>`	Louder delivery
`<emphasis>`	Emphasized delivery
`<slow>`, `<fast>`	Slower or faster pace
`<higher-pitch>`, `<lower-pitch>`	Higher or lower pitch

So I walked in and [pause] there it was. [laugh] I honestly could not believe it!

<emphasis>Your appointment is confirmed for tomorrow at 3 PM.</emphasis>

Use expressive tags sparingly. The goal is natural delivery, not tagging every sentence.
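When text is generated programmatically, a small helper can keep tags well formed and flag anything outside the table above. The sketch below is illustrative only: `style` and `unknown_tags` are hypothetical names, and the closing-tag form `</tag>` is taken from the `<emphasis>` example above and assumed to apply to the other angle-bracket tags.

```python
import re

# Tag names copied from the table above.
SOUND_TAGS = {"pause", "long-pause", "laugh", "chuckle", "giggle",
              "sigh", "breath", "inhale", "exhale"}
STYLE_TAGS = {"whisper", "soft", "loud", "emphasis", "slow", "fast",
              "higher-pitch", "lower-pitch"}


def style(tag, text):
    """Wrap text in a paired style tag, e.g. <emphasis>...</emphasis>."""
    if tag not in STYLE_TAGS:
        raise ValueError(f"not a known style tag: {tag!r}")
    return f"<{tag}>{text}</{tag}>"


def unknown_tags(text):
    """Return a sorted list of tag names in text that are not in the table."""
    bracket = re.findall(r"\[([a-z-]+)\]", text)
    angle = re.findall(r"</?([a-z-]+)>", text)
    bad = {t for t in bracket if t not in SOUND_TAGS}
    bad |= {t for t in angle if t not in STYLE_TAGS}
    return sorted(bad)


line = "So I walked in and [pause] there it was. " + style("soft", "Wow.")
print(line)
print(unknown_tags(line))
```

Running a check like `unknown_tags` before sending text is one way to catch misspelled tags that would otherwise be read aloud or ignored.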

AI Assistants

For AI Assistants, choose an xAI Grok voice such as xAI.eve and enable Expressive Mode to let the assistant decide when speech tags improve the caller experience.

