xAI Grok TTS provider — expressive multilingual voices with speech tags and auto language detection.
Voice format: xAI.<VoiceId>
xAI Grok voices are expressive, multilingual text-to-speech voices. They support inline speech tags for pauses, vocal sounds, emphasis, pitch, pace, and intensity.
xAI Grok voices have higher latency than Telnyx Ultra. For latency-sensitive applications that need a sub-100 ms time to first byte, use Ultra.
xAI Grok voices are not available on the public TTS WebSocket API. Use the REST API for direct text-to-speech generation, or use xAI Grok voices with AI Assistants.
{
  "text": "Let me check that for you. [pause] I found your appointment.",
  "voice": "xAI.eve",
  "voice_settings": {
    "language": "auto",
    "output_format": "mp3",
    "sample_rate": 24000
  }
}
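As a rough illustration, the request body above could be assembled in Python as follows. This is a minimal sketch, not an official client: the helper name build_tts_payload and its defaults are hypothetical, and only the field names shown in the JSON example are taken from this page. The REST endpoint and authentication are deliberately left out because they are not specified here.

```python
import json


def build_tts_payload(text, voice_id="eve", language="auto",
                      output_format="mp3", sample_rate=24000):
    """Return a request body shaped like the JSON example on this page.

    build_tts_payload is a hypothetical helper, not part of any SDK.
    The voice string follows the documented format xAI.<VoiceId>.
    """
    return {
        "text": text,  # may contain inline speech tags such as [pause]
        "voice": f"xAI.{voice_id}",
        "voice_settings": {
            "language": language,          # "auto" requests language detection
            "output_format": output_format,
            "sample_rate": sample_rate,
        },
    }


payload = build_tts_payload(
    "Let me check that for you. [pause] I found your appointment."
)
body = json.dumps(payload)  # serialized body, ready to send with any HTTP client
```

Keeping the payload builder separate from the HTTP call makes it easy to unit-test the voice string and settings without network access.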
Default (binary_output): chunked audio bytes.
With output_type: "base64_output": JSON with base64-encoded audio.
With output_type: "audio_id": JSON with an audio_url for deferred retrieval.
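The three response shapes could be handled along these lines. This is a sketch under stated assumptions: handle_tts_response is a hypothetical helper, and the name of the JSON field carrying the base64 audio is not given on this page, so it is passed in as a parameter with a placeholder default ("base64_audio") that should be replaced with the real field name. Only audio_url is taken from the text above.

```python
import base64
import json


def handle_tts_response(output_type, raw, base64_field="base64_audio"):
    """Normalize a TTS response body (bytes) according to output_type.

    Hypothetical helper. base64_field is an ASSUMED field name; check the
    actual response schema before relying on it.
    """
    if output_type == "binary_output":
        # Default: the body is the audio itself (chunks already joined).
        return {"audio": raw}

    doc = json.loads(raw)  # the other two modes return JSON

    if output_type == "base64_output":
        return {"audio": base64.b64decode(doc[base64_field])}
    if output_type == "audio_id":
        # Deferred retrieval: the caller fetches the audio from this URL later.
        return {"audio_url": doc["audio_url"]}

    raise ValueError(f"unsupported output_type: {output_type!r}")
```

For example, handle_tts_response("audio_id", body) returns a dictionary containing only the URL, which the caller can download once the audio is ready.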
For AI Assistants, choose an xAI Grok voice such as xAI.eve and enable Expressive Mode to let the assistant decide when speech tags improve the caller experience.
AI Assistants
Build voice AI assistants using xAI Grok voices with Expressive Mode.