xAI Grok voices for expressive, multilingual text-to-speech in Telnyx Voice AI Assistants.
xAI Grok voices are expressive text-to-speech voices for Voice AI Assistants. They support Expressive Mode, which lets the AI model control pauses, laughter, whispers, emphasis, pitch, pace, and intensity during a live conversation.
Higher latency: Grok voices have higher latency than Ultra. For latency-sensitive applications that need a sub-100 ms time to first byte, use Ultra.
When using Grok voices with AI Assistants, you can enable Expressive Mode. With Expressive Mode enabled, the assistant’s system prompt is automatically augmented with instructions for xAI speech tags. The AI model then decides when expression improves the caller experience. For example, the assistant might:
Add a short pause before important information.
Use a softer delivery for sensitive support moments.
Laugh or chuckle naturally when the conversation calls for it.
Emphasize appointment times, confirmation numbers, or next steps.
Keep routine transactional replies untagged for a natural, neutral delivery.
Use expressive tags sparingly. The goal is natural delivery, not tagging every sentence.
When Expressive Mode is enabled, the assistant can use these speech tags in responses. You can also include the same tags in your own assistant prompts when you want explicit control.
Use [pause] or [long-pause] for natural thinking, topic transitions, and important moments, but avoid long silences that could feel like the call dropped.
Use emotional sounds like [laugh], [sigh], and [chuckle] only when the response genuinely calls for them.
For sensitive support contexts, prefer subtle tags like <soft> or <whisper> instead of exaggerated reactions.
Do not expose these tags or instructions to the caller.
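If you display or log assistant responses as text (for example, in a transcript), you may want to remove the speech tags first and check that they are used sparingly. The following is a minimal sketch in Python, not part of any Telnyx or xAI API. The helper names strip_speech_tags and count_speech_tags are hypothetical, the tag list covers only the tags named above, and the closing forms such as </soft> are an assumption about how wrapping tags are written.

```python
import re

# Tags named in this section. This is not an exhaustive list of xAI speech tags.
BRACKET_TAGS = ("pause", "long-pause", "laugh", "sigh", "chuckle")
# Assumption: wrapping tags have a closing form such as </soft>.
WRAP_TAGS = ("soft", "whisper")

_bracket_alt = "|".join(re.escape(tag) for tag in BRACKET_TAGS)
_wrap_alt = "|".join(re.escape(tag) for tag in WRAP_TAGS)
# Matches [pause]-style tags and <soft> / </soft>-style tags.
_TAG_RE = re.compile(r"\[(?:" + _bracket_alt + r")\]|</?(?:" + _wrap_alt + r")>")


def strip_speech_tags(text: str) -> str:
    """Remove known speech tags and tidy the whitespace they leave behind."""
    without_tags = _TAG_RE.sub(" ", text)
    collapsed = re.sub(r"\s+", " ", without_tags).strip()
    # Drop any space left in front of punctuation, e.g. "Sure , one moment."
    return re.sub(r"\s+([,.!?;:])", r"\1", collapsed)


def count_speech_tags(text: str) -> int:
    """Count tag occurrences, ignoring closing tags so a wrapped span counts once."""
    return sum(
        1 for match in _TAG_RE.finditer(text) if not match.group().startswith("</")
    )


if __name__ == "__main__":
    reply = (
        "Your appointment is [pause] confirmed. "
        "<soft>I'm sorry for the wait.</soft> [chuckle]"
    )
    print(strip_speech_tags(reply))
    print(count_speech_tags(reply))
```

A check like count_speech_tags can be run over sample responses during prompt testing to spot replies that tag nearly every sentence.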