What voice design does
Voice design generates a synthetic voice from a natural language description. You describe what you want (age, tone, accent, pacing) and the AI creates audio samples that match. This is not voice cloning. There’s no source audio. The voice is generated from scratch based on your text prompt.

The two-step flow: design → clone
The API has two separate resources:
- Voice Design: an intermediate artifact. Think of it as a draft. You can iterate on it (up to 50 versions per design). It is NOT usable for TTS directly.
- Voice Clone: a production-ready voice. It is created from a design, and it is what you pass to AI Assistants, Call Control, and the TTS API.
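
To make the relationship between the two resources concrete, here is a minimal in-memory sketch in Python. It does not call the real API: every name in it (`VoiceDesign`, `VoiceClone`, `create_design`, `clone_from_design`, `synthesize`) is hypothetical and chosen for illustration only. It also assumes that the initial description counts as version 1 of a design, which the text above does not state. The only facts taken from the text are the 50-version limit, that a clone is created from a design, and that a design cannot be used for TTS.

```python
from dataclasses import dataclass, field

MAX_VERSIONS_PER_DESIGN = 50  # limit stated in the text above


@dataclass
class VoiceDesign:
    """Hypothetical model of a draft voice: iterable, not usable for TTS."""
    versions: list[str] = field(default_factory=list)

    def add_version(self, prompt: str) -> int:
        """Store a new description and return its 1-based version number."""
        if len(self.versions) >= MAX_VERSIONS_PER_DESIGN:
            raise ValueError("a design holds at most 50 versions")
        self.versions.append(prompt)
        return len(self.versions)


@dataclass(frozen=True)
class VoiceClone:
    """Hypothetical model of a production-ready voice made from a design."""
    source_prompt: str
    version: int


def create_design(prompt: str) -> VoiceDesign:
    # Assumption: the first description counts as version 1.
    design = VoiceDesign()
    design.add_version(prompt)
    return design


def clone_from_design(design: VoiceDesign, version: int) -> VoiceClone:
    """Step two: promote one version of a design to a usable voice."""
    if not 1 <= version <= len(design.versions):
        raise ValueError(f"design has no version {version}")
    return VoiceClone(source_prompt=design.versions[version - 1], version=version)


def synthesize(voice: object, text: str) -> str:
    """Stand-in for TTS: accepts only a VoiceClone, never a VoiceDesign."""
    if not isinstance(voice, VoiceClone):
        raise TypeError("only a VoiceClone can be used for TTS")
    return f"[{voice.source_prompt}] {text}"


# Usage: iterate on the draft, then clone the version you like.
design = create_design("warm, middle-aged narrator, slow pacing")
latest = design.add_version("warm, middle-aged narrator, brisk pacing")
clone = clone_from_design(design, latest)
speech = synthesize(clone, "Hello")
```

Passing `design` to `synthesize` raises a `TypeError` in this sketch, which mirrors the rule that a design is a draft and only a clone is a usable voice.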