
What voice design does

Voice design generates a synthetic voice from a natural language description. You describe what you want (age, tone, accent, pacing), and the AI creates audio samples that match. This is not voice cloning: there is no source audio, and the voice is generated from scratch based on your text prompt.

The two-step flow: design → clone

The API has two separate resources:
  1. Voice Design — an intermediate artifact. Think of it as a draft. You can iterate on it (up to 50 versions per design). It is NOT usable for TTS directly.
  2. Voice Clone — a production-ready voice. Created from a design. This is what you pass to AI Assistants, Call Control, and the TTS API.
POST /v2/voice_designs → generates a sample → returns design id + version
POST /v2/voice_clones  → saves the design as a usable voice → returns voice clone id
The portal hides this two-step flow behind a single “Save This Voice” button. If you’re using the API directly, you need both steps.
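As a rough illustration of chaining the two calls, here is a minimal Python sketch. Only the two endpoint paths and the fact that the responses carry a design id, a version, and a voice clone id come from the text above. The request body fields (`prompt`, `voice_design_id`, `version`, `name`), the response keys (`id`, `version`), and the `post` transport function are assumptions made for illustration, so check them against the API reference before relying on them.

```python
from typing import Any, Callable, Dict

# A transport is any function that sends an authenticated POST to the given
# path and returns the decoded JSON response as a dict. It is hypothetical:
# plug in your own HTTP client, base URL, and API key handling.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def design_then_clone(post: Post, prompt: str, name: str) -> str:
    """Run both steps and return the voice clone id."""
    # Step 1: create the design (the draft). The "prompt" field is assumed.
    design = post("/v2/voice_designs", {"prompt": prompt})
    design_id = design["id"]      # assumed response key
    version = design["version"]   # assumed response key

    # Step 2: save that design version as a usable voice.
    # All three body fields are assumed names.
    clone = post(
        "/v2/voice_clones",
        {"voice_design_id": design_id, "version": version, "name": name},
    )
    return clone["id"]            # assumed response key


def fake_post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in transport so the sketch runs without network access."""
    if path == "/v2/voice_designs":
        return {"id": "design-123", "version": 1}
    if path == "/v2/voice_clones":
        return {"id": "clone-for-" + body["voice_design_id"]}
    raise ValueError("unexpected path: " + path)


if __name__ == "__main__":
    voice_id = design_then_clone(
        fake_post, "warm, mid-30s, slow pacing", "support-voice"
    )
    print(voice_id)
```

Passing the transport in as a function keeps the two-step ordering separate from authentication and HTTP details, and lets the flow be exercised with a stub like `fake_post` before pointing it at the real API.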