- Design a voice — Describe the voice you want in natural language and the AI generates it.
- Clone from audio — Upload or record a short audio sample to capture a voice identity.
- Telnyx — Powered by Qwen3TTS, optimized for short reference audio (3–15 seconds).
- Minimax — Supports longer reference audio (10 seconds to 5 minutes) for richer voice capture.
Design a voice from a prompt
Voice Design uses AI to generate voices from natural language descriptions. You describe characteristics such as tone, age, pace, and texture, and the system creates audio samples you can preview before saving.
How it works
Choose a provider
Select Telnyx or Minimax using the provider toggle. Both generate voices from your description, but use different underlying models.
Describe the voice
Write a natural language description of the voice you want. You can be as specific as you like: describe gender, age, tone, pace, texture, and personality.
Example prompts:
| Style | Prompt |
|---|---|
| Friendly | Female, mid-thirties. Warm and full, slightly husky. Moderate pace, sounds like someone who smiles while talking. |
| Precise | Male, late thirties. Clean and dry, matter-of-fact. Deliberate pace, pauses before numbers and details. |
| Empathetic | Male, mid-thirties. Warm, slightly gravelly. Measured and unhurried. You can hear patience in the breathing rhythm. |
Generate samples
Click Generate Samples to create three audio previews. Each sample reads a different AI-generated script in your chosen language, so you can hear how the voice sounds in varied contexts.
Preview and iterate
Listen to each sample. If none feel right, click Regenerate All to try again with the same prompt, or refine your description and generate new samples.
Using the API
You can also design voices programmatically. Include "provider": "minimax" in the request to use the Minimax voice model instead.
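As a rough illustration, the Python sketch below builds a JSON body for a design request. Only the "provider": "minimax" field comes from this page. The `prompt` field name and the `build_design_body` helper are assumptions made for illustration, so check the API reference for the real endpoint and schema before using it.

```python
import json


def build_design_body(prompt: str, use_minimax: bool = False) -> str:
    """Build a JSON body for a voice design request.

    Assumption: the description is sent in a field named "prompt".
    The "provider" field is only added when Minimax is requested,
    matching the note above about switching models.
    """
    body = {"prompt": prompt}
    if use_minimax:
        body["provider"] = "minimax"
    return json.dumps(body)


example = build_design_body(
    "Female, mid-thirties. Warm and full, slightly husky.",
    use_minimax=True,
)
```

The helper only serializes the body; sending it (and authenticating) is left out because the endpoint is not shown here.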
The response includes the design id and version. You can listen to the generated audio via the sample endpoint:
Clone from audio
If you have an existing voice you want to replicate (your own, a colleague’s, or a professional recording), you can clone it from a short audio sample.
Requirements
Audio constraints depend on the provider you select:
| | Telnyx | Minimax |
|---|---|---|
| Audio length | 5-10 s optimal (3-15 s accepted) | 10 s to 5 min |
| Max file size | 50 MB | 20 MB |
- Formats: WAV, MP3, FLAC, OGG, or M4A (both providers).
- Quality: A quiet environment with clear speech gives the best results.
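The limits above can be checked locally before uploading. The following Python sketch encodes the table as data and reports which constraints a sample violates. The `check_sample` helper, the lowercase provider keys, and the 1 MB = 1024 × 1024 bytes conversion are assumptions for illustration, not part of the Telnyx API.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CloneLimits:
    min_seconds: float
    max_seconds: float
    max_megabytes: int


# Values copied from the requirements table above.
LIMITS = {
    "telnyx": CloneLimits(min_seconds=3, max_seconds=15, max_megabytes=50),
    "minimax": CloneLimits(min_seconds=10, max_seconds=300, max_megabytes=20),
}
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def check_sample(provider: str, filename: str,
                 duration_seconds: float, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the sample fits."""
    limits = LIMITS[provider.lower()]
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if not limits.min_seconds <= duration_seconds <= limits.max_seconds:
        problems.append(
            f"duration must be {limits.min_seconds}-{limits.max_seconds} s"
        )
    if size_bytes > limits.max_megabytes * BYTES_PER_MB:
        problems.append(f"file exceeds {limits.max_megabytes} MB")
    return problems
```

For example, an 8-second WAV passes the Telnyx limits but is too short for Minimax, so `check_sample("minimax", "me.wav", 8, 1_000_000)` returns one problem.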
Upload a file
Select Upload File
In the Voice Design Lab, click Upload Audio and choose your audio file or drag and drop it.
Record directly in the browser
If you don’t have a pre-recorded file, you can record directly:
Choose a language
Select the language you’ll speak in. The system generates a reading script optimized for voice cloning in that language.
Read the script
Click Start Recording and read the provided script clearly. The script is designed to capture the full range of phonemes for the best clone quality.
Using the API
Clone a voice from an audio file:
Using your custom voices
Every voice clone gets a unique voice ID built from three parts: {Provider}.{Model}.{voice_id}. The provider and model determine the prefix, and the voice_id comes from the clone’s provider_voice_id field (which may differ from the clone’s UUID).
Examples:
- Telnyx: Telnyx.Qwen3TTS.33226e69-3abd-429b-b64a-86775c9b5850
- Minimax: Minimax.speech-2.8-turbo.TB4ZMVKanThGeldiw8rLBEg21v4ifjUTRgLpkodJxpMYV
You can read these parts from the clone’s provider, provider_supported_models, and provider_voice_id fields.
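The three-part format can be assembled and taken apart with a few lines of Python. This is a minimal sketch: `build_voice_id` and `parse_voice_id` are hypothetical helper names, and the parser assumes that neither the provider name nor the voice_id contains a dot. The model name may contain dots, as in speech-2.8-turbo, so the string is split at the first and last dot only.

```python
from typing import Tuple


def build_voice_id(provider: str, model: str, provider_voice_id: str) -> str:
    """Join the three parts into {Provider}.{Model}.{voice_id}."""
    return f"{provider}.{model}.{provider_voice_id}"


def parse_voice_id(voice_id: str) -> Tuple[str, str, str]:
    """Split a voice ID into (provider, model, provider_voice_id).

    Splits on the first and last dot so that dots inside the model
    name are preserved. Raises ValueError if fewer than two dots
    are present.
    """
    provider, rest = voice_id.split(".", 1)
    model, provider_voice_id = rest.rsplit(".", 1)
    return provider, model, provider_voice_id


telnyx_id = build_voice_id(
    "Telnyx", "Qwen3TTS", "33226e69-3abd-429b-b64a-86775c9b5850"
)
```

A plain `voice_id.split(".")` would break the Minimax example into five pieces, which is why the two-sided split is used here.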
AI Assistants
Select your custom voice in the assistant’s voice settings. Telnyx clones appear under the Telnyx provider with the Qwen3TTS model, and Minimax clones appear under the Minimax provider.
Call Control
Pass the voice ID to the speak command:
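A minimal Python sketch of such a request follows. It assumes the speak action lives at /v2/calls/{call_control_id}/actions/speak and accepts `payload` and `voice` fields; confirm both against the Call Control API reference. The `build_speak_request` helper is a hypothetical name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.telnyx.com/v2"  # assumed base URL


def build_speak_request(api_key: str, call_control_id: str,
                        text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) a speak request that uses a custom voice."""
    body = {"payload": text, "voice": voice_id}
    return urllib.request.Request(
        url=f"{API_BASE}/calls/{call_control_id}/actions/speak",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speak_request(
    api_key="YOUR_API_KEY",
    call_control_id="YOUR_CALL_CONTROL_ID",
    text="Thanks for calling. How can I help?",
    voice_id="Telnyx.Qwen3TTS.33226e69-3abd-429b-b64a-86775c9b5850",
)
# urllib.request.urlopen(request) would send it once real credentials are in place.
```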
TTS API
Use the voice ID with the text-to-speech WebSocket or REST endpoint:
Managing voices
List voice clones
Update a voice clone
Delete a voice clone
Manage voice designs
Voice designs are the intermediate artifacts created during the design process. You can list, iterate on, and clean up old designs.
Each voice design can have up to 50 versions, letting you iterate on a voice concept before committing to a clone.