Models
Set via model_id (body) on POST /v2/voice_clones/from_upload, or use provider (body) to select Minimax.
| Model | Provider | Audio length | Max file | Sync/Async | Best for |
|---|---|---|---|---|---|
| Qwen3TTS | Telnyx (default) | 3–15s (auto-trimmed to 10s) | 5 MB | Sync (201) | Short, clean samples |
| Ultra | Telnyx | Up to 10s | 5 MB | Async (202) | Higher quality, more natural |
| speech-2.8-turbo | Minimax | 10s–5 min | 20 MB | Sync (201) | Longer recordings, more vocal range |
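As a rough illustration of the table above, the sketch below maps each model to the body field that selects it and to the success status the table lists. The helper names (`model_fields`, `is_async`) are hypothetical, and the exact field values are assumptions: that `model_id` takes the model name as written in the table, and that `provider` takes the literal string "Minimax".

```python
# Expected success status per model, copied from the table above.
EXPECTED_STATUS = {
    "Qwen3TTS": 201,          # sync
    "Ultra": 202,             # async
    "speech-2.8-turbo": 201,  # sync, served by Minimax
}


def model_fields(model: str) -> dict[str, str]:
    """Return the request-body fields that select the given model."""
    if model not in EXPECTED_STATUS:
        raise ValueError(f"unknown model: {model!r}")
    if model == "speech-2.8-turbo":
        # Assumption: the provider field takes the literal value "Minimax".
        return {"provider": "Minimax"}
    # Assumption: model_id takes the model name exactly as shown in the table.
    return {"model_id": model}


def is_async(model: str) -> bool:
    """True when the table lists a 202 (async) response for this model."""
    return EXPECTED_STATUS[model] == 202
```

The returned dictionary would be sent as form fields alongside the multipart audio upload; a caller can use `is_async` to decide whether to expect a finished clone or a pending one.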
Audio Requirements
Body parameter audio_file (multipart) on POST /v2/voice_clones/from_upload.
| | Qwen3TTS | Ultra | Minimax |
|---|---|---|---|
| Audio length | 3–15s (5–10s optimal) | Up to 10s | 10s–5 min |
| Max file size | 5 MB | 5 MB | 20 MB |
| Formats | WAV, MP3, FLAC, OGG, M4A | Same | Same |
- Qwen3TTS: aim for 5–10 seconds. Longer isn’t better: the sample is auto-trimmed to 10s.
- Minimax: longer is better. 1–2 minutes of varied speech gives more vocal range.
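The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: `check_sample` is a hypothetical helper, the duration is assumed to be measured by the caller, "MB" is assumed to mean 2**20 bytes, and Ultra is given no minimum length because the table states none.

```python
from pathlib import Path

MB = 1024 * 1024  # Assumption: "MB" means 2**20 bytes; the API may count 10**6.

# (min seconds, max seconds, max bytes) per column of the table above.
LIMITS = {
    "Qwen3TTS": (3.0, 15.0, 5 * MB),
    "Ultra": (0.0, 10.0, 5 * MB),      # no minimum is stated for Ultra
    "Minimax": (10.0, 300.0, 20 * MB),
}
FORMATS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def check_sample(model: str, filename: str, size_bytes: int,
                 duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the sample fits."""
    min_s, max_s, max_bytes = LIMITS[model]
    problems = []
    if Path(filename).suffix.lower() not in FORMATS:
        problems.append("unsupported format")
    if size_bytes > max_bytes:
        problems.append("file too large")
    if not (min_s <= duration_s <= max_s):
        problems.append("duration out of range")
    return problems
```

For example, `check_sample("Qwen3TTS", "voice.wav", 2 * MB, 8.0)` returns an empty list, while a 30-second sample for the same model is reported as out of range.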
The ref_text Parameter
Body parameter on POST /v2/voice_clones/from_upload. Optional.
A transcript of what’s being said in the audio. Improves clone quality by giving the model a text reference to align against.
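Because ref_text is optional, a client can add it only when a transcript is actually available. The sketch below shows one way to do that; `with_ref_text` is a hypothetical helper, and trimming surrounding whitespace is an assumption about sensible input rather than a documented requirement.

```python
from typing import Optional


def with_ref_text(fields: dict[str, str],
                  ref_text: Optional[str] = None) -> dict[str, str]:
    """Return a copy of the form fields, adding ref_text when non-empty."""
    out = dict(fields)  # copy so the caller's dictionary is not modified
    if ref_text and ref_text.strip():
        # Assumption: surrounding whitespace carries no meaning for alignment.
        out["ref_text"] = ref_text.strip()
    return out
```

The result would be sent as ordinary form fields next to the multipart audio_file part; when no transcript is supplied, the field is simply left out of the request.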
Ultra Async Flow
When model_id is "Ultra", the API returns 202 Accepted instead of 201: