Voice Clone API parameters

Models

Select a model with the model_id body parameter on POST /v2/voice_clones/from_upload, or set the provider body parameter to select Minimax.

Model	Provider	Audio length	Max file size	Sync/Async	Best for
Qwen3TTS	Telnyx (default)	3–15s (auto-trimmed to 10s)	5 MB	Sync (201)	Short, clean samples
Ultra	Telnyx	Up to 10s	5 MB	Async (202)	Higher quality, more natural
speech-2.8-turbo	Minimax	10s–5 min	20 MB	Sync (201)	Longer recordings, more vocal range

Audio Requirements

Upload the sample as the audio_file body parameter (multipart) on POST /v2/voice_clones/from_upload.

Requirement	Qwen3TTS	Ultra	Minimax
Audio length	3–15s (5–10s optimal)	Up to 10s	10s–5 min
Max file size	5 MB	5 MB	20 MB
Formats	WAV, MP3, FLAC, OGG, M4A	WAV, MP3, FLAC, OGG, M4A	WAV, MP3, FLAC, OGG, M4A

Qwen3TTS: aim for 5–10 seconds. Longer isn’t better, because the audio is auto-trimmed to 10s.
Minimax: longer is better. 1–2 minutes of varied speech gives more vocal range.
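
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: check_sample, AudioLimits, and the dictionary keys are hypothetical names, "MB" is assumed to mean 10^6 bytes (the stricter reading), and Ultra is given no minimum length because none is documented. The caller is expected to measure the duration itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioLimits:
    min_seconds: float
    max_seconds: float
    max_bytes: int


# Assumption: "MB" is read as 10**6 bytes, the stricter interpretation.
MB = 1_000_000

# Limits copied from the requirements table; the keys are local labels only.
LIMITS = {
    "Qwen3TTS": AudioLimits(3.0, 15.0, 5 * MB),
    "Ultra": AudioLimits(0.0, 10.0, 5 * MB),  # no minimum length is documented
    "Minimax": AudioLimits(10.0, 300.0, 20 * MB),
}

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def check_sample(model, filename, size_bytes, duration_seconds):
    """Return a list of problems; an empty list means the sample fits."""
    limits = LIMITS[model]
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > limits.max_bytes:
        problems.append(f"file too large: {size_bytes} > {limits.max_bytes} bytes")
    if not limits.min_seconds <= duration_seconds <= limits.max_seconds:
        problems.append(
            f"duration {duration_seconds}s outside "
            f"{limits.min_seconds}-{limits.max_seconds}s"
        )
    return problems
```

A sample that passes this check can still be rejected by the server, since the check only looks at the file extension, size, and a duration supplied by the caller.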

The `ref_text` Parameter

Optional body parameter on POST /v2/voice_clones/from_upload. It is a transcript of what is said in the audio, and it improves clone quality by giving the model a text reference to align against.
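
As an illustration of how the text fields might be assembled alongside audio_file, here is a small sketch. build_form_fields is a hypothetical helper, the provider value "Minimax" is an assumed spelling, and any other fields the endpoint may require are left out because this section does not list them.

```python
def build_form_fields(model, ref_text=None):
    """Build the text fields of the multipart body; audio_file is sent separately."""
    fields = {}
    if model == "Minimax":
        # Assumption: the provider value is spelled "Minimax".
        fields["provider"] = "Minimax"
    else:
        fields["model_id"] = model  # "Qwen3TTS" or "Ultra"
    # ref_text is optional, so it is only included when it has content.
    transcript = (ref_text or "").strip()
    if transcript:
        fields["ref_text"] = transcript
    return fields
```

With an HTTP client such as requests, the returned dictionary would typically be passed as data= and the opened audio file as files={"audio_file": ...}. The base URL and authentication are not covered in this section.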

Ultra Async Flow

When model_id is "Ultra", the API returns 202 Accepted instead of 201 Created, and the clone starts in the "pending" status:

POST /v2/voice_clones/from_upload → 202 { "data": { "status": "pending" } }

Poll until ready:

GET /v2/voice_clones/{id} → 200 { "data": { "status": "active" } }
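
The polling step could be sketched as below. wait_until_active is a hypothetical helper, and fetch_status stands in for whatever function performs GET /v2/voice_clones/{id} and returns the value of data.status. Only "pending" and "active" appear in this section, so treating any other status as a failure is an assumption, as are the default interval and timeout.

```python
import time


def wait_until_active(fetch_status, clone_id, interval=2.0, timeout=120.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(clone_id) until it returns "active".

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(clone_id)
        if status == "active":
            return status
        if status != "pending":
            # Assumption: anything other than "pending" or "active" is terminal.
            raise RuntimeError(f"voice clone {clone_id} ended in status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"voice clone {clone_id} still pending after {timeout}s")
        sleep(interval)
```

A fixed interval keeps the sketch short; a longer interval or a backoff may be a better fit if cloning usually takes more than a few seconds.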

See Responses for status values and voice ID format. See Errors for Minimax error codes.
