
Models

Set via model_id (body) on POST /v2/voice_clones/from_upload, or use provider (body) to select Minimax.
| Model | Provider | Audio length | Max file | Sync/Async | Best for |
|---|---|---|---|---|---|
| Qwen3TTS | Telnyx (default) | 3–15s (auto-trimmed to 10s) | 5 MB | Sync (201) | Short, clean samples |
| Ultra | Telnyx | Up to 10s | 5 MB | Async (202) | Higher quality, more natural |
| speech-2.8-turbo | Minimax | 10s–5 min | 20 MB | Sync (201) | Longer recordings, more vocal range |
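
As a rough illustration, the limits in the table can be encoded as data and used to shortlist models for a given sample. This is only a sketch: `ModelLimits`, `LIMITS`, and `candidate_models` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption the table does not confirm.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024  # assumption: the docs do not say whether MB means 10^6 or 2^20 bytes


@dataclass(frozen=True)
class ModelLimits:
    min_seconds: float
    max_seconds: float
    max_bytes: int
    success_status: int  # HTTP status returned on a successful create


# Values copied from the Models table; the names are illustrative only.
LIMITS = {
    "Qwen3TTS": ModelLimits(3, 15, 5 * MB, 201),
    "Ultra": ModelLimits(0, 10, 5 * MB, 202),
    "speech-2.8-turbo": ModelLimits(10, 300, 20 * MB, 201),
}


def candidate_models(duration_s: float, size_bytes: int) -> List[str]:
    """Return the models whose length and size limits the sample satisfies."""
    return [
        name
        for name, lim in LIMITS.items()
        if lim.min_seconds <= duration_s <= lim.max_seconds
        and size_bytes <= lim.max_bytes
    ]
```

For example, `candidate_models(8, 1 * MB)` shortlists both Telnyx models, while a one-minute recording only fits speech-2.8-turbo.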

Audio Requirements

Body parameter audio_file (multipart) on POST /v2/voice_clones/from_upload.
| | Qwen3TTS | Ultra | Minimax |
|---|---|---|---|
| Audio length | 3–15s (5–10s optimal) | Up to 10s | 10s–5 min |
| Max file size | 5 MB | 5 MB | 20 MB |
| Formats | WAV, MP3, FLAC, OGG, M4A | Same | Same |
  • Qwen3TTS: aim for 5–10 seconds. Longer is not better: the sample is auto-trimmed to 10s.
  • Minimax: longer is better. 1–2 minutes of varied speech gives more vocal range.
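
A client could check a sample against these requirements before uploading. The sketch below uses only the Python standard library, so it can measure duration for WAV files only; other formats would need a third-party decoder. The helper names are hypothetical, the extension check is a heuristic rather than real format detection, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
import os
import wave
from typing import List

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
MB = 1024 * 1024  # assumption: the docs do not define MB precisely


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed as frame count / frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str, min_seconds: float, max_seconds: float,
                 max_mb: float) -> List[str]:
    """Return a list of problems found; an empty list means no issue was detected."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    if os.path.getsize(path) > max_mb * MB:
        problems.append(f"file is larger than {max_mb} MB")
    if ext == ".wav":  # duration is only measured for WAV in this sketch
        duration = wav_duration_seconds(path)
        if not (min_seconds <= duration <= max_seconds):
            problems.append(
                f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s"
            )
    return problems
```

With the Qwen3TTS limits, a call would look like `check_sample("sample.wav", 3, 15, 5)`; with the Minimax limits, `check_sample("sample.wav", 10, 300, 20)`.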

The ref_text Parameter

Body parameter on POST /v2/voice_clones/from_upload. Optional. A transcript of what’s being said in the audio. Improves clone quality by giving the model a text reference to align against.
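
To show where ref_text sits in the request, here is a minimal sketch that assembles a multipart/form-data body by hand with the standard library. Only the field names `audio_file`, `ref_text`, and `model_id` come from this page; `build_clone_form` is a hypothetical helper, the `application/octet-stream` part type is an assumption, and the endpoint may require additional fields that this section does not list.

```python
import uuid
from typing import Optional, Tuple


def build_clone_form(audio_bytes: bytes, filename: str,
                     ref_text: Optional[str] = None,
                     model_id: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for POST /v2/voice_clones/from_upload."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_text(name: str, value: str) -> None:
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )

    if model_id is not None:
        add_text("model_id", model_id)
    if ref_text is not None:
        # Optional transcript of the audio; omitted entirely when not provided.
        add_text("ref_text", ref_text)

    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    ).encode("utf-8")
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type would go in the request's `Content-Type` header and the body in the POST payload. The transcript should match the words spoken in the sample, since the model uses it as an alignment reference.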

Ultra Async Flow

When model_id is "Ultra", the API returns 202 Accepted instead of 201:
POST /v2/voice_clones/from_upload → 202 { "data": { "status": "pending" } }
Poll until ready:
GET /v2/voice_clones/{id} → 200 { "data": { "status": "active" } }
See Responses for status values and voice ID format. See Errors for Minimax error codes.
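
The Ultra polling flow above could be sketched as follows. Several details are assumptions rather than documented behavior: the base URL, the Bearer authorization header, the polling interval and timeout, and the choice to treat any status other than "pending" or "active" as a failure (the full list of status values is in Responses). The function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.telnyx.com"  # assumption: base URL is not given in this section


def fetch_status_http(clone_id: str, api_key: str) -> str:
    """GET /v2/voice_clones/{id} and return data.status."""
    req = urllib.request.Request(
        f"{API_BASE}/v2/voice_clones/{clone_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: Bearer auth
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["status"]


def wait_until_active(fetch_status: Callable[[], str],
                      interval_s: float = 2.0,
                      timeout_s: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the clone reports "active", or raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "active":
            return status
        if status != "pending":
            # Assumption: any other status means the clone will not become active.
            raise RuntimeError(f"unexpected clone status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("voice clone did not become active in time")
        sleep(interval_s)
```

A caller would pass something like `lambda: fetch_status_http(clone_id, api_key)` as `fetch_status`, using the ID returned by the 202 response. Injecting the fetch and sleep functions keeps the loop testable without network access.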