Voice format: minimax.<Model>.<VoiceId>. The voice_id can be a system voice (pre-built) or a cloned voice from the Voice Design Lab (organization-scoped).

Voice Samples

| Voice | Gender |
| --- | --- |
| Minimax.speech-2.8-turbo.English_expressive_narrator | Male |
| Minimax.speech-2.8-turbo.English_radiant_girl | Female |
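
As an illustration of the voice format, the following sketch splits a voice string into its three parts. It assumes the voice ID itself contains no dots (the model name does, as in speech-2.8-turbo), and the names VoiceRef and parse_voice are hypothetical helpers, not part of the API.

```python
from typing import NamedTuple


class VoiceRef(NamedTuple):
    provider: str
    model: str
    voice_id: str


def parse_voice(voice: str) -> VoiceRef:
    """Split '<provider>.<Model>.<VoiceId>' into its three parts.

    Assumption: the voice ID contains no '.', so the provider is taken
    from the left and the voice ID from the right. The model name in the
    middle may contain dots (e.g. 'speech-2.8-turbo').
    """
    provider, _, rest = voice.partition(".")
    model, _, voice_id = rest.rpartition(".")
    if not (provider and model and voice_id):
        raise ValueError(f"expected <provider>.<Model>.<VoiceId>, got {voice!r}")
    # The page shows both 'minimax' and 'Minimax', so compare case-insensitively.
    if provider.lower() != "minimax":
        raise ValueError(f"unexpected provider prefix: {provider!r}")
    return VoiceRef(provider, model, voice_id)


ref = parse_voice("Minimax.speech-2.8-turbo.English_expressive_narrator")
print(ref.model, ref.voice_id)
```

A cloned voice ID would pass through the same helper, provided it follows the same no-dot assumption.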

WebSocket

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_format | string | mp3 | mp3, linear16. |
| sample_rate | integer | 24000 | 8000, 16000, 22050, 24000, 32000, 44100. |
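
A minimal sketch of how these query parameters might be validated and appended to a connection URL. The endpoint URL is not given on this page, so the base URL below is a placeholder, and build_ws_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Allowed values as listed in the Query Parameters table.
ALLOWED_FORMATS = {"mp3", "linear16"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}


def build_ws_url(base_url: str, audio_format: str = "mp3", sample_rate: int = 24000) -> str:
    """Return base_url with validated audio_format and sample_rate appended."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    query = urlencode({"audio_format": audio_format, "sample_rate": sample_rate})
    return f"{base_url}?{query}"


# Placeholder host: substitute the real WebSocket endpoint.
url = build_ws_url("wss://example.invalid/tts", audio_format="linear16", sample_rate=16000)
print(url)
```

Validating on the client side is optional; it only surfaces an unsupported value before the connection attempt.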

Voice Settings

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| speed | float | | Playback speed multiplier. |
| vol | float | | Volume level. |
| pitch | integer | | Pitch adjustment. |
| language_boost | string | | Language emphasis for multilingual synthesis. |
{
  "text": " ",
  "voice_settings": {
    "speed": 1.1,
    "vol": 1.0,
    "pitch": 0
  }
}
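
The message above could be assembled programmatically along these lines. This is a sketch only: build_message is a hypothetical helper, and it assumes that omitted voice settings fall back to server-side defaults, which this page does not confirm.

```python
import json
from typing import Optional


def build_message(
    text: str,
    speed: Optional[float] = None,
    vol: Optional[float] = None,
    pitch: Optional[int] = None,
    language_boost: Optional[str] = None,
) -> str:
    """Serialize a text message with optional voice_settings as JSON."""
    settings = {
        "speed": speed,
        "vol": vol,
        "pitch": pitch,
        "language_boost": language_boost,
    }
    # Drop unset fields (assumption: the server applies its own defaults).
    settings = {key: value for key, value in settings.items() if value is not None}
    message = {"text": text}
    if settings:
        message["voice_settings"] = settings
    return json.dumps(message)


payload = build_message(" ", speed=1.1, vol=1.0, pitch=0)
print(payload)
```

Note that pitch=0 is kept in the payload because only None is treated as unset.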

REST API

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| speed | float | | Playback speed multiplier. |
| vol | float | | Volume level. |
| pitch | integer | | Pitch adjustment. |
| language_boost | string | | Language emphasis. |
| output_type | string | binary_output | binary_output, base64_output, or audio_id. |

Response

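The response shape depends on output_type. With binary_output (the default), the response is chunked audio bytes. With output_type: "base64_output", the response is JSON with base64-encoded audio. With output_type: "audio_id", the response is JSON with an audio_url for deferred retrieval.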
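
A sketch of how a client might handle the three response shapes once the full response body has been read. The JSON key that carries the base64 audio is not named on this page, so base64_field (default "audio") is an assumption; audio_url is the only key taken from the text. extract_audio is a hypothetical helper.

```python
import base64
import json


def extract_audio(output_type: str, body: bytes, base64_field: str = "audio") -> dict:
    """Interpret a response body according to the requested output_type.

    base64_field is a guess at the JSON key holding the encoded audio;
    adjust it to match the actual response.
    """
    if output_type == "binary_output":
        # Raw audio bytes (already reassembled from the chunked response).
        return {"audio": body}
    if output_type == "base64_output":
        payload = json.loads(body)
        return {"audio": base64.b64decode(payload[base64_field])}
    if output_type == "audio_id":
        payload = json.loads(body)
        # The URL can be fetched later to retrieve the audio.
        return {"audio_url": payload["audio_url"]}
    raise ValueError(f"unknown output_type: {output_type!r}")


sample = json.dumps({"audio": base64.b64encode(b"abc").decode("ascii")}).encode()
print(extract_audio("base64_output", sample))
```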