Output Type
output_type controls how audio is returned in the HTTP response:
| Value | Response |
|---|---|
| binary_output (default) | Raw audio bytes. Content-Type header set to the audio MIME type (e.g., audio/mpeg). |
| base64_output | JSON body: `{"base64_audio": "<base64>"}` |
| audio_id | JSON body with an audio_id for later retrieval via `GET /v2/text-to-speech/speech/:audio_id` |
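
The three response shapes above can be handled by one small dispatcher. The following is a minimal sketch, not an official client: `parse_tts_response` is a hypothetical helper name, it takes the raw response body that your HTTP client has already read, and it assumes the `audio_id` response carries the identifier under a JSON key named `audio_id` (the table only says the body contains one).

```python
import base64
import json

# Retrieval path taken from the table above; the host is left to the caller.
RETRIEVAL_PATH = "/v2/text-to-speech/speech/{audio_id}"


def parse_tts_response(output_type: str, body: bytes) -> dict:
    """Turn a raw response body into audio bytes or a retrieval reference.

    Hypothetical helper: the name and the returned dict layout are
    illustrative and not part of the API.
    """
    if output_type == "binary_output":
        # The body already is the audio; the MIME type is in Content-Type.
        return {"audio": body}

    payload = json.loads(body)

    if output_type == "base64_output":
        return {"audio": base64.b64decode(payload["base64_audio"])}

    if output_type == "audio_id":
        # Assumption: the JSON key is literally "audio_id".
        audio_id = payload["audio_id"]
        return {
            "audio_id": audio_id,
            "retrieve_path": RETRIEVAL_PATH.format(audio_id=audio_id),
        }

    raise ValueError(f"unknown output_type: {output_type!r}")
```

Keeping the function free of network calls makes it easy to unit-test and lets it sit behind whichever HTTP library you already use.
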
Common Format Vocabulary
| Format | Description |
|---|---|
| mp3 | MPEG Layer 3 |
| wav | WAV container (PCM) |
| linear16 | Raw 16-bit PCM |
| mulaw | μ-law encoded |
| alaw | A-law encoded |
| ogg_vorbis | OGG Vorbis |
| pcm | Alias for linear16 (backward compatibility) |
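
Because `pcm` is only an alias, client code may want to resolve it to `linear16` before comparing formats. A minimal sketch follows; `normalize_format` is a hypothetical helper, and exact-match, case-sensitive names are an assumption, since the table does not say how case is treated.

```python
# Canonical format names from the table above.
CANONICAL_FORMATS = {"mp3", "wav", "linear16", "mulaw", "alaw", "ogg_vorbis"}

# Backward-compatibility aliases mapped to their canonical name.
FORMAT_ALIASES = {"pcm": "linear16"}


def normalize_format(name: str) -> str:
    """Return the canonical format name, resolving the pcm alias.

    Hypothetical helper; matching is case-sensitive by assumption.
    """
    canonical = FORMAT_ALIASES.get(name, name)
    if canonical not in CANONICAL_FORMATS:
        raise ValueError(f"unknown audio format: {name!r}")
    return canonical
```

For example, `normalize_format("pcm")` and `normalize_format("linear16")` both return `"linear16"`, so later checks only need to handle the canonical name.
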
Provider Format Support Matrix
| Provider | Supported formats | sample_rate |
|---|---|---|
| telnyx | mp3, linear16 | pass-through |
| aws | mp3, linear16, ogg_vorbis | pass-through |
| azure | mp3, wav, linear16, mulaw, alaw | 8000 / 16000 / 24000 / 48000 |
| elevenlabs | mp3, linear16, mulaw | codec-specific |
| rime | mp3, linear16 | pass-through |
| minimax | mp3, linear16 | 8000 / 16000 / 22050 / 24000 / 32000 / 44100 |
| resemble | mp3, wav | pass-through |
| inworld | mp3, linear16 | pass-through |
| qwen | mp3, linear16 | N/A (fixed 24 kHz) |
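
The matrix above can be encoded as data and checked before a request is sent. This is a sketch under stated assumptions rather than the service's own validation: `check_request` is a hypothetical helper, the `pcm` alias is assumed to resolve to `linear16` for every provider, "pass-through" and "codec-specific" rates are not checked at all, and for qwen any explicit rate other than 24000 is flagged on the assumption that the fixed rate cannot be overridden.

```python
from typing import List, Optional

PCM_ALIAS = {"pcm": "linear16"}

# provider -> (supported formats, allowed sample rates in Hz).
# None means the rate is not checked here (pass-through or codec-specific).
PROVIDERS = {
    "telnyx": ({"mp3", "linear16"}, None),
    "aws": ({"mp3", "linear16", "ogg_vorbis"}, None),
    "azure": ({"mp3", "wav", "linear16", "mulaw", "alaw"},
              {8000, 16000, 24000, 48000}),
    "elevenlabs": ({"mp3", "linear16", "mulaw"}, None),
    "rime": ({"mp3", "linear16"}, None),
    "minimax": ({"mp3", "linear16"},
                {8000, 16000, 22050, 24000, 32000, 44100}),
    "resemble": ({"mp3", "wav"}, None),
    "inworld": ({"mp3", "linear16"}, None),
    "qwen": ({"mp3", "linear16"}, {24000}),  # assumption: fixed rate only
}


def check_request(provider: str, fmt: str,
                  sample_rate: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the combination
    is consistent with the matrix. Hypothetical helper."""
    if provider not in PROVIDERS:
        return [f"unknown provider: {provider}"]

    formats, rates = PROVIDERS[provider]
    problems = []

    canonical = PCM_ALIAS.get(fmt, fmt)
    if canonical not in formats:
        problems.append(f"{provider} does not support format {fmt}")

    if sample_rate is not None and rates is not None and sample_rate not in rates:
        problems.append(f"{provider} does not support sample_rate {sample_rate}")

    return problems
```

Returning a list instead of raising lets a caller report a bad format and a bad sample rate together, for example `check_request("minimax", "wav", 48000)` yields two entries.
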