Endpoint
Authentication
Request
Response Modes
The `output_type` parameter controls how the audio is returned:
| Value | Response |
|---|---|
| `binary_output` (default) | Raw audio bytes. The `Content-Type` header indicates the format (e.g., `audio/mpeg`). |
| `base64_output` | JSON body containing the base64-encoded audio. |
| `audio_id` | Returns an `audio_id` for deferred retrieval via `GET /v2/text-to-speech/speech/:audio_id`. |
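As a rough illustration of how `output_type` fits into a request, the Python sketch below checks a value against the three modes in the table and builds the path for deferred retrieval. The payload shape is hypothetical: this page does not show the request schema, base URL, or authentication, so the `text` key and the helper names `build_request_body` and `retrieval_path` are assumptions, and the sketch stops short of making an HTTP call.

```python
from urllib.parse import quote

# The three output_type values listed in the response-modes table.
OUTPUT_TYPES = ("binary_output", "base64_output", "audio_id")


def build_request_body(text, output_type="binary_output", pronunciation_dict_id=None):
    """Assemble a request payload (hypothetical shape).

    output_type and pronunciation_dict_id are named in the docs;
    the "text" key is an assumption made for illustration.
    """
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output_type: {output_type!r}")
    body = {"text": text, "output_type": output_type}
    # Only send the dictionary id when one is configured.
    if pronunciation_dict_id is not None:
        body["pronunciation_dict_id"] = pronunciation_dict_id
    return body


def retrieval_path(audio_id):
    """Path for GET /v2/text-to-speech/speech/:audio_id."""
    # Percent-encode the id so it stays a single path segment.
    return "/v2/text-to-speech/speech/" + quote(audio_id, safe="")
```

Validating `output_type` on the client side surfaces a typo before a request is sent; the server remains the authority on which values it accepts.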
Binary (default)
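In binary mode the response body is the audio itself, so a client mainly needs the `Content-Type` header to choose a file extension. A minimal Python sketch, assuming a hand-written media-type table: only `audio/mpeg` is named on this page, so the other entries, the `.bin` fallback, and the helper names `media_extension` and `save_binary_audio` are illustrative.

```python
from pathlib import Path

# Illustrative mapping; only audio/mpeg is confirmed by the docs.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/ogg": ".ogg",
    "audio/flac": ".flac",
}


def media_extension(content_type: str) -> str:
    # Drop parameters such as "; codecs=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, ".bin")


def save_binary_audio(body: bytes, content_type: str, stem: str) -> Path:
    # stem is the output path without an extension, e.g. "out/speech".
    path = Path(f"{stem}{media_extension(content_type)}")
    path.write_bytes(body)
    return path
```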
Base64
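In base64 mode the audio arrives inside a JSON body. This page does not name the JSON field that carries it, so the sketch below takes the field as a parameter and defaults to a hypothetical `audio` key; `decode_base64_audio` is likewise an illustrative name.

```python
import base64
import json


def decode_base64_audio(body: str, field: str = "audio") -> bytes:
    # "audio" is a hypothetical field name; check the actual response schema.
    payload = json.loads(body)
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(payload[field], validate=True)
```

The decoded bytes are the same audio that `binary_output` would return directly, so they can be written to disk the same way.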
Text Preprocessing
Before synthesis, text is automatically preprocessed:
- Markdown stripping: headers, bold, italics, code blocks, links, lists, and emoji are converted to plain text.
- Pronunciation dictionary: custom word replacements are applied if `pronunciation_dict_id` is set.
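To make the two preprocessing steps concrete, here is a rough Python approximation. It is not the service's implementation: the regular expressions cover only common Markdown forms, emoji handling is left out, the dictionary is modeled as a plain word-to-replacement mapping, and running the steps in the listed order is an assumption. The function names are hypothetical.

```python
import re


def strip_markdown(text: str) -> str:
    # Fenced code blocks: drop the ``` fence lines, keep the code.
    text = re.sub(r"^```.*$\n?", "", text, flags=re.MULTILINE)
    # Inline code: `x` -> x
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Links: [label](url) -> label
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # ATX headers: leading "#" markers.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # List markers: "-", "*", "+", or "1." at line start.
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    # Bold, then italics; lookarounds leave snake_case identifiers intact.
    text = re.sub(r"(?<!\w)(\*\*|__)(?!\s)(.+?)(?<!\s)\1(?!\w)", r"\2", text)
    text = re.sub(r"(?<!\w)(\*|_)(?!\s)(.+?)(?<!\s)\1(?!\w)", r"\2", text)
    return text


def apply_pronunciations(text: str, replacements: dict[str, str]) -> str:
    # Whole-word, case-sensitive replacement; assumes keys are plain words.
    if not replacements:
        return text
    alternation = "|".join(re.escape(word) for word in replacements)
    pattern = re.compile(r"\b(" + alternation + r")\b")
    # One pass: replacement output is never re-scanned.
    return pattern.sub(lambda m: replacements[m.group(1)], text)


def preprocess(text: str, replacements=None) -> str:
    return apply_pronunciations(strip_markdown(text), replacements or {})
```

Because all dictionary entries are applied in a single regex pass, one entry's output is never matched by another: `{"a": "b", "b": "c"}` turns `"a b"` into `"b c"`, not `"c c"`.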