Required

text
string
required
Text to synthesize. Plain text or SSML (when text_type is "ssml" and supported by the provider).
voice
string
required
Voice identifier in Provider.Model.VoiceId format. Examples: Telnyx.Ultra.aura, Telnyx.NaturalHD.astra, aws.Polly.Neural.Joanna.
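
As a rough illustration of the two required fields, the sketch below validates a voice identifier and builds the smallest possible request body. The helper names parse_voice and build_minimal_body are hypothetical and not part of any SDK. Because one of the examples above (aws.Polly.Neural.Joanna) has four dot-separated segments, the sketch assumes the first segment is the provider, the last is the voice ID, and everything in between is the model; that reading is an assumption, not documented behavior.

```python
def parse_voice(voice: str) -> dict:
    """Split a voice identifier into provider, model, and voice ID (hypothetical helper)."""
    parts = voice.split(".")
    # Require at least Provider.Model.VoiceId, with no empty segments.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected Provider.Model.VoiceId, got {voice!r}")
    return {
        "provider": parts[0],
        # Assumption: any middle segments together form the model name.
        "model": ".".join(parts[1:-1]),
        "voice_id": parts[-1],
    }


def build_minimal_body(text: str, voice: str) -> dict:
    """Build a request body containing only the two required fields."""
    if not text:
        raise ValueError("text is required")
    parse_voice(voice)  # raises ValueError if the identifier is malformed
    return {"text": text, "voice": voice}
```

For example, build_minimal_body("Hello", "Telnyx.Ultra.aura") returns a dictionary with just the text and voice keys, ready to be serialized as JSON.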

Output

output_type
string
Response format. Default: binary_output.
  • binary_output — raw audio bytes with Content-Type header
  • base64_output — JSON with base64_audio field
  • audio_id — returns an ID for async retrieval via GET /v2/text-to-speech/speech/:audio_id
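
The three response formats call for different handling on the client side. The sketch below shows one way to turn a response body into audio bytes for the first two formats, and how to build the retrieval path for the third. The function names are hypothetical. The shape of the audio_id response is not described here, so the sketch does not try to parse it.

```python
import base64
import json
from urllib.parse import quote


def extract_audio(output_type: str, body: bytes) -> bytes:
    """Return raw audio bytes from a response body (hypothetical helper)."""
    if output_type == "binary_output":
        # The body already is the audio; the Content-Type header names the codec.
        return body
    if output_type == "base64_output":
        # The body is JSON carrying the audio in a base64_audio field.
        return base64.b64decode(json.loads(body)["base64_audio"])
    raise ValueError(f"no inline audio for output_type {output_type!r}")


def retrieval_path(audio_id: str) -> str:
    """Path for fetching asynchronously generated audio by its ID."""
    return "/v2/text-to-speech/speech/" + quote(audio_id, safe="")
```

With audio_id, the caller would issue a GET request to the path returned by retrieval_path once the ID is known.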

Content

text_type
string
Input text format: text (default) or ssml. SSML is supported by AWS Polly and Azure.
language
string
Language code (ISO 639-1 or full language name). Used by providers that support language selection.

Voice Settings

voice_settings
object
Provider-specific parameters. See Voice Settings.

Pronunciation

pronunciation_dict_id
string
ID of a pronunciation dictionary to apply custom word replacements before synthesis.

Caching

disable_cache
boolean
When true, bypasses the audio cache and synthesizes fresh audio. Default: false.
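
Putting the parameters together, the sketch below assembles a full request body from the fields listed on this page. The build_payload function is hypothetical. Leaving out fields that still hold their default value is a design choice of this sketch, not a requirement of the API.

```python
from typing import Any, Optional

VALID_OUTPUT_TYPES = {"binary_output", "base64_output", "audio_id"}
VALID_TEXT_TYPES = {"text", "ssml"}


def build_payload(
    text: str,
    voice: str,
    *,
    output_type: str = "binary_output",
    text_type: str = "text",
    language: Optional[str] = None,
    voice_settings: Optional[dict] = None,
    pronunciation_dict_id: Optional[str] = None,
    disable_cache: bool = False,
) -> dict:
    """Assemble a request body, omitting fields left at their defaults."""
    if output_type not in VALID_OUTPUT_TYPES:
        raise ValueError(f"unknown output_type {output_type!r}")
    if text_type not in VALID_TEXT_TYPES:
        raise ValueError(f"unknown text_type {text_type!r}")

    payload: dict[str, Any] = {"text": text, "voice": voice}
    if output_type != "binary_output":
        payload["output_type"] = output_type
    if text_type != "text":
        payload["text_type"] = text_type
    if language is not None:
        payload["language"] = language
    if voice_settings is not None:
        payload["voice_settings"] = voice_settings
    if pronunciation_dict_id is not None:
        payload["pronunciation_dict_id"] = pronunciation_dict_id
    if disable_cache:
        payload["disable_cache"] = True
    return payload
```

A call such as build_payload("<speak>Hello</speak>", "aws.Polly.Neural.Joanna", text_type="ssml", disable_cache=True) yields a body with only the text, voice, text_type, and disable_cache keys.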