Voice format: azure.<VoiceId>
Example: azure.en-US-AvaMultilingualNeural
There is no model ID segment; Azure voices are flat identifiers.
Default voice: en-US-AvaMultilingualNeural.
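The voice format above can be sketched as a pair of small helpers. This is a minimal illustration, not part of any SDK: the function names `build_voice` and `parse_voice` are hypothetical, and the only facts taken from this page are the `azure.` prefix, the absence of a model segment, and the default voice.

```python
DEFAULT_VOICE = "en-US-AvaMultilingualNeural"  # default voice stated on this page
PROVIDER = "azure"


def build_voice(voice_id: str = DEFAULT_VOICE) -> str:
    """Return the provider-qualified voice string, azure.<VoiceId>."""
    return f"{PROVIDER}.{voice_id}"


def parse_voice(voice: str) -> str:
    """Return the VoiceId from 'azure.<VoiceId>'.

    Only the first dot separates the provider from the voice,
    because there is no model ID segment in between.
    """
    provider, sep, voice_id = voice.partition(".")
    if provider != PROVIDER or not sep or not voice_id:
        raise ValueError(f"expected 'azure.<VoiceId>', got {voice!r}")
    return voice_id
```

Splitting on the first dot only keeps the helper correct even if a voice ID were ever to contain a dot itself.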

Voice Samples

| Voice | Language | Gender |
| --- | --- | --- |
| azure.en-US-AvaMultilingualNeural | en-US | Female |
| azure.en-US-AndrewMultilingualNeural | en-US | Male |

WebSocket

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_format | string | mp3 | mp3, wav, linear16, mulaw, alaw. |
| sample_rate | integer | 24000 | 8000, 16000, 24000, 48000. |
| language | string | en-US | BCP-47 language code. |
| text_type | string | text | text or ssml. Azure supports SSML for pronunciation and prosody control. |
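As a rough sketch of how these query parameters might be assembled, the helper below validates each value against the table and appends an encoded query string to a base URL. The endpoint `wss://example.invalid/tts` and the function name `build_ws_url` are placeholders, since this page does not state the actual WebSocket URL.

```python
from urllib.parse import urlencode

# Allowed values and defaults, copied from the Query Parameters table.
ALLOWED_FORMATS = {"mp3", "wav", "linear16", "mulaw", "alaw"}
ALLOWED_RATES = {8000, 16000, 24000, 48000}
ALLOWED_TEXT_TYPES = {"text", "ssml"}


def build_ws_url(
    base_url: str,  # hypothetical: the real endpoint is not given on this page
    audio_format: str = "mp3",
    sample_rate: int = 24000,
    language: str = "en-US",
    text_type: str = "text",
) -> str:
    """Validate the query parameters and append them to base_url."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    if sample_rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    if text_type not in ALLOWED_TEXT_TYPES:
        raise ValueError(f"unsupported text_type: {text_type!r}")
    query = urlencode(
        {
            "audio_format": audio_format,
            "sample_rate": sample_rate,
            "language": language,
            "text_type": text_type,
        }
    )
    return f"{base_url}?{query}"
```

Validating on the client side surfaces a bad `sample_rate` or `audio_format` before a connection is attempted.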

Voice Settings

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| output_format | string | audio-24khz-160kbitrate-mono-mp3 | See Azure audio formats. |
| language_code | string | en-US | BCP-47. Overrides the language query parameter. |
| text_type | string | text | text or ssml. Overrides the query parameter. |
| effect | string | | eq_car, eq_telecomhp8k. Audio equalization. |
| gender | string | | Male, Female. Voice gender filter. |
{
  "text": " ",
  "voice_settings": {
    "output_format": "audio-48khz-192kbitrate-mono-mp3",
    "effect": "eq_car"
  }
}
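A message like the one above could be produced as follows. This is a sketch under assumptions: `build_settings_message` and `effective_value` are illustrative names, and the override rule is implemented only as the table describes it (a value in `voice_settings` wins over the matching query parameter).

```python
import json

# Field names taken from the Voice Settings table.
ALLOWED_SETTINGS = {"output_format", "language_code", "text_type", "effect", "gender"}


def build_settings_message(text: str = " ", **settings) -> str:
    """Serialize a message carrying voice_settings, omitting unset fields."""
    unknown = set(settings) - ALLOWED_SETTINGS
    if unknown:
        raise ValueError(f"unknown voice settings: {sorted(unknown)}")
    voice_settings = {k: v for k, v in settings.items() if v is not None}
    return json.dumps({"text": text, "voice_settings": voice_settings})


def effective_value(query_value, settings_value):
    """Return the voice_settings value when present, else the query parameter."""
    return settings_value if settings_value is not None else query_value
```

For example, `effective_value("en-US", "de-DE")` yields `"de-DE"`, while `effective_value("en-US", None)` falls back to `"en-US"`.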

REST API

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| output_format | string | audio-24khz-160kbitrate-mono-mp3 | Azure audio format string. |
| language_code | string | en-US | BCP-47 language code. |
| text_type | string | text | text or ssml. |
| effect | string | | eq_car, eq_telecomhp8k. |
| gender | string | | Male, Female. |
| output_type | string | binary_output | binary_output, base64_output, or audio_id. |

Response

Default (binary_output): chunked audio bytes.
With output_type: "base64_output": JSON with base64-encoded audio.
With output_type: "audio_id": JSON with an audio_url for deferred retrieval.
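The three response shapes could be handled along these lines. Treat this as a sketch: the `audio_url` key comes from this page, but the JSON key holding the base64 payload is not documented here, so `base64_field="audio"` is a guess that may need changing. The function operates on an already-received response body, so no HTTP client is assumed.

```python
import base64
import json

OUTPUT_TYPES = {"binary_output", "base64_output", "audio_id"}


def extract_audio(output_type: str, body: bytes, base64_field: str = "audio"):
    """Return audio bytes, or the audio_url string for deferred retrieval.

    base64_field is an assumed key name; this page does not specify it.
    """
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output_type: {output_type!r}")
    if output_type == "binary_output":
        # The body is the audio itself (already reassembled from chunks).
        return body
    payload = json.loads(body)
    if output_type == "base64_output":
        return base64.b64decode(payload[base64_field])
    # audio_id: the caller fetches the audio later from this URL.
    return payload["audio_url"]
```

Returning the URL unchanged for `audio_id` leaves the choice of when and how to download the audio to the caller.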