Voice format: inworld.<Model>.<VoiceId>

Models:
  • inworld-tts-1.5-mini (alias Mini) — faster, lower latency.
  • inworld-tts-1.5-max (alias Max) — higher quality.
  • inworld-tts-2 (alias TTS2) — latest generation; supports the delivery_mode parameter.
Defaults to inworld-tts-1.5-mini if the model is omitted.
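
As an illustration of the voice format and the model default, here is a minimal Python sketch that splits a voice string into a full model name and a voice ID. The helper name parse_voice is hypothetical, and two behaviors are assumptions rather than documented facts: that matching is case-insensitive, and that "model omitted" means the two-part form inworld.<VoiceId>.

```python
# Aliases and full model names taken from the list above.
MODELS = {
    "mini": "inworld-tts-1.5-mini",
    "max": "inworld-tts-1.5-max",
    "tts2": "inworld-tts-2",
}
DEFAULT_MODEL = "inworld-tts-1.5-mini"


def parse_voice(voice: str) -> tuple[str, str]:
    """Return (full_model_name, voice_id) for an inworld.<Model>.<VoiceId> string.

    Hypothetical helper: case-insensitive matching and the two-part
    fallback (inworld.<VoiceId>) are assumptions, not documented behavior.
    """
    prefix, sep, rest = voice.partition(".")
    if prefix.lower() != "inworld" or not sep or not rest:
        raise ValueError(f"not an inworld voice string: {voice!r}")

    # Accept either an alias (Max) or a full model name (inworld-tts-1.5-max).
    # Full names contain dots themselves, so match on a known prefix instead
    # of splitting the remainder on ".".
    candidates = {**MODELS, **{name: name for name in MODELS.values()}}
    for token, model in candidates.items():
        head = token + "."
        if rest.lower().startswith(head) and len(rest) > len(head):
            return model, rest[len(head):]

    # No recognizable model segment: fall back to the documented default.
    return DEFAULT_MODEL, rest
```

For example, parse_voice("Inworld.Max.Hank") yields the full model name inworld-tts-1.5-max together with the voice ID Hank.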

Voice Samples

| Voice | Model | Gender |
| --- | --- | --- |
| Inworld.Max.Hank | Max | Male |
| Inworld.Mini.Loretta | Mini | Female |

WebSocket

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_format | string | mp3 | mp3 or linear16. |
| sample_rate | integer | 24000 | 8000, 16000, 22050, 24000, 44100, or 48000. |
| language | string |  | BCP-47 language code. |
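
The query parameters can be validated and encoded before opening the connection. The sketch below builds only the query string, since the WebSocket endpoint URL is not given in this section. The function name build_query is hypothetical, and rejecting unlisted values on the client side is a design choice of this example, not documented server behavior.

```python
from urllib.parse import urlencode

# Allowed values as listed in the Query Parameters table.
AUDIO_FORMATS = {"mp3", "linear16"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_query(audio_format="mp3", sample_rate=24000, language=None):
    """Return an encoded query string for the WebSocket URL (hypothetical helper)."""
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")

    params = {"audio_format": audio_format, "sample_rate": sample_rate}
    # language has no documented default, so send it only when provided.
    if language:
        params["language"] = language
    return urlencode(params)
```

The result is appended to the endpoint after a "?", for example build_query("linear16", 16000, "en-US").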

Voice Settings

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| encoding | string | MP3 | MP3 or LINEAR16. |
| sample_rate | integer | 24000 | Output sample rate in Hz. |
| language_code | string |  | BCP-47. Overrides the language query parameter. |
| delivery_mode | string |  | STABLE, BALANCED, or CREATIVE. Only supported by inworld-tts-2. |
{
  "text": " ",
  "voice_settings": {
    "encoding": "LINEAR16",
    "sample_rate": 16000
  }
}
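
A message like the one above can be assembled programmatically. The following is a minimal sketch, assuming the settings are sent as a JSON text frame shaped like the example. The function name build_settings_message is hypothetical, and the client-side check that delivery_mode is used only with inworld-tts-2 mirrors the table rather than any documented error behavior.

```python
import json

ENCODINGS = {"MP3", "LINEAR16"}
DELIVERY_MODES = {"STABLE", "BALANCED", "CREATIVE"}


def build_settings_message(model, encoding="MP3", sample_rate=24000,
                           language_code=None, delivery_mode=None, text=" "):
    """Return a JSON string carrying voice_settings (hypothetical helper).

    The default text of a single space copies the example message above.
    """
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")

    settings = {"encoding": encoding, "sample_rate": sample_rate}
    if language_code:
        settings["language_code"] = language_code

    if delivery_mode is not None:
        if delivery_mode not in DELIVERY_MODES:
            raise ValueError(f"unknown delivery_mode: {delivery_mode!r}")
        # Per the table, delivery_mode is only supported by inworld-tts-2.
        if model != "inworld-tts-2":
            raise ValueError("delivery_mode requires inworld-tts-2")
        settings["delivery_mode"] = delivery_mode

    return json.dumps({"text": text, "voice_settings": settings})
```

Calling build_settings_message("inworld-tts-1.5-mini", encoding="LINEAR16", sample_rate=16000) produces a message equivalent to the example above.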

REST API

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| encoding | string | MP3 | MP3 or LINEAR16. |
| sample_rate | integer | 24000 | Output sample rate in Hz. |
| language_code | string |  | BCP-47 language code. |
| delivery_mode | string |  | STABLE, BALANCED, or CREATIVE. Only supported by inworld-tts-2. |
| output_type | string | binary_output | binary_output, base64_output, or audio_id. |

Response

The response shape depends on output_type:
  • binary_output (default): chunked audio bytes.
  • base64_output: JSON with base64-encoded audio.
  • audio_id: JSON with an audio_url for deferred retrieval.
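
A client has to branch on output_type to interpret the response body. The sketch below shows one way to do that once the full body has been read. The function name decode_response is hypothetical. This section does not name the JSON field that carries the base64 audio, so the caller must pass it in as base64_field; only audio_url is taken from the description above.

```python
import base64
import json

OUTPUT_TYPES = {"binary_output", "base64_output", "audio_id"}


def decode_response(output_type, body, base64_field=None):
    """Interpret a fully read response body (hypothetical helper).

    Returns {"audio": bytes} for audio payloads, or {"audio_url": str}
    when retrieval is deferred.
    """
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {output_type!r}")

    if output_type == "binary_output":
        # The body is the audio itself (chunks already concatenated by the caller).
        return {"audio": body}

    payload = json.loads(body)
    if output_type == "base64_output":
        # The field name is not documented here, so it is not guessed.
        if base64_field is None:
            raise ValueError("base64_field is required for base64_output")
        return {"audio": base64.b64decode(payload[base64_field])}

    # audio_id: the JSON carries a URL from which the audio can be fetched later.
    return {"audio_url": payload["audio_url"]}
```

Returning a small dict for every case lets the caller tell "audio now" from "audio later" without inspecting the output type a second time.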