> ## Documentation Index
> Fetch the complete documentation index at: https://developers.telnyx.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Stream text to speech over WebSocket

> Open a WebSocket connection to stream text and receive synthesized audio in real time. Authentication is provided via the standard `Authorization: Bearer <API_KEY>` header. Send JSON frames with text to synthesize; receive JSON frames containing base64-encoded audio chunks.

Supported providers: `aws`, `telnyx`, `azure`, `murfai`, `minimax`, `rime`, `resemble`, `elevenlabs`, `xai`.

**Connection flow:**
1. Open WebSocket with query parameters specifying provider, voice, and model.
2. Send an initial handshake message `{"text": " "}` (single space) with optional `voice_settings` to initialize the session.
3. Send text messages as `{"text": "Hello world"}`.
4. Receive audio chunks as JSON frames with base64-encoded audio.
5. A final frame with `isFinal: true` indicates the end of audio for the current text.
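
Steps 1 to 3 above can be sketched with the Python standard library alone. The sketch only builds the connection URL and the JSON frames; it does not open a socket. The `wss://api.telnyx.com/v2/text-to-speech` address is an assumption derived from the REST server URL, and `build_tts_url`, `handshake_frame`, and `text_frame` are hypothetical helper names, not part of any Telnyx SDK.

```python
import json
from urllib.parse import urlencode

# Assumption: the WebSocket endpoint mirrors the REST base URL with a wss:// scheme.
TTS_WS_URL = "wss://api.telnyx.com/v2/text-to-speech"


def build_tts_url(voice=None, provider=None, model_id=None, voice_id=None, **extra):
    """Step 1: build the connection URL, skipping parameters that are not set."""
    params = {"voice": voice, "provider": provider,
              "model_id": model_id, "voice_id": voice_id, **extra}
    query = urlencode({
        # Booleans such as disable_cache are sent as lowercase true/false.
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
        if value is not None
    })
    return f"{TTS_WS_URL}?{query}" if query else TTS_WS_URL


def handshake_frame(voice_settings=None):
    """Step 2: the first frame carries a single space and optional voice_settings."""
    frame = {"text": " "}
    if voice_settings:
        frame["voice_settings"] = voice_settings
    return json.dumps(frame)


def text_frame(text):
    """Step 3: each later frame carries the text to synthesize."""
    return json.dumps({"text": text})
```

The resulting URL and frames can be handed to any WebSocket client that lets you set the `Authorization: Bearer <API_KEY>` header on the upgrade request.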

To interrupt and restart synthesis mid-stream, send a frame with `"force": true`, such as `{"text": "New text after interruption.", "force": true}`. The current worker is stopped and a new one is started.
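
A minimal sketch of building an interrupt frame follows. The `ClientTextFrame` schema lists `text` as required and its example pairs `force` with new text, so the hypothetical `interrupt_frame` helper always includes a `text` field. Whether the server accepts `force` on its own is not confirmed here. Clearing locally buffered audio in `PlaybackBuffer` is a client-side choice for illustration, not something the API requires.

```python
import json


def interrupt_frame(new_text):
    """Build a frame that stops the current synthesis and starts on new_text."""
    return json.dumps({"text": new_text, "force": True})


class PlaybackBuffer:
    """Hypothetical holder for decoded audio that has not been played yet."""

    def __init__(self):
        self.chunks = []

    def interrupt(self, new_text):
        # Drop audio queued for the interrupted text, then build the frame to send.
        self.chunks.clear()
        return interrupt_frame(new_text)
```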

**Note:** The Telnyx `Ultra` model is not available over WebSocket. Use the HTTP POST `/text-to-speech/speech` endpoint instead.
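
The note above can be checked on the client before connecting. The sketch below splits a `voice` value in the documented `provider.model_id.voice_id` or `provider.voice_id` format and flags the Telnyx `Ultra` model. `parse_voice` and `supports_websocket` are hypothetical helpers. The case-insensitive comparison and the assumption that a two-part value contains no dot inside the voice ID are guesses about how the server treats these strings.

```python
def parse_voice(voice):
    """Split a voice string into (provider, model_id, voice_id).

    model_id is None for the two-part 'provider.voice_id' form.
    """
    parts = voice.split(".", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"unrecognized voice format: {voice!r}")
    if len(parts) == 3:
        provider, model_id, voice_id = parts
    else:
        provider, voice_id = parts
        model_id = None
    return provider, model_id, voice_id


def supports_websocket(voice):
    """Return False for the Telnyx Ultra model, which needs the HTTP endpoint."""
    provider, model_id, _ = parse_voice(voice)
    # Assumption: provider and model names are compared case-insensitively.
    return not (provider.lower() == "telnyx" and (model_id or "").lower() == "ultra")
```

A caller could fall back to the HTTP POST `/text-to-speech/speech` endpoint whenever `supports_websocket` returns `False`.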


## OpenAPI

````yaml https://telnyx-openapi-ng.s3.us-east-1.amazonaws.com/text-to-speech/text-to-speech.yml get /text-to-speech
openapi: 3.1.0
info:
  title: Text to Speech API
  version: 2.0.0
  description: API for managing Text to Speech.
  contact:
    email: support@telnyx.com
servers:
  - url: https://api.telnyx.com/v2
security:
  - bearerAuth: []
tags:
  - name: Text to Speech
    description: Text to Speech operations
paths:
  /text-to-speech:
    get:
      tags:
        - Text To Speech Commands
      summary: Stream text to speech over WebSocket
      description: >-
        Open a WebSocket connection to stream text and receive synthesized audio
        in real time. Authentication is provided via the standard
        `Authorization: Bearer <API_KEY>` header. Send JSON frames with text to
        synthesize; receive JSON frames containing base64-encoded audio chunks.


        Supported providers: `aws`, `telnyx`, `azure`, `murfai`, `minimax`,
        `rime`, `resemble`, `elevenlabs`, `xai`.


        **Connection flow:**

        1. Open WebSocket with query parameters specifying provider, voice, and
        model.

        2. Send an initial handshake message `{"text": " "}` (single space) with
        optional `voice_settings` to initialize the session.

        3. Send text messages as `{"text": "Hello world"}`.

        4. Receive audio chunks as JSON frames with base64-encoded audio.

        5. A final frame with `isFinal: true` indicates the end of audio for the
        current text.


        To interrupt and restart synthesis mid-stream, send a frame with
        `"force": true`, such as
        `{"text": "New text after interruption.", "force": true}`. The current
        worker is stopped and a new one is started.


        **Note:** The Telnyx `Ultra` model is not available over WebSocket. Use
        the HTTP POST `/text-to-speech/speech` endpoint instead.
      operationId: TextToSpeechOverWs
      parameters:
        - $ref: '#/components/parameters/voice'
        - $ref: '#/components/parameters/provider'
        - $ref: '#/components/parameters/model_id'
        - $ref: '#/components/parameters/voice_id'
        - $ref: '#/components/parameters/disable_cache'
        - $ref: '#/components/parameters/audio_format'
        - $ref: '#/components/parameters/socket_id'
      responses:
        '101':
          description: >-
            WebSocket connection established. Communication proceeds via JSON
            frames.


            **Client → Server:** See `ClientTextFrame` schema.

            **Server → Client:** See `AudioChunkFrame`, `FinalFrame`, and
            `ErrorFrame` schemas.
          content:
            application/json:
              schema:
                oneOf:
                  - $ref: '#/components/schemas/ClientTextFrame'
                  - $ref: '#/components/schemas/TtsServerEvent'
        '200':
          description: >-
            Not returned directly. A successful WebSocket upgrade responds with
            101; see the 101 response for frame documentation.
        '400':
          description: >-
            Invalid parameters — provider not supported or missing required
            fields.
        '401':
          description: >-
            Authentication failed: the `Authorization: Bearer <API_KEY>` header
            is missing or invalid.
components:
  parameters:
    voice:
      name: voice
      in: query
      required: false
      description: >-
        Voice identifier in the format `provider.model_id.voice_id` or
        `provider.voice_id` (e.g. `telnyx.NaturalHD.Telnyx_Alloy`,
        `Telnyx.Ultra.<voice_id>`, or `azure.en-US-AvaMultilingualNeural`). When
        provided, the `provider`, `model_id`, and `voice_id` are extracted
        automatically. Takes precedence over individual
        `provider`/`model_id`/`voice_id` parameters.
      schema:
        type: string
    provider:
      name: provider
      in: query
      required: false
      description: >-
        TTS provider. Defaults to `telnyx` if not specified. Ignored when
        `voice` is provided.
      schema:
        type: string
        enum:
          - aws
          - telnyx
          - azure
          - elevenlabs
          - minimax
          - murfai
          - rime
          - resemble
          - xai
        default: telnyx
    model_id:
      name: model_id
      in: query
      required: false
      description: >-
        Model identifier for the chosen provider. Examples: `Natural`,
        `NaturalHD`, `Ultra` (Telnyx); `Polly.Generative` (AWS).
      schema:
        type: string
    voice_id:
      name: voice_id
      in: query
      required: false
      description: Voice identifier for the chosen provider.
      schema:
        type: string
    disable_cache:
      name: disable_cache
      in: query
      required: false
      description: When `true`, bypass the audio cache and generate fresh audio.
      schema:
        type: boolean
        default: false
    audio_format:
      name: audio_format
      in: query
      required: false
      description: >-
        Audio output format override. Supported for Telnyx models. `pcm` and
        `wav` are available for `Natural`/`NaturalHD` models. The `Ultra` model
        outputs PCM (s16le) at 24 kHz or MP3 at 128 kbps, 24 kHz.
      schema:
        type: string
        enum:
          - pcm
          - wav
          - mp3
    socket_id:
      name: socket_id
      in: query
      required: false
      description: >-
        Client-provided socket identifier for tracking. If not provided, one is
        generated server-side.
      schema:
        type: string
  schemas:
    ClientTextFrame:
      type: object
      description: Client-to-server frame containing text to synthesize.
      required:
        - text
      properties:
        text:
          type: string
          description: >-
            Text to convert to speech. Send `" "` (single space) as an initial
            handshake with optional `voice_settings`. Subsequent messages
            contain the actual text to synthesize.
        voice_settings:
          type: object
          description: >-
            Provider-specific voice settings sent with the initial handshake.
            Contents vary by provider — e.g. `{"speed": 1.2}` for Minimax,
            `{"voice_speed": 1.5}` for Telnyx.
          additionalProperties: true
        force:
          type: boolean
          description: >-
            When `true`, stops the current synthesis worker and starts a new
            one. Used to interrupt speech mid-stream and begin synthesizing new
            text.
      examples:
        - text: ' '
          voice_settings:
            voice_speed: 1.2
        - text: Hello, welcome to Telnyx text to speech.
        - text: New text after interruption.
          force: true
    TtsServerEvent:
      description: Union of all server-to-client WebSocket events for TTS streaming.
      oneOf:
        - $ref: '#/components/schemas/AudioChunkFrame'
        - $ref: '#/components/schemas/FinalFrame'
        - $ref: '#/components/schemas/ErrorFrame'
      discriminator:
        propertyName: type
        mapping:
          audio_chunk:
            $ref: '#/components/schemas/AudioChunkFrame'
          final:
            $ref: '#/components/schemas/FinalFrame'
          error:
            $ref: '#/components/schemas/ErrorFrame'
    AudioChunkFrame:
      type: object
      description: Server-to-client frame containing a base64-encoded audio chunk.
      properties:
        type:
          type: string
          const: audio_chunk
          description: Frame type identifier.
        audio:
          type:
            - string
            - 'null'
          description: >-
            Base64-encoded audio data. May be `null` for providers that use
            `drop_concatenated_audio` mode (Telnyx Natural/NaturalHD, Rime,
            Minimax, MurfAI, Resemble) — in that case only streamed chunks carry
            audio.
        text:
          type:
            - string
            - 'null'
          description: The text segment that this audio chunk corresponds to.
        isFinal:
          type: boolean
          description: Always `false` for audio chunk frames.
        cached:
          type: boolean
          description: Whether this audio was served from cache.
        timeToFirstAudioFrameMs:
          type: integer
          description: >-
            Milliseconds from the start-of-speech request to the first audio
            frame. Only present on the first audio chunk of a synthesis request.
    FinalFrame:
      type: object
      description: >-
        Server-to-client frame indicating synthesis is complete for the current
        text.
      properties:
        type:
          type: string
          const: final
          description: Frame type identifier.
        audio:
          type: 'null'
          description: Always `null` for the final frame.
        text:
          type: string
          description: Empty string.
        isFinal:
          type: boolean
          description: Always `true`.
          const: true
        timeToFirstAudioFrameMs:
          type: integer
          description: Present if this was the first response frame.
    ErrorFrame:
      type: object
      description: >-
        Server-to-client frame indicating an error during synthesis. The
        connection is closed shortly after.
      properties:
        type:
          type: string
          const: error
          description: Frame type identifier.
        error:
          type: string
          description: Error message describing what went wrong.
  securitySchemes:
    bearerAuth:
      scheme: bearer
      type: http

````
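
The server-to-client schemas above (`AudioChunkFrame`, `FinalFrame`, `ErrorFrame`) suggest a simple receive loop: dispatch on the `type` field, decode any base64 audio, and stop at the final frame. The sketch below handles one frame at a time so it can be dropped into any WebSocket client. `handle_server_frame` is a hypothetical name, and raising `RuntimeError` on an error frame is an illustrative choice.

```python
import base64
import json


def handle_server_frame(raw, audio):
    """Process one server frame; return True once the final frame arrives.

    `audio` is a bytearray that collects the decoded audio chunks.
    """
    frame = json.loads(raw)
    kind = frame.get("type")
    if kind == "error":
        # The connection is closed shortly after an error frame.
        raise RuntimeError(frame.get("error", "unknown TTS error"))
    if kind == "final" or frame.get("isFinal") is True:
        return True
    if frame.get("audio"):  # audio may be null on some audio_chunk frames
        audio.extend(base64.b64decode(frame["audio"]))
    return False
```

A caller would invoke this for every message received on the socket and write `audio` to a file or player once it returns `True`.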