Open a WebSocket connection to stream audio and receive transcriptions in real time. Authenticate with the standard Authorization: Bearer <API_KEY> header.
Supported engines: Azure, Deepgram, Google, Telnyx.
Connection flow:
The server returns JSON transcript frames with transcript, is_final, and confidence fields.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
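As a minimal sketch of the header described above, the helper below builds the Authorization value for the WebSocket handshake. The function name build_auth_headers is hypothetical, and how the headers are passed to the connection depends on the WebSocket client library you use.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Build the handshake headers for Bearer authentication.

    Hypothetical helper: the name and the empty-key check are
    illustrative choices, not part of the documented API.
    """
    token = api_key.strip()
    if not token:
        raise ValueError("API key must not be empty")
    # Header form taken from the text: Authorization: Bearer <token>
    return {"Authorization": f"Bearer {token}"}
```

The resulting dictionary can be handed to whichever option your WebSocket client exposes for extra handshake headers.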
The transcription engine to use for processing the audio stream.
Allowed values: Azure, Deepgram, Google, Telnyx
The format of the input audio stream.
Allowed values: mp3, wav
The language spoken in the audio stream.
Whether to receive interim transcription results.
The specific model to use within the selected transcription engine.
fast
Client sends binary audio frames (mp3 or wav format) over the WebSocket. See SttClientEvent schema.
Binary audio data in mp3 or wav format.
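One way to produce the binary frames is to slice the encoded audio into fixed-size chunks and send each chunk as one binary WebSocket message. The sketch below assumes a frame size of 4096 bytes, which is an arbitrary illustrative value; the source does not state a required or maximum frame size, and iter_audio_frames is a hypothetical helper name.

```python
from typing import Iterator


def iter_audio_frames(audio: bytes, frame_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive chunks of encoded audio (mp3 or wav bytes).

    frame_size is an assumed value chosen for illustration only.
    The last chunk may be shorter than frame_size.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    for start in range(0, len(audio), frame_size):
        yield audio[start:start + frame_size]
```

Each yielded chunk would then be sent as a binary message over the open connection, in order.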
WebSocket connection established. Communication proceeds via binary audio frames (client) and JSON transcript frames (server).
Client → Server: Binary audio data (mp3/wav).
Server → Client: See TranscriptFrame and SttErrorFrame schemas.
Union of all server-to-client WebSocket events for STT streaming.
Frame type identifier.
"transcript"
The transcribed text from the audio.
Whether this is a final transcription result. When false, this is an interim result that may be refined.
Confidence score of the transcription, ranging from 0 to 1.
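The transcript frame fields above could be handled on the client roughly as follows. This is a sketch under assumptions: the JSON key for the frame type identifier is taken to be "type", which the text does not confirm, and the TranscriptFrame dataclass and parse_server_frame function are hypothetical names. Frames of any other type, such as error frames, are returned as None here because their fields are not described in this section.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptFrame:
    transcript: str               # transcribed text
    is_final: bool                # False means an interim result
    confidence: Optional[float]   # score from 0 to 1, if present


def parse_server_frame(raw: str) -> Optional[TranscriptFrame]:
    """Parse one JSON text message from the server.

    Assumes the frame type is stored under the key "type";
    returns None for anything that is not a transcript frame.
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or data.get("type") != "transcript":
        return None
    return TranscriptFrame(
        transcript=data.get("transcript", ""),
        is_final=bool(data.get("is_final", False)),
        confidence=data.get("confidence"),
    )
```

A caller would typically display interim results as they arrive and commit text only when is_final is true, since interim results may still be refined.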