The audio format is set with the input_format query parameter. Audio is sent as binary WebSocket frames: chunked bytes, no base64, no JSON wrapping.

Container formats (mp3, webm, etc.) are self-describing: the server demuxes the byte stream and reads the encoding and sample rate from the headers. Raw formats carry no metadata, so you must set sample_rate explicitly.

Both modes of delivery are supported: real-time capture (microphone, MediaRecorder, telephony bridge) and file streaming (read a file in chunks and push them through the socket).
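As a rough illustration of the rule above, the sketch below builds a connection URL and insists on sample_rate for anything that is not a container format. The helper name build_url and the CONTAINER_FORMATS set are assumptions made for this example (the set is simply the container formats listed on this page); this is not an official client.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "wss://api.telnyx.com/v2/speech-to-text/transcription"

# Container formats named on this page; their headers carry the sample rate.
CONTAINER_FORMATS = {"webm", "webm_opus", "ogg_opus", "ogg", "mp3", "wav", "flac"}


def build_url(input_format: str, sample_rate: Optional[int] = None) -> str:
    """Hypothetical helper: return the WebSocket URL for a given input format."""
    params = {"input_format": input_format}
    if input_format not in CONTAINER_FORMATS:
        # Raw formats have no metadata, so this sketch requires an explicit rate.
        if sample_rate is None:
            raise ValueError(f"{input_format} is a raw format: sample_rate is required")
        params["sample_rate"] = str(sample_rate)
    # For container formats any sample_rate argument is ignored here,
    # since the rate is read from the stream headers.
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, build_url("mulaw", 8000) yields the telephony URL shown further down, while build_url("linear16") raises ValueError in this sketch.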

Browser Capture

Output from MediaRecorder or similar browser APIs. Container headers carry sample rate.
wss://api.telnyx.com/v2/speech-to-text/transcription?input_format=webm_opus
| Format | Sample rate | Notes |
| --- | --- | --- |
| webm | from header | WebM container |
| webm_opus | from header | WebM + Opus. Valid: 8000–48000. Alias: webm-opus |
| ogg_opus | from header | Ogg + Opus. Valid: 8000–48000. Alias: ogg-opus |
| ogg | from header | Ogg container (Vorbis or other) |

Telephony

Codecs from voice networks. Raw frames, sample_rate required.
wss://api.telnyx.com/v2/speech-to-text/transcription?input_format=mulaw&sample_rate=8000
| Format | Sample rate | Notes |
| --- | --- | --- |
| mulaw | any | G.711 µ-law. North America. Default: 8000 Hz. |
| alaw | any | G.711 A-law. EU/international. Default: 8000 Hz. |
| g729 | 8000 | G.729. Fixed. |
| amr_nb | 8000 | AMR narrowband. Fixed. Alias: amr-nb |
| amr_wb | 16000 | AMR wideband. Fixed. Alias: amr-wb |
| speex | 8000, 16000, 32000 | Google: 16000 only. |
Invalid sample rate returns error 40005.
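A client can catch most of these mistakes before connecting. The following is a minimal sketch that mirrors the telephony table on this page; the names TELEPHONY_RATES and is_valid_telephony_rate are hypothetical, "any" is interpreted here as "any positive integer", and the Google-only restriction on speex is not modeled.

```python
# Aliases listed in the telephony table.
ALIASES = {"amr-nb": "amr_nb", "amr-wb": "amr_wb"}

# Allowed sample rates per telephony format; None stands for "any".
TELEPHONY_RATES = {
    "mulaw": None,
    "alaw": None,
    "g729": {8000},
    "amr_nb": {8000},
    "amr_wb": {16000},
    "speex": {8000, 16000, 32000},
}


def is_valid_telephony_rate(input_format: str, sample_rate: int) -> bool:
    """Return True if the table on this page allows sample_rate for the format."""
    fmt = ALIASES.get(input_format, input_format)
    if fmt not in TELEPHONY_RATES:
        raise ValueError(f"not a telephony format: {input_format}")
    allowed = TELEPHONY_RATES[fmt]
    if allowed is None:
        # "any" in the table: this sketch only rejects non-positive values.
        return sample_rate > 0
    return sample_rate in allowed
```

A False result here corresponds to a request the server would be expected to reject with error 40005.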

Raw PCM

Uncompressed audio from microphones, processing pipelines, or SDKs. sample_rate required.
wss://api.telnyx.com/v2/speech-to-text/transcription?input_format=linear16&sample_rate=16000
| Format | Sample rate | Notes |
| --- | --- | --- |
| linear16 | any | 16-bit signed PCM, little-endian (s16le). Default: 16000 Hz. |
| linear32 | any | 32-bit float PCM, little-endian (f32le). Default: 16000 Hz. |
| opus | 8000, 12000, 16000, 24000, 48000 | Raw Opus frames, no container. Deepgram also: 44100. |
Invalid sample rate returns error 40005.
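To make the two PCM layouts concrete, here is a small sketch that packs floating-point samples into linear16 (s16le) and linear32 (f32le) bytes with the standard struct module. It assumes the input samples are floats in the range -1.0 to 1.0 and scales by 32767 for the 16-bit case, which is a common convention rather than something this page specifies; the function names are illustrative.

```python
import struct
from typing import Sequence


def floats_to_linear16(samples: Sequence[float]) -> bytes:
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian PCM."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # avoid overflow outside [-1, 1]
        ints.append(int(round(clipped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def floats_to_linear32(samples: Sequence[float]) -> bytes:
    """Pack floats as 32-bit float little-endian PCM."""
    return struct.pack(f"<{len(samples)}f", *samples)
```

Each sample takes 2 bytes in linear16 and 4 bytes in linear32, so one second of mono audio at 16000 Hz is 32000 or 64000 bytes respectively.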

Recorded File

Pre-recorded files read in chunks and streamed through the socket. Container headers carry sample rate.
wss://api.telnyx.com/v2/speech-to-text/transcription?input_format=mp3
| Format | Sample rate | Notes |
| --- | --- | --- |
| mp3 | from header | Default for most engines |
| wav | from header | Uncompressed. Default for Flux model. |
| flac | from header | Lossless compression |
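Reading a file in chunks can be sketched with a small generator. The chunk size of 4096 bytes is an arbitrary assumption (this page does not state a recommended size), and iter_chunks is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive byte chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file
        yield chunk


# Usage with an in-memory stream standing in for an opened audio file.
demo = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
```

Each yielded chunk would then be sent as one binary WebSocket frame, without base64 or JSON wrapping, using whichever WebSocket client you prefer.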

Engine Compatibility

Unsupported format/engine combination returns error 40002. Unsupported Flux format returns error 40006. Deepgram has three model generations with different format support. Flux is the most restrictive — it drops mp3, flac, webm_opus, amr_nb, amr_wb, g729, and speex compared to Nova.
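Compatibility matrix: one row per format (mp3, wav, webm, ogg, flac, ogg_opus, webm_opus, linear16, linear32, mulaw, alaw, opus, amr_nb, amr_wb, g729, speex) and one column per engine (Deepgram Nova, Deepgram Flux, Telnyx, Google, Azure). The per-cell support marks are not reproduced in this text.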
Universal formats (all engines and models): wav, linear16.
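The compatibility rules that are stated in prose on this page can be captured in a small lookup, sketched below. It encodes only what the text says: the three error codes, the formats Flux drops relative to Nova, and the two universal formats. It does not attempt to reproduce the full engine matrix, and how the error code is extracted from a server message is left to the caller because the message shape is not described here. All names are hypothetical.

```python
# Error codes mentioned on this page.
ERROR_MEANINGS = {
    40002: "unsupported format/engine combination",
    40005: "invalid sample rate",
    40006: "unsupported Flux format",
}

# Formats Flux drops compared to Nova, per the text above.
FLUX_DROPPED = {"mp3", "flac", "webm_opus", "amr_nb", "amr_wb", "g729", "speex"}

# Formats accepted by all engines and models, per the text above.
UNIVERSAL_FORMATS = {"wav", "linear16"}


def describe_error(code: int) -> str:
    """Map a numeric error code to the meaning given on this page."""
    return ERROR_MEANINGS.get(code, f"unrecognized error code {code}")


def known_flux_rejection(input_format: str) -> bool:
    """True if the page explicitly lists the format as dropped by Flux.

    False only means the format is not in that list; it is not a guarantee
    that Flux accepts it.
    """
    return input_format in FLUX_DROPPED


def is_universal(input_format: str) -> bool:
    """True for formats the page says every engine and model accepts."""
    return input_format in UNIVERSAL_FORMATS
```

When the target engine is not known in advance, choosing one of the universal formats is the conservative option.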