# Models & Engines

> Compare Telnyx Speech-to-Text engines and models — Deepgram, Whisper, Google, Azure, xAI, AssemblyAI, Speechmatics, Soniox — by accuracy, latency, language coverage, and price.

## Comparison

| Engine           | Model (WebSocket)                | Model (REST)                    | Latency    | Languages                                                                                                        | Best for                                                                                                   |
| ---------------- | -------------------------------- | ------------------------------- | ---------- | ---------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| **Deepgram**     | `nova-3`                         | `deepgram/nova-3`               | Low        | 40+ ([reference](https://developers.deepgram.com/docs/models-languages-overview))                                | **Recommended.** Highest English accuracy, diarization, word timestamps                                    |
| **Deepgram**     | `nova-2`                         | `deepgram/nova-2`               | Low        | 40+                                                                                                              | Legacy — use nova-3 unless you have a specific reason                                                      |
| **Deepgram**     | `flux`                           | —                               | **Lowest** | 10 languages                                                                                                     | Voice agents — built-in end-of-turn detection (WebSocket only)                                             |
| **Telnyx**       | `openai/whisper-large-v3-turbo`  | `openai/whisper-large-v3-turbo` | Medium     | 50+ ([reference](https://github.com/openai/whisper#available-models-and-languages))                              | Multilingual transcription                                                                                 |
| **Telnyx**       | `openai/whisper-tiny`            | `openai/whisper-tiny`           | Low        | 50+                                                                                                              | Lightweight, on-network                                                                                    |
| **Google**       | `latest_long`                    | —                               | Medium     | 125+ ([reference](https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages))              | Long-form multilingual audio (WebSocket only)                                                              |
| **Azure**        | `azure/fast`                     | —                               | Medium     | 100+ ([reference](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt)) | Broad language and accent coverage (WebSocket only)                                                        |
| **xAI**          | `xai/grok-stt`                   | —                               | Low        | 25 languages                                                                                                     | Grok STT for real-time transcription (WebSocket and Voice API only)                                        |
| **AssemblyAI**   | `assemblyai/universal-streaming` | —                               | Low        | 6 languages                                                                                                      | Universal-Streaming for voice agents with low latency and turn detection (WebSocket and Voice API only)    |
| **Speechmatics** | `speechmatics/standard`          | —                               | Low        | 17+ languages                                                                                                    | High-accuracy real-time transcription with bilingual and multilingual packs (WebSocket and Voice API only) |
| **Soniox**       | `soniox/stt-rt-v4`               | —                               | Low        | Auto-detect                                                                                                      | Real-time transcription with interim results and endpointing (WebSocket and Voice API only)                |
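
The table implies two things that are easy to miss: a model identifier is only valid on certain transports, and Deepgram identifiers differ between WebSocket (`nova-3`) and REST (`deepgram/nova-3`). The lookup below is a minimal Python sketch of that rule. The set contents are copied from the table; the names `WEBSOCKET_MODELS`, `REST_MODELS`, and `transports_for` are hypothetical helpers for illustration, not part of any Telnyx SDK.

```python
# Model identifiers copied from the comparison table above.
# These helper names are illustrative only, not a Telnyx API.
WEBSOCKET_MODELS = frozenset({
    "nova-3", "nova-2", "flux",
    "openai/whisper-large-v3-turbo", "openai/whisper-tiny",
    "latest_long", "azure/fast", "xai/grok-stt",
    "assemblyai/universal-streaming", "speechmatics/standard",
    "soniox/stt-rt-v4",
})
REST_MODELS = frozenset({
    "deepgram/nova-3", "deepgram/nova-2",
    "openai/whisper-large-v3-turbo", "openai/whisper-tiny",
})


def transports_for(model: str) -> list[str]:
    """Return the transports on which the table lists this exact identifier."""
    transports = []
    if model in WEBSOCKET_MODELS:
        transports.append("websocket")
    if model in REST_MODELS:
        transports.append("rest")
    return transports


print(transports_for("flux"))                 # ['websocket']
print(transports_for("deepgram/nova-3"))      # ['rest']
print(transports_for("openai/whisper-tiny"))  # ['websocket', 'rest']
```

An empty list means the identifier does not appear in the table for either transport, which is a useful early check before opening a connection.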

## Engine Details

<Tabs>
  <Tab title="Deepgram">
    The default WebSocket engine. Best English accuracy and the richest feature set. For REST, you must explicitly set `model="deepgram/nova-3"` — the REST default is `openai/whisper-large-v3-turbo`.

    **Models:**

    * **`nova-3`** — Latest and most accurate. Supports diarization, word-level timestamps, smart formatting, numerals, and punctuation via [`model_config`](/docs/voice/stt/rest-api/parameters/model-config). Use this unless you need the lowest possible latency.
    * **`nova-2`** — Previous generation. Still supported, but nova-3 is the better choice unless you have a specific reason to stay on nova-2.
    * **`flux`** — Purpose-built for voice agents. Lowest latency with built-in [end-of-turn detection](/docs/voice/stt/websocket-streaming/parameters/end-of-turn) — tells you when the speaker has finished so your agent can respond. WebSocket only.

    **Languages:** 40+ languages across Deepgram models. Nova-3 supports `multi` mode (10 languages with code-switching). Flux supports English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. See [Deepgram languages](https://developers.deepgram.com/docs/models-languages-overview).
  </Tab>

  <Tab title="Telnyx">
    Telnyx runs Whisper models on-network.

    **Models:**

    * **`openai/whisper-large-v3-turbo`** — Multilingual (50+ languages, auto-detected). Returns text only — no timestamps regardless of response format.
    * **`openai/whisper-tiny`** — Lightweight, lowest resource usage.

    **Languages:** 50+ languages, auto-detected. Use `auto_detect` to skip the language hint. See the [Whisper language list](https://github.com/openai/whisper#available-models-and-languages).

    **Limitations:** No diarization. No word-level timestamps.
  </Tab>

  <Tab title="Google">
    Google Cloud Speech-to-Text integration.

    **Model:** `latest_long`

    **Languages:** 125+ languages/locales. See [Google Cloud STT languages](https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages).
  </Tab>

  <Tab title="Azure">
    Microsoft Azure Speech Services integration.

    **Model:** `azure/fast`

    **Languages:** 100+ languages/locales with strong accent and dialect coverage. See [Azure Speech languages](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt).
  </Tab>

  <Tab title="xAI">
    xAI Grok STT integration for real-time transcription.

    **Model:** `xai/grok-stt`

    **Languages:** 25 languages, including Arabic, English, French, German, Hindi, Japanese, Korean, Portuguese, Spanish, and Vietnamese.
  </Tab>

  <Tab title="AssemblyAI">
    AssemblyAI Universal-Streaming integration for real-time voice agent transcription.

    **Model:** `assemblyai/universal-streaming`

    **Languages:** English, Spanish, German, French, Portuguese, and Italian.
  </Tab>

  <Tab title="Speechmatics">
    Speechmatics real-time transcription integration with high accuracy and multilingual support including bilingual packs.

    **Model:** `speechmatics/standard`

    **Languages:** English and Spanish, plus bilingual and multilingual packs including Arabic–English, Mandarin–English, English–Malay, English–Tamil, Tagalog, and Spanish–English. Also supports Basque, Galician, Irish, Maltese, Mongolian, Swahili, Uyghur, and Welsh.

    **Features:** Supports interim results (partial transcripts) and graceful `CloseStream` shutdown.
  </Tab>

  <Tab title="Soniox">
    Soniox real-time transcription integration with automatic language detection.

    **Model:** `soniox/stt-rt-v4`

    **Languages:** Automatic detection — no language hint required.

    **Features:** Supports interim results (partial transcripts), endpointing, and graceful `CloseStream` shutdown.
  </Tab>
</Tabs>

## How to Choose

**Need the highest accuracy for English?**
→ Deepgram `nova-3` — lowest word error rate (WER) across all English variants.

**Building a voice agent that needs to know when the user stopped talking?**
→ Deepgram `flux` — lowest latency with built-in end-of-turn detection.

**Need to transcribe files in 50+ languages?**
→ Telnyx `openai/whisper-large-v3-turbo` via REST API.

**Need diarization (who said what)?**
→ Deepgram `nova-3` with `model_config.diarize: true`.

**Need broad accent/dialect support?**
→ Azure `azure/fast` — strong coverage across regional accents.

**Need Grok STT for real-time calls?**
→ xAI `xai/grok-stt` via WebSocket or Voice API.

**Need low-latency streaming for voice agents?**
→ AssemblyAI `assemblyai/universal-streaming` via WebSocket or Voice API.

**Need high-accuracy multilingual with bilingual packs?**
→ Speechmatics `speechmatics/standard` via WebSocket or Voice API.

**Need real-time transcription with automatic language detection?**
→ Soniox `soniox/stt-rt-v4` via WebSocket or Voice API.
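
If you want to encode this guide in application config, one option is a plain lookup table. The sketch below assumes hypothetical requirement keys such as `"diarization"`; only the engine and model names come from this page. The engine strings are display labels from the comparison table and are not guaranteed to match the `transcription_engine` values the API expects.

```python
# Requirement key -> (engine label, model) as recommended in the guide above.
# The keys are made up for this example; the values come from this page.
RECOMMENDATIONS = {
    "english_accuracy": ("Deepgram", "nova-3"),
    "end_of_turn": ("Deepgram", "flux"),
    "multilingual_files": ("Telnyx", "openai/whisper-large-v3-turbo"),
    "diarization": ("Deepgram", "nova-3"),
    "accent_coverage": ("Azure", "azure/fast"),
    "grok_stt": ("xAI", "xai/grok-stt"),
    "low_latency_streaming": ("AssemblyAI", "assemblyai/universal-streaming"),
    "bilingual_packs": ("Speechmatics", "speechmatics/standard"),
    "auto_language_detection": ("Soniox", "soniox/stt-rt-v4"),
}


def recommend(need: str) -> tuple[str, str]:
    """Look up the recommended (engine label, model) for a requirement key."""
    try:
        return RECOMMENDATIONS[need]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"Unknown need {need!r}; expected one of: {known}") from None


engine, model = recommend("end_of_turn")
print(engine, model)  # Deepgram flux
```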

## Specifying the Engine and Model

**WebSocket** — set via query parameters:

```
wss://api.telnyx.com/v2/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3
```
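
Building the URL by hand is error-prone once a model name contains characters such as `/`. The following Python sketch assembles the same URL with the standard library. The base endpoint and parameter names are taken from the example above; the function name `build_stt_ws_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint and query parameter names as shown in the example above.
STT_WS_BASE = "wss://api.telnyx.com/v2/speech-to-text/transcription"


def build_stt_ws_url(transcription_engine: str, model: str) -> str:
    """Return the WebSocket URL with the engine and model as query parameters."""
    # urlencode percent-encodes reserved characters, so a "/" in a model
    # name becomes "%2F" in the query string.
    query = urlencode({
        "transcription_engine": transcription_engine,
        "model": model,
    })
    return f"{STT_WS_BASE}?{query}"


print(build_stt_ws_url("Deepgram", "nova-3"))
# wss://api.telnyx.com/v2/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3
```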

**REST API** — set via the `model` body parameter:

```bash
curl -X POST https://api.telnyx.com/v2/ai/audio/transcriptions \
  -H "Authorization: Bearer YOUR_TELNYX_API_KEY" \
  -F model="deepgram/nova-3" \
  -F file=@audio.mp3
```
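
For callers who prefer Python to curl, the same multipart request can be sketched with only the standard library. The URL, header, and the `model` and `file` form fields mirror the curl example. The helper names are hypothetical, and treating the response body as JSON is an assumption that this page does not confirm.

```python
import json
import os
import urllib.request
import uuid

TRANSCRIPTIONS_URL = "https://api.telnyx.com/v2/ai/audio/transcriptions"


def build_transcription_request(api_key, model, filename, audio):
    """Build the multipart/form-data POST from the curl example without sending it."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    crlf = b"\r\n"
    body = b"".join([
        # Form field: model
        delimiter + crlf,
        b'Content-Disposition: form-data; name="model"' + crlf + crlf,
        model.encode() + crlf,
        # Form field: file
        delimiter + crlf,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode() + crlf,
        b"Content-Type: application/octet-stream" + crlf + crlf,
        audio + crlf,
        # Closing delimiter
        delimiter + b"--" + crlf,
    ])
    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(api_key, path, model="deepgram/nova-3"):
    """Send the file at `path` for transcription (assumes a JSON response body)."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_transcription_request(api_key, model, os.path.basename(path), audio)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("YOUR_TELNYX_API_KEY", "audio.mp3")` would send the request. Passing `model` explicitly matters here because the REST default is `openai/whisper-large-v3-turbo`, not a Deepgram model.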
