
Speech-to-Text with Voice API and TeXML

Introduction

In this tutorial, we will cover how to get a speech-to-text transcription of your calls using Voice API and TeXML. Before starting, please ensure your Voice API or TeXML application is correctly configured.

Video Tutorial

Learn how to implement real-time Speech-to-Text recognition in your voice applications:
This video shows how to capture and process spoken input from callers using Telnyx’s Speech-to-Text API.

Voice API

Transcription can be enabled for a Voice API call by sending a POST request to the dedicated transcription_start endpoint.
Replace YOUR_API_KEY with your API key and {call_control_id} with the ID of the call you want to transcribe:
curl -i -X POST \
'https://api.telnyx.com/v2/calls/{call_control_id}/actions/transcription_start' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
    "language": "en",
    "client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
    "command_id": "891510ac-f3e4-11e8-af5b-de00688a4901",
    "transcription_engine": "Google"
}'
Telnyx offers three speech-to-text engines that can be used to transcribe the call audio. Set transcription_engine to one of the following:
  • Google (default) - Google speech-to-text engine that offers additional features such as interim results.
  • Telnyx - In-house Telnyx speech-to-text engine with significantly better transcription accuracy and lower latency.
  • Deepgram - Deepgram speech-to-text engine. It supports two models (nova-2 and nova-3), selected with the transcription_model setting.
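The same request can be assembled from application code. The following is a minimal sketch in Python using only the standard library. The helper name build_transcription_start_request and the SUPPORTED_ENGINES tuple are illustrative, not part of any Telnyx SDK. The function only builds the request and does not send it. The transcription_model setting is left out because this tutorial does not show where it belongs in the request body.

```python
import json
import urllib.parse
import urllib.request

# Engine names as listed in this tutorial.
SUPPORTED_ENGINES = ("Google", "Telnyx", "Deepgram")


def build_transcription_start_request(call_control_id, api_key,
                                      engine="Google", language="en"):
    """Build (but do not send) a transcription_start request.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    # Keep ':' unescaped so IDs such as 'v2:...' stay readable in the path.
    call_id = urllib.parse.quote(call_control_id, safe=":")
    url = f"https://api.telnyx.com/v2/calls/{call_id}/actions/transcription_start"
    body = json.dumps({
        "language": language,
        "transcription_engine": engine,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_transcription_start_request(
        "v2:example-call-control-id", "YOUR_API_KEY", engine="Telnyx"
    )
    print(req.get_method(), req.full_url)
    # To send it for real: urllib.request.urlopen(req)
```

Validating the engine name before the request is built surfaces a typo locally instead of as an API error.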
The results are delivered as call.transcription events to the webhook URL configured for the Voice API application:
{
  "data": {
    "record_type": "event",
    "event_type": "call.transcription",
    "id": "0ccc7b54-4df3-4bca-a65a-3da1ecc777f0",
    "occurred_at": "2018-02-02T22:25:27.521992Z",
    "payload": {
      "call_control_id": "v2:7subYr8fLrXmaAXm8egeAMpoSJ72J3SGPUuome81-hQuaKRf9b7hKA",
      "call_leg_id": "5ca81340-5beb-11eb-ae45-02420a0f8b69",
      "call_session_id": "5ca81eee-5beb-11eb-ba6c-02420a0f8b69",
      "client_state": null,
      "connection_id": "1240401930086254526",
      "transcription_data": {
        "confidence": 0.977219,
        "is_final": true,
        "transcript": "hello this is a test speech"
      }
    }
  }
}
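A webhook handler usually needs only the transcript text from this event. The sketch below shows one way to pull it out in Python, assuming the request body arrives as JSON with the fields shown above. The function name extract_final_transcript is hypothetical. Non-final events are skipped, since engines that emit interim results may send events with is_final set to false.

```python
import json


def extract_final_transcript(raw_body):
    """Return (transcript, confidence) for a final call.transcription
    event, or None for any other event.

    Hypothetical helper; field names follow the sample webhook payload.
    """
    event = json.loads(raw_body).get("data", {})
    if event.get("event_type") != "call.transcription":
        return None
    result = (event.get("payload") or {}).get("transcription_data") or {}
    if not result.get("is_final"):
        # Interim result: wait for the final version of this utterance.
        return None
    return result.get("transcript"), result.get("confidence")
```

Returning None for unrelated or interim events lets the caller acknowledge every webhook while acting only on final transcripts.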

TeXML

You can enable transcription on your TeXML calls by including a <Transcription> verb in the TeXML instructions:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription language="en" transcriptionCallback="/transcription" transcriptionEngine="Telnyx" />
  </Start>
</Response>
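If your application serves TeXML dynamically, the document above can be generated rather than written by hand. This is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name build_transcription_texml is an assumption for illustration, and the attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def build_transcription_texml(callback_url, engine="Telnyx", language="en"):
    """Return a TeXML document that starts transcription.

    Hypothetical helper; attribute names mirror the tutorial's example.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Transcription", {
        "language": language,
        "transcriptionCallback": callback_url,
        "transcriptionEngine": engine,
    })
    # ElementTree escapes attribute values and emits straight quotes.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_transcription_texml("/transcription"))
```

Generating the XML through a library avoids malformed attributes, such as typographic quotes pasted from a word processor.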
The transcription results are sent to the transcriptionCallback URL in the following format:
{
    "AccountSid" : "6d547b4f-993a-4e87-b95c-2d9460b3824b",
    "CallSid" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
    "CallSidLegacy" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
    "Confidence" : "0.9822598695755005",
    "ConnectionId" : "1614262910593271041",
    "From" : "+18727726007",
    "IsFinal" : "true",
    "To" : "+48664087895",
    "Transcript" : "let's hear some music"
}
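Note that Confidence and IsFinal arrive as strings in this callback, unlike the Voice API webhook, where they are a number and a boolean. The sketch below converts them to typed values. It assumes the callback parameters have already been decoded into a dict of strings by your web framework, and the function name parse_texml_transcription is hypothetical.

```python
def parse_texml_transcription(params):
    """Convert string-valued TeXML callback fields into typed values.

    Hypothetical helper; `params` is a dict of already-decoded
    callback parameters, keyed as in the sample above.
    """
    confidence = params.get("Confidence")
    return {
        "call_sid": params.get("CallSid"),
        "from": params.get("From"),
        "to": params.get("To"),
        "transcript": params.get("Transcript", ""),
        # "0.98..." -> float; a missing or empty value becomes None.
        "confidence": float(confidence) if confidence else None,
        # "true"/"false" strings -> bool.
        "is_final": params.get("IsFinal", "").lower() == "true",
    }
```

Normalizing the types at the edge of the application lets the rest of the code treat Voice API and TeXML transcripts the same way.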