Speech-to-Text with Voice API and TeXML

Introduction

In this tutorial, we will cover how to get a speech-to-text transcription of your calls using Voice API and TeXML.

Before starting, please ensure your Voice API or TeXML application is correctly configured.

Voice API

Transcription can be enabled on a Voice API call by sending a request to the dedicated transcription_start endpoint:

Note
Replace YOUR_API_KEY with your own API key and {call_control_id} with the ID of the call you want to transcribe.

curl -i -X POST \
'https://api.telnyx.com/v2/calls/{call_control_id}/actions/transcription_start' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
    "language": "en",
    "client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
    "command_id": "891510ac-f3e4-11e8-af5b-de00688a4901",
    "transcription_engine": "A"
}'
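
The same request can be prepared from Python. The following is a minimal sketch using only the standard library, not an official client: the function name build_transcription_start_request is hypothetical, and it sets only the language and transcription_engine fields shown in the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.telnyx.com/v2"


def build_transcription_start_request(call_control_id, api_key,
                                      language="en", engine="A"):
    """Build (but do not send) a transcription_start request.

    Hypothetical helper for illustration; mirrors the curl example.
    """
    if engine not in ("A", "B"):
        raise ValueError("engine must be 'A' or 'B'")
    url = f"{API_BASE}/calls/{call_control_id}/actions/transcription_start"
    body = json.dumps({
        "language": language,
        "transcription_engine": engine,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen() sends the request. Building it separately keeps the URL and body easy to inspect before anything goes over the network.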

The transcription_engine parameter selects one of the two speech-to-text engines that Telnyx offers for turning the call audio into a transcription:

A (default) - Google speech-to-text engine that offers additional features like interim results.
B - In-house Telnyx speech-to-text engine with significantly better transcription accuracy and lower latency.

The results are delivered as webhook events to the webhook URL defined for the Voice API application:

{
    "data": {
        "record_type": "event",
        "event_type": "call.transcription",
        "id": "0ccc7b54-4df3-4bca-a65a-3da1ecc777f0",
        "occurred_at": "2018-02-02T22:25:27.521992Z",
        "payload": {
            "call_control_id": "v2:7subYr8fLrXmaAXm8egeAMpoSJ72J3SGPUuome81-hQuaKRf9b7hKA",
            "call_leg_id": "5ca81340-5beb-11eb-ae45-02420a0f8b69",
            "call_session_id": "5ca81eee-5beb-11eb-ba6c-02420a0f8b69",
            "client_state": null,
            "connection_id": "1240401930086254526",
            "transcription_data": {
                "confidence": 0.977219,
                "is_final": true,
                "transcript": "hello this is a test speech"
            }
        }
    }
}
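
On the receiving side, a webhook handler typically needs only the transcript text and its confidence. The sketch below shows one way to pull them out of the event, assuming the request body is a JSON object with a top-level "data" key laid out as in the example above. The function name extract_final_transcript is hypothetical, and skipping non-final results is a design choice for illustration rather than a requirement.

```python
import json


def extract_final_transcript(raw_body):
    """Return (transcript, confidence) for a final call.transcription
    event, or None for any other event or an interim result.

    Hypothetical helper; field names follow the sample webhook.
    """
    event = json.loads(raw_body).get("data", {})
    if event.get("event_type") != "call.transcription":
        return None
    data = event.get("payload", {}).get("transcription_data", {})
    if not data.get("is_final"):
        return None  # interim result: ignore and wait for the final one
    return data.get("transcript"), data.get("confidence")
```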

TeXML

You can enable transcription on your TeXML calls by including a <Transcription> verb in the TeXML instructions:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription language="en" transcriptionCallback="/transcription" transcriptionEngine="B" />
  </Start>
</Response>
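
If your application generates its TeXML instructions dynamically, the document above can be built in code instead of by hand. This is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name build_transcription_texml and its defaults are hypothetical and simply reproduce the attributes from the example.

```python
import xml.etree.ElementTree as ET


def build_transcription_texml(callback="/transcription",
                              language="en", engine="B"):
    """Return a TeXML document that starts transcription.

    Hypothetical helper; attribute names are taken from the example.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Transcription", {
        "language": language,
        "transcriptionCallback": callback,
        "transcriptionEngine": engine,
    })
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(response, encoding="unicode")
```

Building the document through an XML library also guarantees that attribute values use straight quotes and are escaped correctly.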

The transcription results are sent to the transcriptionCallback URL in the following format:

{
    "AccountSid" : "6d547b4f-993a-4e87-b95c-2d9460b3824b",
    "CallSid" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
    "CallSidLegacy" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
    "Confidence" : "0.9822598695755005",
    "ConnectionId" : "1614262910593271041",
    "From" : "+18727726007",
    "IsFinal" : "true",
    "To" : "+48664087895",
    "Transcript" : "let's hear some music"
}
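
Note that Confidence and IsFinal arrive as strings in this callback, unlike the numeric and boolean values in the Voice API webhook. The sketch below converts them to native types, assuming the callback fields have already been decoded into a dictionary of strings by your web framework. The function name parse_transcription_callback and the keys of the returned dictionary are hypothetical.

```python
def parse_transcription_callback(params):
    """Convert string fields from the TeXML transcription callback
    into native Python types.

    Hypothetical helper; `params` is a dict of decoded callback fields.
    """
    return {
        "call_sid": params.get("CallSid"),
        "transcript": params.get("Transcript", ""),
        "confidence": float(params.get("Confidence", "0")),
        "is_final": params.get("IsFinal", "").lower() == "true",
    }
```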
