
Speech-to-Text with Voice API and TeXML

Introduction

This tutorial covers how to get speech-to-text transcriptions of your calls using the Voice API and TeXML. Before starting, ensure that your Voice API or TeXML application is correctly configured.

Voice API

Transcription can be enabled on a Voice API call by sending a POST request to the dedicated transcription_start endpoint. Set "transcription_engine" to either "A" or "B":

curl -i -X POST \
'https://api.telnyx.com/v2/calls/{call_control_id}/actions/transcription_start' \
-H 'Authorization: Bearer <YOUR_TOKEN_HERE>' \
-H 'Content-Type: application/json' \
-d '{
  "language": "en",
  "client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
  "command_id": "891510ac-f3e4-11e8-af5b-de00688a4901",
  "transcription_engine": "A"
}'

Telnyx offers two different speech-to-text engines that can be used to process the audio from the call into a transcription:

  • A (default) - Google speech-to-text engine that offers additional features like interim results.
  • B - In-house Telnyx speech-to-text engine with significantly better transcription accuracy and lower latency.
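The same request can also be built from application code. The following is a minimal Python sketch using only the standard library; the function name `build_transcription_start_request` and its parameters are illustrative assumptions, not part of any Telnyx SDK. It only constructs the request, mirroring the URL, headers, and body fields of the curl command above.

```python
import json
import urllib.request

API_BASE = "https://api.telnyx.com/v2"


def build_transcription_start_request(call_control_id, token, engine="A", language="en"):
    """Build (but do not send) a transcription_start request.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    # The tutorial documents two engines, "A" (default) and "B".
    if engine not in ("A", "B"):
        raise ValueError("engine must be 'A' or 'B'")

    body = {"language": language, "transcription_engine": engine}
    return urllib.request.Request(
        url=f"{API_BASE}/calls/{call_control_id}/actions/transcription_start",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen(...)` with a real call control ID and API token.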

The results are delivered as webhook events to the webhook URL configured for the Voice API application:

{
  "data": {
    "record_type": "event",
    "event_type": "call.transcription",
    "id": "0ccc7b54-4df3-4bca-a65a-3da1ecc777f0",
    "occurred_at": "2018-02-02T22:25:27.521992Z",
    "payload": {
      "call_control_id": "v2:7subYr8fLrXmaAXm8egeAMpoSJ72J3SGPUuome81-hQuaKRf9b7hKA",
      "call_leg_id": "5ca81340-5beb-11eb-ae45-02420a0f8b69",
      "call_session_id": "5ca81eee-5beb-11eb-ba6c-02420a0f8b69",
      "client_state": null,
      "connection_id": "1240401930086254526",
      "transcription_data": {
        "confidence": 0.977219,
        "is_final": true,
        "transcript": "hello this is a test speech"
      }
    }
  }
}
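A webhook handler typically needs only the transcript text and its confidence. The sketch below shows one way to pull them out of the event, assuming the request body is a JSON object with a top-level "data" key as in the sample above; `extract_transcript` is a hypothetical helper name, and the `final_only` switch is an illustrative choice for skipping interim results.

```python
import json


def extract_transcript(raw_body, final_only=True):
    """Return (transcript, confidence) from a call.transcription webhook, else None.

    Hypothetical helper; assumes the body is JSON with a top-level "data" key.
    """
    event = json.loads(raw_body).get("data", {})

    # Ignore any other event types delivered to the same webhook URL.
    if event.get("event_type") != "call.transcription":
        return None

    data = event.get("payload", {}).get("transcription_data", {})

    # Optionally skip interim results and keep only final transcriptions.
    if final_only and not data.get("is_final", False):
        return None

    return data.get("transcript"), data.get("confidence")
```

Call this from whatever web framework receives the webhook, passing it the raw request body.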

TeXML

You can enable transcription on your TeXML calls by nesting a <Transcription> element inside <Start> in the TeXML instructions:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Start>
<Transcription language="en" transcriptionCallback="/transcription" transcriptionEngine="B" />
</Start>
</Response>
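If the TeXML is generated by your application rather than served from a static file, building it with an XML library avoids quoting mistakes in attribute values. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name and its defaults are assumptions for illustration, and the element and attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def build_transcription_texml(callback="/transcription", engine="B", language="en"):
    """Return a TeXML document that starts transcription (hypothetical helper)."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(
        start,
        "Transcription",
        {
            "language": language,
            "transcriptionCallback": callback,
            "transcriptionEngine": engine,
        },
    )
    # ElementTree escapes and quotes attribute values for us.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned string can be used as the response body of the endpoint that serves your TeXML instructions.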

The transcription results are sent in the callback in the following format:

{
"AccountSid" : "6d547b4f-993a-4e87-b95c-2d9460b3824b",
"CallSid" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
"CallSidLegacy" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
"Confidence" : "0.9822598695755005",
"ConnectionId" : "1614262910593271041",
"From" : "+18727726007",
"IsFinal" : "true",
"To" : "+48664087895",
"Transcript" : "let's hear some music"
}
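Note that in the sample above every value, including "Confidence" and "IsFinal", arrives as a string. The sketch below converts those fields to native types; it assumes the callback parameters have already been decoded into a dictionary of strings by your web framework, and `parse_texml_transcription` is a hypothetical helper name.

```python
def parse_texml_transcription(fields):
    """Convert TeXML transcription callback fields to native Python types.

    Hypothetical helper; `fields` is assumed to be a dict of string values.
    """
    return {
        "call_sid": fields["CallSid"],
        "transcript": fields["Transcript"],
        # Confidence is delivered as a string such as "0.98".
        "confidence": float(fields["Confidence"]),
        # IsFinal is delivered as the string "true" or "false".
        "is_final": fields["IsFinal"].lower() == "true",
    }
```

With the sample callback above, the result would carry the transcript "let's hear some music" with `is_final` set to `True`.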
