Speech-to-Text with Voice API and TeXML
Introduction
In this tutorial, we will cover how to get a speech-to-text transcription of your calls using the Voice API and TeXML.
Before starting, please ensure your Voice API or TeXML application is correctly configured.
Video Tutorial
Learn how to implement real-time Speech-to-Text recognition in your voice applications:
This video shows how to capture and process spoken input from callers using Telnyx's Speech-to-Text API.
Voice API
Transcription can be enabled for Voice API calls using a dedicated endpoint, as follows:
Note: Don't forget to update YOUR_API_KEY here.
curl -i -X POST \
'https://api.telnyx.com/v2/calls/{call_control_id}/actions/transcription_start' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"language": "en",
"client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
"command_id": "891510ac-f3e4-11e8-af5b-de00688a4901",
"transcription_engine": "B"
}'
Telnyx offers two speech-to-text engines that can process the call audio into a transcription:
- A (default) - Google speech-to-text engine that offers additional features like interim results.
- B - In-house Telnyx speech-to-text engine with significantly better transcription accuracy and lower latency.
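The request above can be sketched in Python. This helper only assembles the URL and JSON body for the transcription_start action (the helper name and defaults are illustrative, not part of a Telnyx SDK; actually sending the request would require an HTTP client and a valid API key):

```python
import json

API_BASE = "https://api.telnyx.com/v2"

def build_transcription_start_request(call_control_id, engine="B", language="en",
                                      client_state=None, command_id=None):
    """Build the URL and JSON body for the transcription_start action.

    The endpoint path and field names follow the curl example above;
    this function is a sketch and does not perform the HTTP request.
    """
    url = f"{API_BASE}/calls/{call_control_id}/actions/transcription_start"
    body = {"language": language, "transcription_engine": engine}
    if client_state is not None:
        body["client_state"] = client_state
    if command_id is not None:
        body["command_id"] = command_id
    return url, json.dumps(body)
```

The optional client_state and command_id fields are only included when provided, mirroring the curl example.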
The results are sent as webhooks delivered to the webhook URL configured for the Voice API application:
{
  "data": {
    "record_type": "event",
    "event_type": "call.transcription",
    "id": "0ccc7b54-4df3-4bca-a65a-3da1ecc777f0",
    "occurred_at": "2018-02-02T22:25:27.521992Z",
    "payload": {
      "call_control_id": "v2:7subYr8fLrXmaAXm8egeAMpoSJ72J3SGPUuome81-hQuaKRf9b7hKA",
      "call_leg_id": "5ca81340-5beb-11eb-ae45-02420a0f8b69",
      "call_session_id": "5ca81eee-5beb-11eb-ba6c-02420a0f8b69",
      "client_state": null,
      "connection_id": "1240401930086254526",
      "transcription_data": {
        "confidence": 0.977219,
        "is_final": true,
        "transcript": "hello this is a test speech"
      }
    }
  }
}
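A minimal sketch of consuming this webhook in Python, assuming the raw request body is available as a JSON string (the parse_transcription_event helper is illustrative; a real handler would also verify the webhook signature):

```python
import json

def parse_transcription_event(raw_body):
    """Extract the useful fields from a call.transcription webhook body.

    Field names match the sample payload above; error handling is kept
    minimal for brevity.
    """
    event = json.loads(raw_body)["data"]
    if event["event_type"] != "call.transcription":
        return None  # some other event type; ignore it here
    payload = event["payload"]
    data = payload["transcription_data"]
    return {
        "call_control_id": payload["call_control_id"],
        "transcript": data["transcript"],
        "confidence": data["confidence"],
        "is_final": data["is_final"],
    }
```

When engine A is used with interim results enabled, non-final fragments arrive with is_final set to false, so checking that flag before acting on the transcript is usually worthwhile.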
TeXML
You can enable transcription on your TeXML calls by including a <Transcription>
verb in the TeXML instructions:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Start>
<Transcription language="en" transcriptionCallback="/transcription" transcriptionEngine="B" />
</Start>
</Response>
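If you generate TeXML dynamically, the document above can be built with Python's standard library rather than hand-written strings (the build_transcription_texml helper and its defaults are illustrative):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def build_transcription_texml(language="en", callback="/transcription", engine="B"):
    """Render a TeXML document that starts transcription on the call.

    Element and attribute names mirror the TeXML snippet above.
    """
    response = Element("Response")
    start = SubElement(response, "Start")
    SubElement(start, "Transcription", {
        "language": language,
        "transcriptionCallback": callback,
        "transcriptionEngine": engine,
    })
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + tostring(response, encoding="unicode")
```

Using an XML builder avoids subtle breakage from unescaped characters or mismatched quotes in hand-assembled markup.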
The transcription results are sent in the callback in the following format:
{
"AccountSid" : "6d547b4f-993a-4e87-b95c-2d9460b3824b",
"CallSid" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
"CallSidLegacy" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
"Confidence" : "0.9822598695755005",
"ConnectionId" : "1614262910593271041",
"From" : "+18727726007",
"IsFinal" : "true",
"To" : "+48664087895",
"Transcript" : "let's hear some music"
}
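Note that every value in the callback arrives as a string, including Confidence and IsFinal. A small sketch of converting them to native Python types, assuming the callback fields have already been decoded into a dict (the normalize_transcription_callback helper is illustrative):

```python
def normalize_transcription_callback(params):
    """Convert the string-typed TeXML callback fields to native types.

    Key names are taken from the sample callback body above.
    """
    return {
        "call_sid": params["CallSid"],
        "transcript": params["Transcript"],
        "confidence": float(params["Confidence"]),
        "is_final": params["IsFinal"].lower() == "true",
    }
```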