In this tutorial, you'll learn about SSML tags that can help you customize the audio response of your text-to-speech application.
Speech Synthesis Markup Language (SSML) is an XML-based markup language that is used to generate synthetic speech for applications. SSML tags change how the speech sounds in the application by adjusting pitch, volume, duration of speech, and more.
The break tag adds a pause to the speech. There are two ways to define the length of the pause, using one of the following attributes:
- time: defines the length of the pause in seconds (s) or milliseconds (ms), for example 3s or 500ms
- strength: chooses the strength of the pause using one of the following values:
  - none: no pause
  - Pause: the same duration as after a period
  - x-weak: the same as none
  - weak: sets a pause of the same duration as the pause after a comma
  - medium: has the same strength as weak
  - strong: sets a pause of the same duration as the pause after a sentence
  - x-strong: sets a pause of the same duration as the pause after a paragraph
<speak> Mary had a little lamb <break time="3s"/>Whose fleece was white as snow. </speak>
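If the SSML is assembled in code, a small helper can keep break tags consistent. The following is a minimal Python sketch, not part of any SSML library: the function name `break_tag` is hypothetical, the set of strength values is assumed from the W3C SSML specification, and requiring exactly one of the two attributes is a simplification chosen for this example.

```python
import re

# Strength values assumed from the W3C SSML specification.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def break_tag(time=None, strength=None):
    """Return a <break/> tag built from exactly one of time or strength."""
    if (time is None) == (strength is None):
        raise ValueError("pass exactly one of time or strength")
    if time is not None:
        # Accept a non-negative number followed by s or ms, e.g. "3s" or "250ms".
        if not re.fullmatch(r"\d+(\.\d+)?(ms|s)", time):
            raise ValueError(f"invalid time value: {time!r}")
        return f'<break time="{time}"/>'
    if strength not in STRENGTHS:
        raise ValueError(f"invalid strength value: {strength!r}")
    return f'<break strength="{strength}"/>'


ssml = (
    "<speak> Mary had a little lamb "
    + break_tag(time="3s")
    + " Whose fleece was white as snow. </speak>"
)
print(ssml)
```

The helper rejects values such as `"3 seconds"` before they reach the speech engine, where a malformed attribute might otherwise be ignored or cause an error.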
The emphasis tag changes the speed and loudness with which words are read. It is controlled by the level attribute, which takes one of the following values:
- strong - increases the volume and slows the speaking rate
- moderate - increases the volume and slows the speaking rate, but less than strong
- reduced - decreases the volume and speeds up the speaking rate
<speak> I already told you we're <emphasis level="strong">nearly</emphasis> there </speak>
The lang tag defines the language for a specific word or sentence through its xml:lang attribute.
<speak> <lang xml:lang="es">Puedo hablar español</lang> </speak>
The p tag marks a paragraph. It adds a pause between paragraphs that is longer than the regular pause at a comma or at the end of a sentence.
<speak> <p>This is the first paragraph.</p> <p>This is the second paragraph.</p> </speak>
The phoneme tag sets a phonetic pronunciation for the text it contains. It requires two attributes:
- alphabet, with the following options:
  - ipa, meaning the International Phonetic Alphabet (IPA) will be used
  - x-sampa, which indicates that the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) will be used
- ph, which specifies how the text should be pronounced
<speak> Say <phoneme alphabet="ipa" ph="prəˌnʌnsɪˈeɪʃ(ə)n">pronunciation</phoneme>. </speak>
The following attributes can be used with the prosody tag:
- volume:
  - default: resets the volume to the default value
  - silent, x-soft, soft, medium, loud, x-loud: sets the volume to a predefined value
  - +ndB, -ndB: changes the volume relative to the current level, in decibels
- rate:
  - x-slow, slow, medium, fast, x-fast: sets the speaking rate to a predefined value
  - n%: a percentage change in speaking pace
<speak> Sometimes some words need to be said <prosody volume="loud">louder</prosody> and sometimes a lower volume <prosody volume="-6dB">is a more effective way of interacting with your audience.</prosody> </speak>
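When prosody markup is generated in code rather than typed by hand, quoting the attribute values programmatically keeps the quotes balanced. This is a minimal Python sketch using only the standard library; the helper name `prosody` is hypothetical, and it does not check that the attribute values are ones a given speech engine supports.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, **attrs):
    """Wrap text in a <prosody> element with the given attributes."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items()
    )
    # escape protects characters such as & and < inside the spoken text.
    return f"<prosody{rendered}>{escape(text)}</prosody>"


ssml = (
    "<speak> Sometimes some words need to be said "
    + prosody("louder", volume="loud")
    + " and sometimes a lower volume "
    + prosody("is more effective.", volume="-6dB", rate="slow")
    + " </speak>"
)
print(ssml)
```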
The s tag marks a sentence. It adds a pause between lines, with the same effect as ending a sentence with a period (.).
<speak> <s>Here we go round the mulberry bush</s> <s>On a cold and frosty morning</s> </speak>
The say-as tag uses one attribute, interpret-as, which accepts a number of values, including:
- characters or spell-out: spells out each character of the text
- cardinal or number: reads the text as a cardinal number
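To see how the interpret-as values differ, the same digits can be wrapped in two say-as elements. The following Python sketch only builds the markup; the helper name `say_as` and the sample sentence are made up for illustration, and how each value is actually spoken depends on the speech engine.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as):
    """Wrap text in a <say-as> element with the given interpret-as value."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


ssml = (
    "<speak> Your code is "
    + say_as("1234", "characters")  # expected to be read digit by digit
    + ", and you have "
    + say_as("1234", "cardinal")  # expected to be read as a single number
    + " points. </speak>"
)
print(ssml)
```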
The sub tag is used with the alias attribute to substitute a different spoken word for the selected text, such as an acronym or abbreviation.
<speak> My favorite chemical element is <sub alias="Mercury">Hg</sub>, because it looks so shiny. </speak>
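Because SSML is XML, a missing quote or an unclosed tag makes the whole document invalid. A quick check before sending markup to a speech engine can catch such mistakes. This is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical, and the check covers XML well-formedness only, not whether the tags are valid SSML or supported by a particular engine.

```python
import xml.etree.ElementTree as ET


def is_well_formed(ssml):
    """Return True if the SSML string parses as well-formed XML."""
    try:
        ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return True


good = (
    '<speak> My favorite chemical element is '
    '<sub alias="Mercury">Hg</sub>, because it looks so shiny. </speak>'
)
# The closing quote after loud is missing, so the parser rejects this string.
bad = '<speak> <prosody volume="loud>louder</prosody> </speak>'

print(is_well_formed(good))
print(is_well_formed(bad))
```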