Structure your prompt for consistent results:
<Gender>, <Age range>. <Quality/energy description>.
<1–2 sentences about timbre, pacing, delivery>
Example:
Female, mid-thirties. Warm and full, slightly husky. Moderate pace, sounds like someone who smiles while talking.
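
If you generate prompts programmatically, the template above can be assembled from its parts. The following is a minimal sketch, not part of any Telnyx SDK: the function name `build_voice_prompt` and its parameters are hypothetical, and it only does string formatting.

```python
def _sentence(text: str) -> str:
    """Capitalize the first character and end the text with exactly one period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "."


def build_voice_prompt(gender: str, age_range: str, quality: str, delivery: str) -> str:
    """Assemble '<Gender>, <Age range>. <Quality/energy>. <Delivery>' as one string.

    Hypothetical helper for illustration; it performs no API calls.
    """
    header = _sentence(f"{gender.strip()}, {age_range.strip()}")
    return " ".join([header, _sentence(quality), _sentence(delivery)])


prompt = build_voice_prompt(
    gender="female",
    age_range="mid-thirties",
    quality="warm and full, slightly husky",
    delivery="moderate pace, sounds like someone who smiles while talking",
)
print(prompt)
```

With these inputs the helper reproduces the example prompt shown above. Keeping the four parts as separate arguments makes it easy to vary one dimension at a time while comparing results.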

Dimensions to describe

Age

| Descriptor | What it produces |
| --- | --- |
| “young adult”, “in their 20s” | Lighter, more energetic |
| “mid-thirties”, “early forties” | Balanced, mature |
| “elderly”, “in his 80s” | Deeper, weathered texture |

Tone / Timbre

  • Deep / low-pitched — gravitas, authority
  • Smooth / rich — polished, professional
  • Gravelly / raspy — character, authenticity
  • Airy / breathy — intimate, soft
  • Warm / mellow — approachable, friendly

Gender

Specify male or female, or describe the sound directly: “a lower-pitched, husky female voice” or “a neutral, mid-pitched androgynous voice.”

Pacing

  • Measured / deliberate — careful, authoritative
  • Rapid-fire / quick — energetic, urgent
  • Relaxed / conversational — natural, approachable
  • Rhythmic — storytelling, narration

Emotion / Energy

  • Calm / serene — support, meditation
  • Enthusiastic / upbeat — marketing, announcements
  • Authoritative / matter-of-fact — IVR, instructions
  • Warm / empathetic — customer service, healthcare

Accent / Regional

Describe the regional quality you want. Be specific:
  • “Slight British accent” rather than “British”
  • “Neutral American” rather than just “American”
  • “Soft Southern drawl” rather than “Southern”

Use case context

Adding context helps the model understand intent:
  • “Customer service agent for a bank”
  • “Podcast narrator for true crime”
  • “Bedtime story reader for children”

Example prompts

| Use Case | Prompt | Recommended Engine |
| --- | --- | --- |
| Customer service | Female, mid-thirties. Warm and full, slightly husky. Moderate pace, sounds like someone who smiles while talking. | Minimax |
| IVR system | Male, late thirties. Clean and dry, matter-of-fact. Deliberate pace, pauses before numbers and details. | Telnyx |
| Voice agent | Female, late twenties. Clear and professional, slightly upbeat. Natural conversational pace with a helpful tone. | |
| Podcast narrator | Male, early forties. Deep and smooth, with a rich baritone. Measured pacing, storytelling cadence. | Minimax |
| Empathetic support | Male, mid-thirties. Warm, slightly gravelly. Measured and unhurried. You can hear patience in the breathing rhythm. | Telnyx |
| Notification/alert | Female, mid-twenties. Bright and crisp. Quick pace, clear enunciation. No emotion — just information. | Minimax |
| Meditation guide | Female, mid-forties. Soft, airy, and serene. Extremely slow and deliberate pace. Soothing and deeply relaxing delivery. | Minimax |
| Energetic promo | Male, early twenties. Bright and enthusiastic, high energy. Rapid-fire pacing, sounds highly engaged and convincing. | Minimax |
| Audiobook (Fiction) | Male, in his 60s. Deep, weathered texture. Relaxed, storytelling cadence with a warm, nostalgic feel. | Telnyx |

Common pitfalls

  • Too vague — “nice voice” or “good voice” produces generic output. Be specific about at least 3 dimensions.
  • Contradictory traits — “whisper” + “booming” confuses the model. Pick a coherent set of characteristics.
  • Provider differences — the same prompt may produce noticeably different results on Telnyx vs Minimax. Try both.
  • Ignoring the preview text — the text you provide for synthesis should match the voice’s intended use. Don’t use a cheerful script for a somber voice.
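
The first two pitfalls can be roughly pre-checked before you submit a prompt. The sketch below is a hypothetical keyword heuristic, not how the model or the portal evaluates prompts: the name `lint_voice_prompt`, the keyword lists (drawn from the descriptors in this guide), and the contradiction pair are all assumptions, and substring matching will miss synonyms and occasionally over-match.

```python
# Keywords per dimension, taken from the descriptors listed in this guide (assumed, not exhaustive).
DIMENSION_KEYWORDS = {
    "age": ["young", "20s", "60s", "80s", "twenties", "thirties", "forties", "elderly"],
    "gender": ["male", "female", "androgynous"],
    "timbre": ["deep", "smooth", "rich", "gravelly", "raspy", "airy", "breathy",
               "warm", "mellow", "husky"],
    "pacing": ["measured", "deliberate", "rapid-fire", "quick", "relaxed",
               "conversational", "rhythmic", "pace"],
    "emotion": ["calm", "serene", "enthusiastic", "upbeat", "authoritative",
                "matter-of-fact", "empathetic"],
    "accent": ["accent", "drawl"],
}

# Trait pairs that should not appear together (only the pair named in this guide).
CONTRADICTORY_PAIRS = [("whisper", "booming")]


def lint_voice_prompt(prompt: str, min_dimensions: int = 3) -> list[str]:
    """Return warnings for prompts that look too vague or self-contradictory."""
    text = prompt.lower()
    warnings = []

    # A dimension counts as covered if any of its keywords appears as a substring.
    covered = [
        name for name, words in DIMENSION_KEYWORDS.items()
        if any(word in text for word in words)
    ]
    if len(covered) < min_dimensions:
        warnings.append(
            f"Too vague: {len(covered)} dimension(s) detected, "
            f"aim for at least {min_dimensions}."
        )

    for first, second in CONTRADICTORY_PAIRS:
        if first in text and second in text:
            warnings.append(f"Contradictory traits: '{first}' and '{second}'.")

    return warnings


for candidate in ["nice voice", "Male, deep whisper with a booming delivery"]:
    print(candidate, "->", lint_voice_prompt(candidate))
```

A check like this only catches obvious cases; listening to the generated preview remains the real test.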

The Enhance button

The portal’s Enhance button uses AI to expand a short description into a detailed prompt.
Before (short description): Empathetic tech support agent
After (enhanced prompt):
Empathetic tech support agent

Gender and age: Female, late 20s to early 30s.
Where the voice sits: Head and chest, with a balanced resonance.
Texture: Silky smooth with a faint warmth, slightly airy.
Pace: Moderate, with deliberate pauses for clarity and reassurance.
Distinctive quality: A gentle, patient lilt that conveys calm and understanding.

Before (short description): Persuasive outbound sales caller
After (enhanced prompt):
Persuasive outbound sales caller

Gender and age: Female, 28-34.
Where the voice sits: Throat with chest undertones.
Texture: Smooth and polished, like warm honey over gravel.
Pace: Brisk and rhythmic, with confident pauses for emphasis.
Distinctive quality: A bright, engaging lilt that conveys enthusiasm without sounding forced.

Before (short description): Professional medical clinic receptionist
After (enhanced prompt):
Professional medical clinic receptionist

Gender and age: Female, 28-34.
Where the voice sits: Chest and throat, grounded and clear.
Texture: Smooth, slightly warm, with a subtle firmness like pressed cotton.
Pace: Measured and steady, with deliberate pauses for clarity.
Distinctive quality: A calm, reassuring tone, as if accustomed to offering comfort in stressful moments.

Before (short description): Patient language tutor
After (enhanced prompt):
Patient language tutor

Gender and age: Female, late 20s to early 30s.
Where the voice sits: Head and chest, balanced resonance.
Texture: Smooth, warm, and gently textured like soft velvet.
Pace: Measured and deliberate, with thoughtful pauses and clear enunciation.
Distinctive quality: A calm, encouraging lilt that feels reassuring and attentive.

The enhanced prompt is a good starting point, but review it before generating; you may want to tweak specific dimensions.