Recommended format
Structure your prompt for consistent results:
Female, mid-thirties. Warm and full, slightly husky. Moderate pace, sounds like someone who smiles while talking.
Dimensions to describe
Age
| Descriptor | What it produces |
|---|---|
| “young adult”, “in their 20s” | Lighter, more energetic |
| “mid-thirties”, “early forties” | Balanced, mature |
| “elderly”, “in his 80s” | Deeper, weathered texture |
Tone / Timbre
- Deep / low-pitched — gravitas, authority
- Smooth / rich — polished, professional
- Gravelly / raspy — character, authenticity
- Airy / breathy — intimate, soft
- Warm / mellow — approachable, friendly
Gender
Male, female, or describe the sound directly: “a lower-pitched, husky female voice” or “a neutral, mid-pitched androgynous voice.”
Pacing
- Measured / deliberate — careful, authoritative
- Rapid-fire / quick — energetic, urgent
- Relaxed / conversational — natural, approachable
- Rhythmic — storytelling, narration
Emotion / Energy
- Calm / serene — support, meditation
- Enthusiastic / upbeat — marketing, announcements
- Authoritative / matter-of-fact — IVR, instructions
- Warm / empathetic — customer service, healthcare
Accent / Regional
Describe the regional quality you want. Be specific:
- “Slight British accent” rather than “British”
- “Neutral American” rather than just “American”
- “Soft Southern drawl” rather than “Southern”
Use case context
Adding context helps the model understand intent:
- “Customer service agent for a bank”
- “Podcast narrator for true crime”
- “Bedtime story reader for children”
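The dimensions above can be assembled into a prompt programmatically. The following is a minimal sketch, not part of any official SDK: `VoiceSpec` and `build_prompt` are hypothetical names, and the sentence order (gender and age, then tone, then pacing) is an assumption taken from the recommended format.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    """Hypothetical container for the dimensions described in this guide."""
    gender: str
    age: str
    tone: str
    pacing: str
    accent: str = ""    # optional, e.g. "slight British accent"
    context: str = ""   # optional, e.g. "customer service agent for a bank"


def _sentence(text: str) -> str:
    """Uppercase the first character only and end with a single period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "."


def build_prompt(spec: VoiceSpec) -> str:
    """Join the dimensions into short sentences in a fixed order."""
    parts = [f"{spec.gender}, {spec.age}", spec.tone, spec.pacing]
    if spec.accent:
        parts.append(spec.accent)
    if spec.context:
        parts.append(f"Use case: {spec.context}")
    return " ".join(_sentence(p) for p in parts)


prompt = build_prompt(VoiceSpec(
    gender="female",
    age="mid-thirties",
    tone="warm and full, slightly husky",
    pacing="moderate pace, sounds like someone who smiles while talking",
))
print(prompt)
```

Keeping each dimension in its own field makes it easy to vary one trait at a time while comparing results.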
Example prompts
| Use Case | Prompt | Recommended Engine |
|---|---|---|
| Customer service | Female, mid-thirties. Warm and full, slightly husky. Moderate pace, sounds like someone who smiles while talking. | Minimax |
| IVR system | Male, late thirties. Clean and dry, matter-of-fact. Deliberate pace, pauses before numbers and details. | Telnyx |
| Voice agent | Female, late twenties. Clear and professional, slightly upbeat. Natural conversational pace with a helpful tone. | |
| Podcast narrator | Male, early forties. Deep and smooth, with a rich baritone. Measured pacing, storytelling cadence. | Minimax |
| Empathetic support | Male, mid-thirties. Warm, slightly gravelly. Measured and unhurried. You can hear patience in the breathing rhythm. | Telnyx |
| Notification/alert | Female, mid-twenties. Bright and crisp. Quick pace, clear enunciation. No emotion — just information. | Minimax |
| Meditation guide | Female, mid-forties. Soft, airy, and serene. Extremely slow and deliberate pace. Soothing and deeply relaxing delivery. | Minimax |
| Energetic promo | Male, early twenties. Bright and enthusiastic, high energy. Rapid-fire pacing, sounds highly engaged and convincing. | Minimax |
| Audiobook (Fiction) | Male, in his 60s. Deep, weathered texture. Relaxed, storytelling cadence with a warm, nostalgic feel. | Telnyx |
Common pitfalls
- Too vague — “nice voice” or “good voice” produces generic output. Be specific about at least 3 dimensions.
- Contradictory traits — “whisper” + “booming” confuses the model. Pick a coherent set of characteristics.
- Provider differences — the same prompt may produce noticeably different results on Telnyx vs Minimax. Try both.
- Ignoring the preview text — the text you provide for synthesis should match the voice’s intended use. Don’t use a cheerful script for a somber voice.
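The first two pitfalls can be caught with a rough pre-check before a prompt is submitted. This is a sketch only: the keyword patterns and the list of contradictory pairs are illustrative assumptions drawn from the vocabulary in this guide, and `lint_prompt` is a hypothetical helper, not a portal feature.

```python
import re

# Illustrative patterns, one per dimension; extend them to fit your own vocabulary.
DIMENSION_PATTERNS = {
    "gender": r"\b(male|female|androgynous)\b",
    "age": r"\b(young|elderly|twenties|thirties|forties|\d0s)\b",
    "tone": r"\b(deep|smooth|rich|gravelly|raspy|airy|breathy|warm|mellow|husky)\b",
    "pacing": r"\b(pace|pacing|measured|deliberate|rapid-fire|quick|relaxed|rhythmic)\b",
    "emotion": r"\b(calm|serene|enthusiastic|upbeat|authoritative|matter-of-fact|empathetic)\b",
    "accent": r"\b(accent|drawl)\b",
}

# Assumed examples of traits that should not be combined.
CONTRADICTIONS = [("whisper", "booming"), ("slow", "rapid-fire")]


def lint_prompt(prompt: str, min_dimensions: int = 3) -> list[str]:
    """Return warnings for prompts that are too vague or self-contradictory."""
    text = prompt.lower()
    warnings = []
    covered = [name for name, pattern in DIMENSION_PATTERNS.items()
               if re.search(pattern, text)]
    if len(covered) < min_dimensions:
        warnings.append(
            f"Too vague: covers {len(covered)} dimension(s), "
            f"aim for at least {min_dimensions}."
        )
    for first, second in CONTRADICTIONS:
        if first in text and second in text:
            warnings.append(f"Contradictory traits: '{first}' and '{second}'.")
    return warnings


print(lint_prompt("nice voice"))
print(lint_prompt("Male, late thirties. Deep whisper, booming. Measured pace."))
```

A keyword check like this only flags obvious cases; listening to the generated preview remains the real test.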
The Enhance button
The portal’s Enhance button uses AI to expand a short description into a detailed prompt.
| Before (Short description) | After (Enhanced prompt) |
|---|---|
| Empathetic tech support agent | Empathetic tech support agent Gender and age: Female, late 20s to early 30s. Where the voice sits: Head and chest, with a balanced resonance. Texture: Silky smooth with a faint warmth, slightly airy. Pace: Moderate, with deliberate pauses for clarity and reassurance. Distinctive quality: A gentle, patient lilt that conveys calm and understanding. |
| Persuasive outbound sales caller | Persuasive outbound sales caller Gender and age: Female, 28-34. Where the voice sits: Throat with chest undertones. Texture: Smooth and polished, like warm honey over gravel. Pace: Brisk and rhythmic, with confident pauses for emphasis. Distinctive quality: A bright, engaging lilt that conveys enthusiasm without sounding forced. |
| Professional medical clinic receptionist | Professional medical clinic receptionist Gender and age: Female, 28-34. Where the voice sits: Chest and throat, grounded and clear. Texture: Smooth, slightly warm, with a subtle firmness like pressed cotton. Pace: Measured and steady, with deliberate pauses for clarity. Distinctive quality: A calm, reassuring tone, as if accustomed to offering comfort in stressful moments. |
| Patient language tutor | Patient language tutor Gender and age: Female, late 20s to early 30s. Where the voice sits: Head and chest, balanced resonance. Texture: Smooth, warm, and gently textured like soft velvet. Pace: Measured and deliberate, with thoughtful pauses and clear enunciation. Distinctive quality: A calm, encouraging lilt that feels reassuring and attentive. |
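The enhanced prompts in the table share a set of labeled fields, so they can be split back into parts for review or editing. The sketch below assumes the five labels seen in the examples above always appear verbatim, each followed by a colon; `parse_enhanced` is a hypothetical helper, and the real Enhance output may vary.

```python
import re

# Labels observed in the example table; assumed, not guaranteed, to be stable.
FIELD_LABELS = [
    "Gender and age",
    "Where the voice sits",
    "Texture",
    "Pace",
    "Distinctive quality",
]


def parse_enhanced(prompt: str) -> dict[str, str]:
    """Split an enhanced prompt into its leading description and labeled fields."""
    pattern = "(" + "|".join(re.escape(label) for label in FIELD_LABELS) + "):"
    # With one capture group, re.split returns [prefix, label, value, label, value, ...].
    parts = re.split(pattern, prompt)
    fields = {"Description": parts[0].strip()}
    for label, value in zip(parts[1::2], parts[2::2]):
        fields[label] = value.strip()
    return fields


enhanced = (
    "Empathetic tech support agent Gender and age: Female, late 20s to early 30s. "
    "Where the voice sits: Head and chest, with a balanced resonance. "
    "Texture: Silky smooth with a faint warmth, slightly airy. "
    "Pace: Moderate, with deliberate pauses for clarity and reassurance. "
    "Distinctive quality: A gentle, patient lilt that conveys calm and understanding."
)
fields = parse_enhanced(enhanced)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Splitting the output this way makes it easier to adjust a single field, such as the pace, and resubmit the rest unchanged.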