
FAQ & Troubleshooting

Common questions and solutions

Frequently Asked Questions

Why is the output different every time?

The AI is nondeterministic. Like a human actor, it gives a slightly different performance each time. Use the Temperature slider to control this variability:

  • Lower temperature: More consistent, predictable outputs
  • Higher temperature: More variation and expressiveness
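
For the curious, the short Python sketch below illustrates why this happens. It is an illustration only, not Moknah's internal model: dividing preference scores by the temperature before converting them to probabilities makes a low temperature nearly deterministic and a high temperature much more varied.

```python
# Illustrative only: a toy sampler, not Moknah's internal model.
# Dividing scores by the temperature before softmax sharpens the distribution
# (low temperature -> nearly deterministic) or flattens it (high temperature).
import numpy as np

def sample_variants(scores, temperature, n=10, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.exp(np.array(scores) / temperature)
    probs /= probs.sum()
    return rng.choice(len(scores), size=n, p=probs)

scores = [2.0, 1.0, 0.5]                          # hypothetical scores for 3 renditions
print(sample_variants(scores, temperature=0.2))   # almost always rendition 0
print(sample_variants(scores, temperature=1.5))   # mixes all three renditions
```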

Why does voice selection matter so much?

Voice selection has a tremendous effect on the output. The AI inherits characteristics from the selected voice:

  • If you want an introspective voice, select a voice that sounds introspective
  • If you want an energetic voice, select one cloned from energetic samples
  • If you want a specific dialect, choose a voice trained on that dialect

Remember: Good Input = Good Output

If your source audio contains noise, reverb, or multiple speakers, the output will be unstable. Always use clean, high-quality source material for the best results.

How are credits calculated?

Credits are calculated based on the number of characters in your text and the normalization mode selected:

  • Basic Normalization: 1× the character count
  • AI-Enhanced Normalization: 2× the character count
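
As a rough worked example, the sketch below assumes one credit per character as the base rate, which the multipliers above suggest but the guide does not state exactly; the actual billing formula may differ.

```python
# Rough estimate only; assumes one credit per character as the base rate,
# which the multipliers above suggest but the guide does not state exactly.
def estimate_credits(text: str, ai_enhanced: bool) -> int:
    multiplier = 2 if ai_enhanced else 1
    return len(text) * multiplier

sample = "مرحبا بكم"                                 # 9 characters, space included
print(estimate_credits(sample, ai_enhanced=False))   # 9
print(estimate_credits(sample, ai_enhanced=True))    # 18
```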

What audio format is generated?

All generated audio is delivered in MP3 format at 128 or 192 kbps (44.1 kHz) for optimal quality and compatibility.
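
If you want to confirm those properties on a downloaded file, a quick check with the third-party mutagen library looks like this (the filename is a placeholder):

```python
# Quick format check with the third-party "mutagen" library; "output.mp3" is a
# placeholder for whatever filename you downloaded.
from mutagen.mp3 import MP3

info = MP3("output.mp3").info
print(f"bitrate:     {info.bitrate // 1000} kbps")   # expect 128 or 192
print(f"sample rate: {info.sample_rate} Hz")         # expect 44100
print(f"duration:    {info.length:.1f} s")
```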

Is there a character limit?

  • Emotions & Dialects Enabled: 3,000 characters (~3 minutes of audio)
  • Standard Mode: 10,000 characters (~10 minutes of audio)
  • Text to Audio (Quick): 1,500 characters (~1.5 minutes of audio)

For longer content, use the Text to Audio Studio.
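
A simple pre-submission length check against the limits above might look like the sketch below; the mode labels in the dictionary are illustrative, not official identifiers.

```python
# Pre-submission length check using the limits listed above.
# The mode keys are illustrative labels, not official identifiers.
LIMITS = {
    "emotions_dialects": 3_000,
    "standard": 10_000,
    "text_to_audio_quick": 1_500,
}

def fits(text: str, mode: str) -> bool:
    return len(text) <= LIMITS[mode]

with open("script.txt", encoding="utf-8") as f:
    script = f.read()

if not fits(script, "standard"):
    print("Too long for Standard Mode: split the text or use the Text to Audio Studio.")
```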

Can I use the generated audio commercially?

Yes, all audio generated with your account can be used for commercial purposes according to your subscription terms.


Troubleshooting

Audio sounds robotic

  • Try increasing the Temperature setting for more natural variation
  • Increase the Expressiveness slider slightly
  • Enable Emotions & Dialects Mode for expressive content
  • Try a different voice that matches your content style

Output is inconsistent or unstable

  • Lower the Temperature setting
  • Lower the Expressiveness slider
  • Use longer prompts (250+ characters recommended for Emotions mode)
  • Check that your source audio (for voice cloning) is clean and noise-free

Generation is slow

  • Longer texts take more time to process
  • AI-Enhanced normalization requires additional processing
  • Check your internet connection

Unexpected pronunciation

  • Use AI-Enhanced Normalization for Arabic text (adds diacritics automatically)
  • Add manual diacritics (تشكيل) to ambiguous words
  • Write numbers as words instead of digits
  • Expand abbreviations (e.g., "Dr." → "Doctor")
  • Try a different voice; some voices handle certain words better
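
A minimal preprocessing sketch that applies two of the tips above is shown below; the abbreviation map is a hypothetical example, so extend it for your own content.

```python
import re

# Minimal preprocessing sketch: expands a hypothetical abbreviation map and flags
# digits so you can write them out as words before generating audio.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "St.": "Street",
}

def expand_abbreviations(text: str) -> str:
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text

def find_digits(text: str) -> list[str]:
    # Flagged rather than converted: spell these out by hand (or with a
    # number-to-words library for your language).
    return re.findall(r"\d+", text)

script = expand_abbreviations("Dr. Ahmed arrives at 10 St. George.")
print(script)               # Doctor Ahmed arrives at 10 Street George.
print(find_digits(script))  # ['10']
```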

Dialect not working correctly

  • Ensure the selected voice matches the dialect in your text
  • Use Basic Normalization for dialect content (not AI-Enhanced)
  • Start your text with a strong dialect word to "prime" the model
  • Enable Emotions & Dialects Mode

Voice clone doesn't sound like the original

  • Increase the Similarity slider
  • Ensure your training audio was high quality (no noise, echo, or reverb)
  • Check that training audio had consistent tone and energy throughout
  • If the source audio contained noise, lower the Similarity slider instead to reduce reproduced artifacts
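
For a rough pre-flight check on your training audio before cloning, a sketch using the third-party soundfile and numpy packages might look like this; the interpretation in the comments is general guidance, not an official requirement.

```python
# Rough pre-flight check for voice-cloning source audio; the interpretation in the
# comments is general guidance, not an official Moknah requirement.
import numpy as np
import soundfile as sf

def check_clone_source(path: str) -> None:
    data, sr = sf.read(path)
    if data.ndim > 1:                 # mix stereo down to mono for analysis
        data = data.mean(axis=1)
    peak = float(np.abs(data).max())
    rms = float(np.sqrt(np.mean(data ** 2)))
    print(f"sample rate: {sr} Hz")
    print(f"peak level:  {peak:.3f}  (values at or near 1.0 suggest clipping)")
    print(f"RMS level:   {rms:.3f}  (very low values suggest a quiet recording)")

check_clone_source("training_sample.wav")   # placeholder filename
```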

Contact Support

Reach out to our support team at support@moknah.io and we'll help you resolve any issues.