Text to Speech (TTS)
Convert text to natural-sounding speech
POST
/api/v1/tts/generate/
Converts text to speech using AI voice synthesis. Returns an MP3 audio file by default; set output_format to request a PCM or Opus stream instead.
Authentication
Include your API key in the Authorization header as a Bearer token (Authorization: Bearer your_api_key).
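The header can be built once and reused across requests. The following is a minimal sketch, not an official client: `auth_headers` is a hypothetical helper, and the environment variable name `MOKNAH_API_KEY` is an assumption chosen for illustration.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build request headers carrying the API key as a Bearer token.

    Falls back to the MOKNAH_API_KEY environment variable (a hypothetical
    name, not defined by the API) when no key is passed in.
    """
    key = api_key or os.environ.get("MOKNAH_API_KEY")
    if not key:
        raise RuntimeError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.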
Request Body
| Parameter | Required | Type | Description |
|---|---|---|---|
| text | required | string | Text to convert (max 1500 characters) |
| output_format | optional | string | Desired audio output format, specified as codec_sampleRate_bitrate (e.g., mp3_22050_32); PCM formats omit the bitrate (e.g., pcm_16000). Default is mp3_44100_128. See the Supported Output Formats section below for the full list. |
| voice_id | required | integer | Voice ID from the voice list |
| prerecording | optional | integer | 1 (Basic Enhancement) or 2 (AI Enhancement) |
| voice_settings | optional | object | Voice customization options |
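Because text is capped at 1500 characters, longer input has to be sent as several requests. The sketch below shows one way to split it. `split_text` is a hypothetical helper, and it assumes that the limit counts Unicode characters rather than bytes and that breaking on whitespace is acceptable; the documentation above does not confirm either point.

```python
MAX_CHARS = 1500  # limit stated in the request body table


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into whitespace-delimited chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-cut into pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated. Splitting on sentence boundaries instead would likely give more natural prosody at the joins.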
Supported Output Formats
The output_format parameter can be any of the following values:
- MP3: mp3_22050_32, mp3_24000_48, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192
- PCM: pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_32000, pcm_44100, pcm_48000
- Opus: opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192
This section serves as a reference, so users can select the appropriate format without cluttering the main request table.
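A client can check an output_format value against the list above before sending a request. The sketch below is illustrative only: `parse_output_format` and `OutputFormat` are hypothetical names, and the field units (Hz for the sample rate, kbps for the bitrate) are an assumption based on common convention, since the documentation does not state them.

```python
from typing import NamedTuple, Optional

# Values copied from the Supported Output Formats list.
SUPPORTED_FORMATS = frozenset({
    "mp3_22050_32", "mp3_24000_48", "mp3_44100_32", "mp3_44100_64",
    "mp3_44100_96", "mp3_44100_128", "mp3_44100_192",
    "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000",
    "pcm_32000", "pcm_44100", "pcm_48000",
    "opus_48000_32", "opus_48000_64", "opus_48000_96",
    "opus_48000_128", "opus_48000_192",
})


class OutputFormat(NamedTuple):
    codec: str
    sample_rate: int        # assumed to be Hz
    bitrate: Optional[int]  # assumed to be kbps; None for PCM


def parse_output_format(value: str = "mp3_44100_128") -> OutputFormat:
    """Validate a format string and split it into its components."""
    if value not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output_format: {value!r}")
    parts = value.split("_")
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(parts[0], int(parts[1]), bitrate)
```

The codec field is handy for choosing a file extension when saving the response.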
Voice Settings Object
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| temperature | integer | 0 – 100 | 25 | Controls speech variability and expressiveness. Lower values produce more stable and predictable speech, while higher values increase variation, spontaneity, and creative phrasing. |
| voice_similarity | integer | 0 – 100 | 100 | Determines how closely the generated speech matches the selected voice profile. Higher values preserve the original voice identity, while lower values allow more deviation in tone and articulation. |
| speed | float | 0.7 – 1.2 | 1.0 | Adjusts the speaking rate. Values below 1.0 slow down speech for clarity, while values above 1.0 increase speaking speed. |
| enable_emotions | boolean | false / true | false | Enables emotionally expressive speech generation. When enabled, the model dynamically controls pacing, intonation, and emphasis based on emotional context. |
| fast_mode | integer | 0, 1, or 2 | 0 | Reduces synthesis latency at the cost of audio quality. |
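The ranges in the table can be checked on the client before a request is sent. This is a minimal sketch under two assumptions: the range bounds are inclusive, and an integer such as 1 is accepted for speed. `validate_voice_settings` is a hypothetical helper, not part of the API.

```python
def validate_voice_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems: list[str] = []

    def is_int(v) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    for name in ("temperature", "voice_similarity"):
        if name in settings:
            v = settings[name]
            if not (is_int(v) and 0 <= v <= 100):
                problems.append(f"{name} must be an integer from 0 to 100")

    if "speed" in settings:
        v = settings["speed"]
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        if not (is_number and 0.7 <= v <= 1.2):
            problems.append("speed must be a number from 0.7 to 1.2")

    if "fast_mode" in settings:
        v = settings["fast_mode"]
        if not (is_int(v) and v in (0, 1, 2)):
            problems.append("fast_mode must be 0, 1, or 2")

    if "enable_emotions" in settings:
        if not isinstance(settings["enable_emotions"], bool):
            problems.append("enable_emotions must be true or false")

    return problems
```

Returning every problem at once, instead of raising on the first, makes it easier to report all issues to the caller in a single pass.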
Voice Settings Rules & Constraints
- When enable_emotions is set to true, the following fields must not be included in voice_settings: voice_similarity, speed, fast_mode.
- Requests that include any of these fields while emotions are enabled will be rejected with a 400 INVALID_REQUEST error.
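The rule above can be enforced on the client so that a conflicting request never reaches the server. The sketch below is one possible approach; both function names are hypothetical.

```python
# Fields that must be absent when enable_emotions is true (from the rule above).
FORBIDDEN_WITH_EMOTIONS = ("voice_similarity", "speed", "fast_mode")


def emotion_conflicts(voice_settings: dict) -> list[str]:
    """List the fields that would trigger a 400 INVALID_REQUEST error."""
    if voice_settings.get("enable_emotions") is not True:
        return []
    return [name for name in FORBIDDEN_WITH_EMOTIONS if name in voice_settings]


def strip_emotion_conflicts(voice_settings: dict) -> dict:
    """Return a copy of the settings with the conflicting fields removed."""
    conflicts = set(emotion_conflicts(voice_settings))
    return {k: v for k, v in voice_settings.items() if k not in conflicts}
```

For example, `strip_emotion_conflicts({"enable_emotions": True, "speed": 1.1, "temperature": 40})` keeps only enable_emotions and temperature. Whether to drop the fields silently or report them to the caller is a design choice; `emotion_conflicts` supports the second option.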
Example
# cURL
curl -X POST "https://moknah.io/api/v1/tts/generate/" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "مرحبا بك في منصة مكنة",
    "voice_id": 1,
    "prerecording": 1,
    "voice_settings": {
      "speed": 1.0,
      "temperature": 25
    }
  }' \
  --output speech.mp3
# Python (requires the requests package)
import requests

response = requests.post(
    "https://moknah.io/api/v1/tts/generate/",
    headers={
        "Authorization": "Bearer your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "text": "مرحبا بك في منصة مكنة",
        "voice_id": 1,
        "prerecording": 1,
        "voice_settings": {
            "speed": 1.0,
            "temperature": 25
        }
    }
)
# Raise on 4xx/5xx so an error body is never written to the audio file
response.raise_for_status()

with open("speech.mp3", "wb") as f:
    f.write(response.content)
// JavaScript (fetch)
const response = await fetch(
  'https://moknah.io/api/v1/tts/generate/',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: 'مرحبا بك في منصة مكنة',
      voice_id: 1,
      prerecording: 1,
      voice_settings: {
        speed: 1.0,
        temperature: 70
      }
    })
  }
);
// Stop on 4xx/5xx responses instead of treating the error body as audio
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}
const blob = await response.blob();
// Save or play the audio
Response
Returns binary MP3 audio data with these headers:
| Header | Value |
|---|---|
| Content-Type | audio/mpeg |
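Checking the Content-Type header before writing to disk guards against saving an error payload as audio. The sketch below covers only the documented MP3 case; the header value returned for PCM or Opus output is not documented here, so treating anything other than audio/mpeg as an error is an assumption. `save_audio` is a hypothetical helper.

```python
from pathlib import Path


def save_audio(content_type: str, data: bytes, path: str) -> int:
    """Write MP3 bytes to `path` and return the number of bytes written.

    Raises ValueError if the response is not labelled audio/mpeg.
    """
    # Ignore any parameters after the media type, e.g. "audio/mpeg; q=1".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "audio/mpeg":
        raise ValueError(f"Expected audio/mpeg, got {content_type!r}")
    return Path(path).write_bytes(data)
```

With the requests library this could be called as `save_audio(response.headers.get("Content-Type", ""), response.content, "speech.mp3")`.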
Errors
| Status | Code | Description |
|---|---|---|
| 400 | INVALID_REQUEST | Invalid parameters |
| 401 | UNAUTHORIZED | Invalid API key |
| 402 | INSUFFICIENT_CREDITS | Not enough credits |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests |
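Of the errors listed, only 429 is likely to succeed on a retry; the others indicate a problem with the request, the key, or the account. The sketch below retries rate-limited calls with exponential backoff. It is an illustration under several assumptions: a successful call returns status 200, the backoff schedule is arbitrary (the documentation specifies no retry policy), and `send` is a hypothetical caller-supplied function returning a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple

# Status-to-code mapping copied from the Errors table.
ERROR_CODES = {
    400: "INVALID_REQUEST",
    401: "UNAUTHORIZED",
    402: "INSUFFICIENT_CREDITS",
    429: "RATE_LIMIT_EXCEEDED",
}


class TTSError(Exception):
    def __init__(self, status: int):
        self.status = status
        self.code = ERROR_CODES.get(status, "UNKNOWN")
        super().__init__(f"{status} {self.code}")


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send`, retrying only on 429 with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:  # assumed success status
            return body
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
            continue
        # Non-retryable error, or retries exhausted.
        raise TTSError(status)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real delays. With the requests library, `send` could be a small wrapper that posts the payload and returns `(response.status_code, response.content)`.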
API Support
For API-related questions or issues, contact us at api@moknah.io.