
Rate Limits

Understanding API rate limits and how to work within them

Overview

Rate limits protect the API from abuse and ensure fair usage for all users. Moknah enforces limits on both request frequency and credit consumption.

No Concurrency

Moknah processes one request per user at a time. Wait for the current request to complete before sending another. Concurrent requests will receive a 409 Conflict error.
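As a minimal sketch of this rule, the helper below sends requests strictly one at a time and retries when the API reports 409 Conflict. The `send_request` callable is a placeholder for your actual API call (it only needs to return an object with a `status_code` attribute, such as a `requests.Response`); the retry delay is an illustrative choice, not a documented value.

```python
import time

def send_sequential(send_request, texts, retry_delay=1.0):
    """Send one request at a time, retrying on 409 Conflict.

    `send_request(text)` performs the actual API call and returns an
    object with a `status_code` attribute (e.g. a requests.Response).
    """
    results = []
    for text in texts:
        while True:
            response = send_request(text)
            if response.status_code == 409:
                # A previous request is still processing; wait, then retry
                time.sleep(retry_delay)
                continue
            results.append(response)
            break
    return results
```

The request-queuing pattern under Best Practices below achieves the same effect without polling.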

Default Limits

| Limit Type | Value | Description |
| --- | --- | --- |
| Requests per Minute (RPM) | 60 | Maximum number of API requests per minute |
| Credits per Minute (CPM) | 10,000 | Maximum credits consumed per minute (TTS) |
| Concurrency | 1 | One request processed at a time per user |

Rate Limit Headers

Every API response includes headers to help you track your rate limit status:

| Header | Description | Example |
| --- | --- | --- |
| RateLimit-Limit | Your RPM limit | 60 |
| RateLimit-Remaining | Requests remaining this minute | 45 |
| RateLimit-Reset | Unix timestamp when limits reset | 1699574460 |
| Moknah-Credits-Remaining | Credits remaining this minute | 10000 |
| Moknah-Credits-Used | Credits used this minute | 5000 |

When Rate Limited

If you exceed the rate limit, you'll receive a 429 Too Many Requests response with these additional headers:

| Header | Description | Example |
| --- | --- | --- |
| Retry-After | Seconds to wait before retrying | 45 |
| RateLimit-Reset | Unix timestamp when you can retry | 1699574460 |

Credit Calculation

Credits are calculated based on text length and processing options:

| Factor | Multiplier | Description |
| --- | --- | --- |
| Base cost | 1x | 1 credit per character |
| AI-Enhanced Normalization | 2x | Advanced Arabic processing with diacritics |
| Premium Voice | +% | Additional percentage for premium voices |

Example: A 500-character text with AI-Enhanced normalization:

500 characters × 2 (AI-Enhanced) = 1,000 credits
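The calculation above can be sketched as a small estimator. Note that the `premium_surcharge` parameter is a hypothetical stand-in: the guide states that premium voices add a percentage but does not document the exact rate, so you would substitute the value that applies to your voice.

```python
def estimate_credits(text, ai_enhanced=False, premium_surcharge=0.0):
    """Estimate credit cost for a TTS request.

    1 credit per character; x2 for AI-Enhanced normalization;
    `premium_surcharge` is a fractional premium-voice surcharge
    (e.g. 0.25 for +25%) -- the exact rate is not documented here.
    """
    credits = len(text)            # base cost: 1 credit per character
    if ai_enhanced:
        credits *= 2               # AI-Enhanced normalization multiplier
    credits *= 1 + premium_surcharge
    return int(credits)
```

For the worked example above: `estimate_credits("x" * 500, ai_enhanced=True)` yields 1,000 credits.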

Best Practices

1. Implement Request Queuing

Since Moknah doesn't support concurrent requests, queue your requests and process them sequentially:

import queue
import threading
import time

class TTSQueue:
    def __init__(self, api_key):
        self.api_key = api_key
        self.queue = queue.Queue()
        self.worker = threading.Thread(target=self._process_queue, daemon=True)
        self.worker.start()
    
    def _process_queue(self):
        while True:
            text, voice_id, callback = self.queue.get()
            try:
                result = self._generate(text, voice_id)
                callback(result, None)
            except Exception as e:
                callback(None, e)
            finally:
                self.queue.task_done()
    
    def _generate(self, text, voice_id):
        # Your API call here
        pass
    
    def add(self, text, voice_id, callback):
        self.queue.put((text, voice_id, callback))

# Usage
tts = TTSQueue("your_api_key")
tts.add("Hello world", "voice_123", lambda r, e: print(r or e))

class TTSQueue {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.queue = [];
    this.processing = false;
  }

  async add(text, voiceId) {
    return new Promise((resolve, reject) => {
      this.queue.push({ text, voiceId, resolve, reject });
      this.processNext();
    });
  }

  async processNext() {
    if (this.processing || this.queue.length === 0) return;
    
    this.processing = true;
    const { text, voiceId, resolve, reject } = this.queue.shift();
    
    try {
      const result = await this.generate(text, voiceId);
      resolve(result);
    } catch (error) {
      reject(error);
    } finally {
      this.processing = false;
      this.processNext(); // Process next in queue
    }
  }

  async generate(text, voiceId) {
    // Your API call here
  }
}

// Usage
const tts = new TTSQueue("your_api_key");
const audio = await tts.add("Hello world", "voice_123");

2. Monitor Rate Limit Headers

Check the response headers and slow down before hitting the limit:

import requests
import time

def generate_with_rate_limit(text, voice_id, api_key):
    response = requests.post(
        "https://moknah.io/api/v1/tts/generate",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text, "voice_id": voice_id}
    )
    
    # Check remaining requests
    remaining_rpm = int(response.headers.get("RateLimit-Remaining", 60))
    remaining_cpm = int(response.headers.get("Moknah-Credits-Remaining", 10000))
    
    # Slow down if running low
    if remaining_rpm < 10:
        print(f"Warning: Only {remaining_rpm} requests remaining this minute")
        time.sleep(1)  # Add delay between requests
    
    if remaining_cpm < 1000:
        print(f"Warning: Only {remaining_cpm} credits remaining this minute")
    
    return response

async function generateWithRateLimit(text, voiceId, apiKey) {
  const response = await fetch("https://moknah.io/api/v1/tts/generate", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ text, voice_id: voiceId })
  });
  
  // Check remaining requests
  const remainingRPM = parseInt(response.headers.get("RateLimit-Remaining") || "60", 10);
  const remainingCPM = parseInt(response.headers.get("Moknah-Credits-Remaining") || "10000", 10);
  
  // Slow down if running low
  if (remainingRPM < 10) {
    console.warn(`Warning: Only ${remainingRPM} requests remaining this minute`);
    await new Promise(r => setTimeout(r, 1000)); // Add delay
  }
  
  if (remainingCPM < 1000) {
    console.warn(`Warning: Only ${remainingCPM} credits remaining this minute`);
  }
  
  return response;
}

3. Implement Exponential Backoff

When rate limited, use exponential backoff to retry:

import time
import random

def request_with_backoff(func, max_retries=5):
    retries = 0
    
    while retries < max_retries:
        try:
            response = func()
            
            if response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 60))
                # Add jitter to prevent thundering herd
                wait_time = retry_after + random.uniform(0, 1)
                print(f"Rate limited. Waiting {wait_time:.1f}s...")
                time.sleep(wait_time)
                retries += 1
                continue
            
            return response
            
        except Exception as e:
            # Exponential backoff for other errors
            wait_time = (2 ** retries) + random.uniform(0, 1)
            print(f"Error: {e}. Retrying in {wait_time:.1f}s...")
            time.sleep(wait_time)
            retries += 1
    
    raise Exception("Max retries exceeded")

async function requestWithBackoff(func, maxRetries = 5) {
  let retries = 0;
  
  while (retries < maxRetries) {
    try {
      const response = await func();
      
      if (response.status === 429) {
        const retryAfter = parseInt(response.headers.get("Retry-After") || "60", 10);
        // Add jitter to prevent thundering herd
        const waitTime = retryAfter + Math.random();
        console.log(`Rate limited. Waiting ${waitTime.toFixed(1)}s...`);
        await new Promise(r => setTimeout(r, waitTime * 1000));
        retries++;
        continue;
      }
      
      return response;
      
    } catch (error) {
      // Exponential backoff for other errors
      const waitTime = Math.pow(2, retries) + Math.random();
      console.log(`Error: ${error.message}. Retrying in ${waitTime.toFixed(1)}s...`);
      await new Promise(r => setTimeout(r, waitTime * 1000));
      retries++;
    }
  }
  
  throw new Error("Max retries exceeded");
}

Summary

Quick Reference

Default limits: 60 RPM, 10,000 CPM, no concurrency
Need more? Email sales@moknah.io

API Support

For API-related questions or issues, contact us at api@moknah.io.