
Rate Limits

Understanding API rate limits and how to work within them

Overview

Rate limits protect the API from abuse and ensure fair usage for all users. Moknah enforces limits on both request frequency and credit consumption.

No Concurrency

Moknah processes one request per user at a time. Wait for the current request to complete before sending another. Concurrent requests will receive a 409 Conflict error.
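As a minimal sketch of this rule, the helper below sends requests strictly one at a time and retries when the API reports 409 Conflict. The `send_request` callable is a placeholder for your actual API call (it only needs to return an object with a `status_code` attribute, such as a `requests.Response`); the retry delay is an illustrative choice, not a documented value.

```python
import time

def send_sequential(send_request, texts, retry_delay=1.0):
    """Send one request at a time, retrying on 409 Conflict.

    `send_request(text)` performs the actual API call and returns an
    object with a `status_code` attribute (e.g. a requests.Response).
    """
    results = []
    for text in texts:
        while True:
            response = send_request(text)
            if response.status_code == 409:
                # A previous request is still processing; wait, then retry
                time.sleep(retry_delay)
                continue
            results.append(response)
            break
    return results
```

The request-queuing pattern under Best Practices below achieves the same effect without polling.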

Default Limits

| Limit Type | Value | Description |
| --- | --- | --- |
| Requests per Minute (RPM) | 60 | Maximum number of API requests per minute |
| Credits per Minute (CPM) | 10,000 | Maximum credits consumed per minute (TTS) |
| Concurrency | 1 | One request processed at a time per user |

Rate Limit Headers

Every API response includes headers to help you track your rate limit status:

| Header | Description | Example |
| --- | --- | --- |
| RateLimit-Limit | Your RPM limit | 60 |
| RateLimit-Remaining | Requests remaining this minute | 45 |
| RateLimit-Reset | Unix timestamp when limits reset | 1699574460 |
| Moknah-Credits-Remaining | Credits remaining this minute | 10000 |
| Moknah-Credits-Used | Credits used this minute | 5000 |

When Rate Limited

If you exceed the rate limit, you'll receive a 429 Too Many Requests response with these additional headers:

| Header | Description | Example |
| --- | --- | --- |
| Retry-After | Seconds to wait before retrying | 45 |
| RateLimit-Reset | Unix timestamp when you can retry | 1699574460 |

Credit Calculation

Credits are calculated based on text length and processing options:

| Factor | Multiplier | Description |
| --- | --- | --- |
| Base cost | 1x | 1 credit per character |
| AI-Enhanced Normalization | 2x | Advanced Arabic processing with diacritics |
| Premium Voice | +% | Additional percentage for premium voices |

Example: A 500-character text with AI-Enhanced normalization:

500 characters × 2 (AI-Enhanced) = 1,000 credits
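The calculation above can be sketched as a small estimator. Note that the `premium_surcharge` parameter is a hypothetical stand-in: the guide states that premium voices add a percentage but does not document the exact rate, so you would substitute the value that applies to your voice.

```python
def estimate_credits(text, ai_enhanced=False, premium_surcharge=0.0):
    """Estimate credit cost for a TTS request.

    1 credit per character; x2 for AI-Enhanced normalization;
    `premium_surcharge` is a fractional premium-voice surcharge
    (e.g. 0.25 for +25%) -- the exact rate is not documented here.
    """
    credits = len(text)            # base cost: 1 credit per character
    if ai_enhanced:
        credits *= 2               # AI-Enhanced normalization multiplier
    credits *= 1 + premium_surcharge
    return int(credits)
```

For the worked example above: `estimate_credits("x" * 500, ai_enhanced=True)` yields 1,000 credits.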

Best Practices

1. Implement Request Queuing

Since Moknah doesn't support concurrent requests, queue your requests and process them sequentially:

import queue
import threading
import time

class TTSQueue:
    def __init__(self, api_key):
        self.api_key = api_key
        self.queue = queue.Queue()
        self.worker = threading.Thread(target=self._process_queue, daemon=True)
        self.worker.start()
    
    def _process_queue(self):
        while True:
            text, voice_id, callback = self.queue.get()
            try:
                result = self._generate(text, voice_id)
                callback(result, None)
            except Exception as e:
                callback(None, e)
            finally:
                self.queue.task_done()
    
    def _generate(self, text, voice_id):
        # Your API call here
        pass
    
    def add(self, text, voice_id, callback):
        self.queue.put((text, voice_id, callback))

# Usage
tts = TTSQueue("your_api_key")
tts.add("Hello world", "voice_123", lambda r, e: print(r or e))

class TTSQueue {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.queue = [];
    this.processing = false;
  }

  async add(text, voiceId) {
    return new Promise((resolve, reject) => {
      this.queue.push({ text, voiceId, resolve, reject });
      this.processNext();
    });
  }

  async processNext() {
    if (this.processing || this.queue.length === 0) return;
    
    this.processing = true;
    const { text, voiceId, resolve, reject } = this.queue.shift();
    
    try {
      const result = await this.generate(text, voiceId);
      resolve(result);
    } catch (error) {
      reject(error);
    } finally {
      this.processing = false;
      this.processNext(); // Process next in queue
    }
  }

  async generate(text, voiceId) {
    // Your API call here
  }
}

// Usage
const tts = new TTSQueue("your_api_key");
const audio = await tts.add("Hello world", "voice_123");

2. Monitor Rate Limit Headers

Check the response headers and slow down before hitting the limit:

import requests
import time

def generate_with_rate_limit(text, voice_id, api_key):
    response = requests.post(
        "https://moknah.io/api/v1/tts/generate",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text, "voice_id": voice_id}
    )
    
    # Check remaining requests
    remaining_rpm = int(response.headers.get("RateLimit-Remaining", 60))
    remaining_cpm = int(response.headers.get("Moknah-Credits-Remaining", 10000))
    
    # Slow down if running low
    if remaining_rpm < 10:
        print(f"Warning: Only {remaining_rpm} requests remaining this minute")
        time.sleep(1)  # Add delay between requests
    
    if remaining_cpm < 1000:
        print(f"Warning: Only {remaining_cpm} credits remaining this minute")
    
    return response

async function generateWithRateLimit(text, voiceId, apiKey) {
  const response = await fetch("https://moknah.io/api/v1/tts/generate", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ text, voice_id: voiceId })
  });
  
  // Check remaining requests
  const remainingRPM = parseInt(response.headers.get("RateLimit-Remaining") || "60", 10);
  const remainingCPM = parseInt(response.headers.get("Moknah-Credits-Remaining") || "10000", 10);
  
  // Slow down if running low
  if (remainingRPM < 10) {
    console.warn(`Warning: Only ${remainingRPM} requests remaining this minute`);
    await new Promise(r => setTimeout(r, 1000)); // Add delay
  }
  
  if (remainingCPM < 1000) {
    console.warn(`Warning: Only ${remainingCPM} credits remaining this minute`);
  }
  
  return response;
}

3. Implement Exponential Backoff

When rate limited, use exponential backoff to retry:

import time
import random

def request_with_backoff(func, max_retries=5):
    retries = 0
    
    while retries < max_retries:
        try:
            response = func()
            
            if response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 60))
                # Add jitter to prevent thundering herd
                wait_time = retry_after + random.uniform(0, 1)
                print(f"Rate limited. Waiting {wait_time:.1f}s...")
                time.sleep(wait_time)
                retries += 1
                continue
            
            return response
            
        except Exception as e:
            # Exponential backoff for other errors
            wait_time = (2 ** retries) + random.uniform(0, 1)
            print(f"Error: {e}. Retrying in {wait_time:.1f}s...")
            time.sleep(wait_time)
            retries += 1
    
    raise Exception("Max retries exceeded")

async function requestWithBackoff(func, maxRetries = 5) {
  let retries = 0;
  
  while (retries < maxRetries) {
    try {
      const response = await func();
      
      if (response.status === 429) {
        const retryAfter = parseInt(response.headers.get("Retry-After") || "60", 10);
        // Add jitter to prevent thundering herd
        const waitTime = retryAfter + Math.random();
        console.log(`Rate limited. Waiting ${waitTime.toFixed(1)}s...`);
        await new Promise(r => setTimeout(r, waitTime * 1000));
        retries++;
        continue;
      }
      
      return response;
      
    } catch (error) {
      // Exponential backoff for other errors
      const waitTime = Math.pow(2, retries) + Math.random();
      console.log(`Error: ${error.message}. Retrying in ${waitTime.toFixed(1)}s...`);
      await new Promise(r => setTimeout(r, waitTime * 1000));
      retries++;
    }
  }
  
  throw new Error("Max retries exceeded");
}

Summary

Quick Reference

Default limits: 60 RPM, 10,000 CPM, no concurrency
Need more? Email sales@moknah.io

API Support

For API-related questions or issues, contact us at api@moknah.io.