# Rate Limits

DOS AI enforces rate limits to ensure fair usage and platform stability for all users.

## Default Limits

| Metric              | Limit                       |
| ------------------- | --------------------------- |
| Requests per minute | 60                          |
| Window type         | Sliding window (60 seconds) |

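The rate limiter uses a **sliding window** algorithm: the limit applies to a rolling 60-second period rather than to fixed calendar minutes. The count does not reset at the top of each minute; capacity frees up as earlier requests age out of the window.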
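
To make the rolling window concrete, here is a minimal client-side sketch of a sliding-window counter. It illustrates the general technique only and is not DOS AI's server implementation; the class name `SlidingWindowLimiter`, its `try_acquire` method, and the injectable `clock` parameter are assumptions made for this example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side sliding-window counter (illustrative, not DOS AI's server code)."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit      # maximum requests allowed inside the window
        self.window = window    # window length in seconds
        self.clock = clock      # injectable so the class can be tested without sleeping
        self._sent = deque()    # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False
```

A client could call `try_acquire()` before each request and hold the request back whenever it returns `False`.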

## Rate Limit Headers

Every API response includes headers that report your current rate limit status:

| Header                  | Description                                    |
| ----------------------- | ---------------------------------------------- |
| `X-RateLimit-Limit`     | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the current window       |

### Example Response Headers

```
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
Content-Type: application/json
```

Use these headers to proactively manage your request rate and avoid hitting the limit.
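
As a starting point, the two headers above can be read into integers with a small helper. This is a sketch: the function name `rate_limit_status` is hypothetical, and it assumes `headers` is any mapping with a `.get` method, such as `response.headers` from the `requests` library or a plain `dict`.

```python
def rate_limit_status(headers):
    """Return (limit, remaining) as ints; a missing or malformed header yields None."""

    def _to_int(name):
        value = headers.get(name)
        try:
            return int(value)
        except (TypeError, ValueError):
            # TypeError: header absent (None); ValueError: header is not a number.
            return None

    return _to_int("X-RateLimit-Limit"), _to_int("X-RateLimit-Remaining")
```

Returning `None` instead of a default lets the caller decide how to behave when a header is absent.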

## What Happens When You Hit the Limit

When you exceed the rate limit, the API returns a `429 Too Many Requests` response:

```json
{
  "error": {
    "message": "Rate limit exceeded. Please wait before making another request.",
    "type": "rate_limit_error",
    "code": 429
  }
}
```

The request is **not processed** and **no credits are charged**.
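
Client code can recognize this response from the status code and pull the message out of the body. The helper below is a sketch under the assumption that the body has the shape shown above; the name `rate_limit_message` and the fallback text are hypothetical.

```python
def rate_limit_message(status_code, body):
    """Return the error message for a 429 response, or None for any other status."""
    if status_code != 429:
        return None
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        error = {}
    # Fall back to a generic message if the body is missing or shaped differently.
    return error.get("message", "Rate limit exceeded.")
```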

## Handling 429 Errors

### Exponential Backoff

The recommended strategy is **exponential backoff with jitter**. The wait time doubles after each rate-limited attempt, and a random offset spreads retries out so that many clients do not all retry at the same instant (the thundering herd problem).

#### Python Example

```python
import time
import random
import requests

def call_with_backoff(url, headers, data, max_retries=5):
    """POST `data` to `url`, retrying 429 responses with exponential backoff and jitter."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        # Treat any non-error status as success, not only 200.
        if response.ok:
            return response.json()

        if response.status_code == 429:
            base_delay = 2 ** attempt  # 1, 2, 4, 8, 16 seconds
            # Up to 50% random jitter keeps concurrent clients from retrying in lockstep.
            jitter = random.uniform(0, base_delay * 0.5)
            wait_time = base_delay + jitter

            print(f"Rate limited. Retrying in {wait_time:.1f}s...")
            time.sleep(wait_time)
            continue

        # Any other error status is not retried.
        response.raise_for_status()

    raise RuntimeError("Max retries exceeded")
```

#### Node.js Example

```javascript
async function callWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.ok) {
      return await response.json();
    }

    if (response.status === 429) {
      const baseDelay = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * baseDelay * 0.5;
      const waitTime = baseDelay + jitter;

      console.log(`Rate limited. Retrying in ${(waitTime / 1000).toFixed(1)}s...`);
      await new Promise((resolve) => setTimeout(resolve, waitTime));
      continue;
    }

    throw new Error(`API error: ${response.status}`);
  }

  throw new Error("Max retries exceeded");
}
```

### Proactive Rate Management

Use the rate limit headers to throttle requests before hitting the limit:

```python
import time
import requests

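def call_with_throttle(url, headers, data):
    response = requests.post(url, headers=headers, json=data)
    # Raise on 429 and other errors instead of returning an error body as a result.
    response.raise_for_status()

    # Assume a full window if the header is missing.
    remaining = int(response.headers.get("X-RateLimit-Remaining", 60))

    # Pause before returning so the caller's next request is delayed as the budget runs low.
    if remaining < 5:
        time.sleep(2)
    elif remaining < 10:
        time.sleep(0.5)

    return response.json()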
```

## Tips for Staying Within Limits

1. **Pace your requests** -- Spread requests evenly over time rather than sending them all at once.
2. **Use streaming** -- Streaming responses (`stream: true`) count as a single request regardless of response length.
3. **Cache responses** -- Avoid making the same API call repeatedly.
4. **Monitor usage** -- Check `X-RateLimit-Remaining` headers and slow down as you approach the limit.
5. **Use fewer, larger requests** -- One comprehensive prompt counts as one request against the limit, while several small prompts count as several.
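
Pacing requests evenly, as the first tip suggests, can be sketched with a small helper that spaces calls a fixed interval apart. With the default limit of 60 requests per minute the interval works out to one second. The class name `RequestPacer` and the injectable `clock` and `sleep` parameters are assumptions made for this example.

```python
import time


class RequestPacer:
    """Spaces calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self._next_slot = None  # earliest time the next request may start

    def wait(self):
        """Block until the next evenly spaced slot, then reserve the one after it."""
        now = self.clock()
        if self._next_slot is not None and now < self._next_slot:
            self.sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self.interval
```

Call `wait()` immediately before each request. Setting `requests_per_minute` slightly below the actual limit leaves a margin for network timing, since the server measures arrival times rather than send times.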

## Enterprise Custom Limits

For organizations with high-volume needs, we offer custom limits:

* **Higher request-per-minute caps** based on your workload
* **Burst allowances** for predictable traffic spikes
* **Dedicated capacity** with guaranteed throughput

Contact **<support@dos.ai>** to discuss custom rate limits for your organization.

