Chat Completions

The Chat Completions API is the primary way to interact with DOS AI models. It follows the OpenAI-compatible format, so you can use existing OpenAI SDKs and tools with minimal changes.

Base URL

https://api.dos.ai/v1

Authentication

All requests require an API key passed in the Authorization header:

Authorization: Bearer dos_sk_your_api_key_here

Basic Request

A chat completion request consists of a list of messages and a model identifier. The model generates a response based on the conversation history.

Request Format

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer dos_sk_your_api_key_here

{
  "model": "dos-ai",
  "messages": [
    { "role": "user", "content": "What is the capital of France?" }
  ]
}

Response Format
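
Because the API follows the OpenAI-compatible format, a successful response is expected to carry the generated message under `choices`. The sketch below assumes that shape. The sample body and every value in it (`id`, the `usage` counts, the answer text) are illustrative placeholders, not output captured from the service, and `extract_reply` is a hypothetical helper.

```python
import json

# Illustrative response body, assuming the OpenAI-compatible chat completion
# shape. All values are placeholders, not real output from DOS AI.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "dos-ai",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 8, "total_tokens": 22}
}
"""


def extract_reply(body: str) -> str:
    """Return the assistant text from the first choice of a response body."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


reply = extract_reply(SAMPLE_RESPONSE)
print(reply)
```

When `n` is greater than 1, the same assumption implies one entry in `choices` per completion, so iterate over the list instead of reading only index 0.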

Message Roles

Each message in the messages array has a role and content. The API supports four roles:

| Role | Description |
| --- | --- |
| system | Sets the behavior and personality of the assistant. Placed at the beginning of the conversation. |
| user | Messages from the end user. |
| assistant | Previous responses from the model. Used for multi-turn context. |
| tool | Results from tool/function calls. See Function Calling. |

System Message

Use the system message to instruct the model on how to behave:
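
A minimal sketch of a request body that starts with a system message. The helper name `with_system_prompt` and the prompt text are illustrative assumptions; only the `role`/`content` structure and the `dos-ai` model ID come from this page.

```python
import json


def with_system_prompt(system_prompt: str, user_text: str) -> list[dict]:
    """Build a messages list with the system message placed first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


payload = {
    "model": "dos-ai",
    "messages": with_system_prompt(
        "You are a concise assistant. Answer in one sentence.",  # illustrative prompt
        "What is the capital of France?",
    ),
}
print(json.dumps(payload, indent=2))
```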

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model ID to use (e.g., dos-ai). |
| messages | array | required | List of messages in the conversation. |
| temperature | float | 0.7 | Sampling temperature between 0 and 2. Lower values make output more deterministic. |
| max_tokens | integer | model default | Maximum number of tokens to generate in the response. |
| top_p | float | 1.0 | Nucleus sampling threshold. Only tokens with cumulative probability up to top_p are considered. |
| frequency_penalty | float | 0.0 | Penalizes tokens based on how frequently they appear (range: -2.0 to 2.0). |
| presence_penalty | float | 0.0 | Penalizes tokens based on whether they have appeared at all (range: -2.0 to 2.0). |
| stop | string or array | null | Up to 4 sequences where the model will stop generating. |
| stream | boolean | false | If true, returns a stream of server-sent events. See Streaming. |
| n | integer | 1 | Number of completions to generate for each prompt. |
| response_format | object | null | Force a specific output format. See Structured Outputs. |
| tools | array | null | List of tools the model may call. See Function Calling. |
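
The ranges listed for these parameters can be checked on the client before a request is sent. The sketch below is one way to do that. `build_payload` is a hypothetical helper, and whether the server rejects out-of-range values in the same way is an assumption.

```python
def build_payload(model: str, messages: list, **options) -> dict:
    """Assemble a request body, checking the documented parameter ranges."""
    if not model or not messages:
        raise ValueError("model and messages are required")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    for name in ("frequency_penalty", "presence_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    # Leave unset options out so the server applies its documented defaults.
    payload = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_payload(
    "dos-ai",
    [{"role": "user", "content": "Name three primary colors."}],
    temperature=0.2,
    max_tokens=64,
    stop=["\n\n"],
)
print(payload)
```

Omitting a parameter rather than sending `null` keeps the request small and avoids depending on how the server treats explicit nulls.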

Temperature vs Top-p

  • Temperature controls randomness. 0 is nearly deterministic, 2 is highly random.

  • Top-p controls diversity by limiting the token pool. 0.1 means sampling is restricted to the most likely tokens whose probabilities add up to 10% of the total.

It is generally recommended to adjust one or the other, not both simultaneously.
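
To make the two knobs concrete, here is a toy sketch of the textbook formulation: temperature rescales the logits before the softmax, and top-p keeps the smallest set of high-probability tokens whose cumulative mass reaches the threshold. This page does not describe how DOS AI implements sampling, so treat this as an illustration of the general technique. The function names and toy logits are assumptions, and the sketch requires a temperature above 0.

```python
import math


def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / temperature; lower values sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[int]:
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for four candidate tokens
sharp = apply_temperature(logits, 0.2)  # low temperature: mass piles onto token 0
flat = apply_temperature(logits, 2.0)   # high temperature: mass spreads out
probs = apply_temperature(logits, 1.0)

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(top_p_filter(probs, 0.1), top_p_filter(probs, 0.9))
```

With these toy logits, a low top-p keeps only the single most likely token, while a higher threshold admits more candidates.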

Multi-turn Conversations

To maintain context across multiple exchanges, include previous messages in the request. The model does not retain state between requests -- you must send the full conversation history each time.

The model sees the full conversation and can respond contextually: "The derivative of x^3 is 3x^2."
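
A minimal sketch of keeping that history on the client. The `Conversation` class is a hypothetical helper and the first two turns are illustrative. The point is that every request body carries all earlier messages, so a follow-up such as "What about x^3?" can be understood.

```python
class Conversation:
    """Keeps the running message history that must be resent on every request."""

    def __init__(self, model: str = "dos-ai"):
        self.model = model
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def payload(self) -> dict:
        # Copy the list so later turns do not mutate an already-built payload.
        return {"model": self.model, "messages": list(self.messages)}


chat = Conversation()
chat.add("user", "What is the derivative of x^2?")
chat.add("assistant", "The derivative of x^2 is 2x.")  # earlier model reply
chat.add("user", "What about x^3?")  # only meaningful with the history above

request_body = chat.payload()
print([m["role"] for m in request_body["messages"]])
```

After each response, append the assistant's reply with `chat.add("assistant", ...)` before adding the next user turn. Long conversations grow the request size, so you may need to trim or summarize older turns.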

Code Examples

cURL

Python (OpenAI SDK)

The easiest way to use DOS AI in Python is with the official OpenAI SDK, pointed at the DOS AI base URL:
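
A minimal sketch, assuming the `openai` Python package (v1 or later) is installed and that the API key is stored in an environment variable named `DOS_AI_API_KEY`. The variable name and the helper functions are assumptions, not part of the DOS AI documentation.

```python
import os

DOS_AI_BASE_URL = "https://api.dos.ai/v1"


def make_client():
    """Create an OpenAI SDK client pointed at the DOS AI base URL."""
    # Imported here so this module still loads where the SDK is not installed.
    from openai import OpenAI

    # DOS_AI_API_KEY is an assumed environment variable name.
    return OpenAI(api_key=os.environ["DOS_AI_API_KEY"], base_url=DOS_AI_BASE_URL)


def ask(question: str) -> str:
    """Send a single user message and return the assistant's reply text."""
    client = make_client()
    completion = client.chat.completions.create(
        model="dos-ai",
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content


if os.environ.get("DOS_AI_API_KEY"):
    print(ask("What is the capital of France?"))
else:
    print("Set DOS_AI_API_KEY to run this example.")
```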

Python (requests)

If you prefer not to use the OpenAI SDK:
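
A sketch using the third-party `requests` library. The endpoint, headers, and body follow the request format shown earlier on this page. Reading the reply from `choices[0].message.content` assumes the OpenAI-compatible response shape, and `DOS_AI_API_KEY` is an assumed environment variable name.

```python
import json
import os

API_URL = "https://api.dos.ai/v1/chat/completions"


def build_request(api_key: str, question: str) -> tuple[dict, dict]:
    """Return the headers and JSON body for a single-turn request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "dos-ai",
        "messages": [{"role": "user", "content": question}],
    }
    return headers, body


def ask(api_key: str, question: str) -> str:
    """POST the request and return the reply text."""
    import requests  # third-party; imported lazily so build_request works without it

    headers, body = build_request(api_key, question)
    response = requests.post(API_URL, headers=headers, json=body, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx status codes
    # Assumes the OpenAI-compatible response shape.
    return response.json()["choices"][0]["message"]["content"]


api_key = os.environ.get("DOS_AI_API_KEY")  # assumed variable name
if api_key:
    print(ask(api_key, "What is the capital of France?"))
else:
    # Without a key, just show the body that would be sent.
    _, preview = build_request("dos_sk_your_api_key_here", "What is the capital of France?")
    print(json.dumps(preview, indent=2))
```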

JavaScript (Node.js)

Using the OpenAI Node.js SDK:

JavaScript (fetch)

Using the native fetch API:

Error Handling

The API returns standard HTTP status codes and a JSON error body:

| Status Code | Meaning |
| --- | --- |
| 400 | Bad request -- malformed JSON or invalid parameters. |
| 401 | Unauthorized -- missing or invalid API key. |
| 402 | Insufficient credits -- top up your balance. |
| 429 | Rate limit exceeded -- slow down and retry after the indicated period. |
| 500 | Internal server error -- retry with exponential backoff. |
| 503 | Service unavailable -- the model is temporarily overloaded. |

Error response format:
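
The exact error body is not reproduced here. The sketch below assumes the OpenAI-style shape, an `error` object with `message` and `type` fields, and falls back to the raw body if the shape differs. The sample body is a placeholder and `describe_error` is a hypothetical helper.

```python
import json


def describe_error(status_code: int, body: str) -> str:
    """Build a readable message from an error response, tolerating unknown shapes."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return f"HTTP {status_code}: {body.strip() or 'no body'}"

    error = data.get("error") if isinstance(data, dict) else None
    if isinstance(error, dict):
        message = error.get("message", "unknown error")
        kind = error.get("type")
        return f"HTTP {status_code}: {message}" + (f" ({kind})" if kind else "")
    return f"HTTP {status_code}: {body.strip()}"


# Placeholder body in the assumed OpenAI-style shape.
sample = '{"error": {"message": "Invalid API key.", "type": "authentication_error"}}'
print(describe_error(401, sample))
```

Parsing defensively means a non-JSON body or an unexpected shape from an intermediate proxy still produces a usable log line.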

Retry Strategy

For 429 and 5xx errors, implement exponential backoff:
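
A minimal sketch of exponential backoff with jitter. The retry count, base delay, and cap are illustrative defaults, not values recommended by DOS AI. `send` stands in for any zero-argument callable that performs the request and returns `(status_code, body)`, and the simulated responses exist only to make the example runnable offline.

```python
import random
import time


def is_retryable(status: int) -> bool:
    """429 and any 5xx status are worth retrying; other errors are not."""
    return status == 429 or 500 <= status < 600


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays of base * 2**attempt seconds, capped, before jitter is added."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(send, max_retries: int = 5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out."""
    for delay in backoff_delays(max_retries):
        status, body = send()
        if not is_retryable(status):
            return status, body
        # Random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    return send()  # final attempt; the caller handles whatever comes back


# Simulated endpoint: a rate-limit response, an overload, then success.
responses = iter([(429, "slow down"), (503, "overloaded"), (200, "ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result, len(waits))
```

If a 429 response indicates how long to wait, prefer that period over the computed delay. Errors such as 400, 401, and 402 are returned immediately because retrying will not change the outcome.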

Available Models

| Model ID | Description |
| --- | --- |
| dos-ai | Qwen3.5-35B-A3B -- fast, efficient, recommended for most tasks. |

Check the Models endpoint for the current list of available models.
