API reference

Chat completions

POST /v1/chat/completions — OpenAI-compatible, with extras for tier routing and priority.

Last updated: 2026-05-19

POST https://api.siati.ai/v1/chat/completions

Drop-in compatible with the OpenAI chat completions endpoint. Same request schema, same response schema, plus our additions for sovereignty (tier, priority).

Request

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Siati-Tier: medium" \
  -d '{
    "model": "apertus-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful Swiss assistant."},
      {"role": "user",   "content": "Explain the principle of sovereignty in 3 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 400,
    "top_p": 0.9,
    "stream": false
  }'
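For callers who are not using curl, the same request can be assembled with Python's standard library. This is a minimal sketch. `build_chat_request` and `send` are hypothetical helper names and are not part of any siati SDK. The builder only constructs the request, so you can inspect it before anything is sent over the network.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.siati.ai/v1/chat/completions"


def build_chat_request(api_key: str, body: dict,
                       tier: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a POST request equivalent to the curl example."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if tier is not None:
        # Per-request tier override; leave it out to use the key's default tier.
        headers["X-Siati-Tier"] = tier
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (urllib raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same payload as the curl example.
body = {
    "model": "apertus-70b-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful Swiss assistant."},
        {"role": "user", "content": "Explain the principle of sovereignty in 3 sentences."},
    ],
    "temperature": 0.7,
    "max_tokens": 400,
    "top_p": 0.9,
    "stream": False,
}
```

To send the request, call `send(build_chat_request(os.environ["SIATI_API_KEY"], body, tier="medium"))`. The reply text is in `["choices"][0]["message"]["content"]` of the returned dict.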

Parameters

Param	Type	Required	Description
`model`	string	✓	Model ID. See Models catalog.
`messages`	array	✓	List of `{role, content}` objects. Roles: `system`, `user`, `assistant`, `tool`.
`temperature`	float	–	0–2, default 1.0. Lower = more deterministic.
`top_p`	float	–	Nucleus sampling, 0–1.
`max_tokens`	int	–	Cap on completion length.
`stream`	bool	–	If true, returns SSE stream. See Streaming.
`stop`	array	–	Stop sequences.
`presence_penalty` / `frequency_penalty`	float	–	OpenAI-compatible.
`response_format`	object	–	`{ "type": "json_object" }` for structured output.
`tools` / `tool_choice`	array / string	–	Function calling, OpenAI shape.
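You can check the constraints in the table on the client before sending a request. That way, shape problems surface locally instead of coming back as a `400 invalid_request_error`. This is a minimal sketch, and `validate_chat_body` is a hypothetical helper. The server remains the authority on what it accepts. For example, rejecting an empty `messages` list is an assumption of this sketch, not documented behavior.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_chat_body(body: dict) -> None:
    """Check the documented parameter constraints; raise ValueError on the first violation."""
    model = body.get("model")
    if not isinstance(model, str) or not model:
        raise ValueError("`model` is required and must be a non-empty string")

    messages = body.get("messages")
    # Assumption: an empty message list is treated as invalid.
    if not isinstance(messages, list) or not messages:
        raise ValueError("`messages` is required and must be a non-empty list")
    for i, msg in enumerate(messages):
        role = msg.get("role") if isinstance(msg, dict) else None
        if role not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}] has a missing or unknown role: {role!r}")

    # Documented ranges: temperature 0-2, top_p 0-1.
    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("`temperature` must be between 0 and 2")
    top_p = body.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("`top_p` must be between 0 and 1")

    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        raise ValueError("`max_tokens` must be a positive integer")
```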

Headers (siati-specific)

Header	Description
`X-Siati-Tier: slow|medium|fast|ludicrous`	Override the default tier of your key for this request.
`X-Request-Id: <uuid>`	Idempotency key for billing; retrying with the same ID is safe.
`X-Siati-User: <opaque-id>`	Optional end-user identifier for multi-tenant logs.
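Because `X-Request-Id` acts as an idempotency key, generate it once per logical request and reuse it on every retry. A fresh UUID on each attempt would defeat the purpose. The sketch below assembles the siati-specific headers under that rule. `siati_headers` is a hypothetical helper. The tier names come from the table above, and validating them on the client is an assumption.

```python
import uuid
from typing import Optional

TIERS = ("slow", "medium", "fast", "ludicrous")


def siati_headers(api_key: str, tier: Optional[str] = None,
                  request_id: Optional[str] = None,
                  end_user: Optional[str] = None) -> dict:
    """Build request headers; pass the same request_id on retries to stay idempotent."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # New UUID only when the caller has not supplied one (i.e. first attempt).
        "X-Request-Id": request_id or str(uuid.uuid4()),
    }
    if tier is not None:
        if tier not in TIERS:
            raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")
        headers["X-Siati-Tier"] = tier
    if end_user is not None:
        # Should be an opaque identifier, not personal data such as an e-mail address.
        headers["X-Siati-User"] = end_user
    return headers
```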

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1779203245,
  "model": "apertus-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 91,
    "completion_tokens": 67,
    "total_tokens": 158
  }
}
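Most callers only need the first choice's text, the finish reason and the token counts. The sketch below pulls those out of the response shown above. `parse_chat_completion` and `ChatResult` are hypothetical names. Treating `finish_reason == "length"` as "cut off by `max_tokens`" follows the OpenAI schema, which this endpoint states it is compatible with.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    content: Optional[str]  # can be null when the model replies with tool calls (OpenAI shape)
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

    @property
    def truncated(self) -> bool:
        # OpenAI convention: "length" means generation stopped at max_tokens.
        return self.finish_reason == "length"


def parse_chat_completion(raw: str) -> ChatResult:
    """Extract the first choice and token usage from a non-streaming response."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data["usage"]
    return ChatResult(
        content=choice["message"].get("content"),
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )


# The sample response from this page.
SAMPLE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1779203245,
  "model": "apertus-70b-instruct",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 91, "completion_tokens": 67, "total_tokens": 158}
}
"""
```

In the sample, `total_tokens` is the sum of prompt and completion tokens (91 + 67 = 158).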

Models

Pick any model from the live catalog. Highlights:

gemma-4-26b — Google Gemma 4: multimodal (text + images), strong tool calling and robust JSON output. Fast MoE architecture — a great default for agents and structured extraction.
apertus-70b-instruct — large Swiss-aligned model when you need depth.
qwen2.5:32b / qwen2.5:14b / qwen2.5:1.5b — balanced sizes, low latency.

Multimodal request (Gemma 4)

Gemma 4 accepts up to 2 images per request via the standard OpenAI vision format:

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What does this image show? Answer in JSON."},
        {"type": "image_url", "image_url": {"url": "https://example.com/foto.jpg"}}
      ]
    }]
  }'
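The multimodal body above can also be built in code, with the 2-image limit checked before sending. This is a minimal sketch with hypothetical helper names. It also assumes that base64 `data:` URLs are accepted for local files, as they are in the OpenAI vision format. This page only shows a remote `https` URL, so treat that part as unverified.

```python
import base64
from typing import List

MAX_IMAGES = 2  # documented per-request limit for Gemma 4


def image_part_from_url(url: str) -> dict:
    """One image content part in the OpenAI vision format."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_bytes(data: bytes, mime: str = "image/jpeg") -> dict:
    """Inline a local image as a base64 data URL (assumed to be supported)."""
    encoded = base64.b64encode(data).decode("ascii")
    return image_part_from_url(f"data:{mime};base64,{encoded}")


def build_vision_body(text: str, image_parts: List[dict],
                      model: str = "gemma-4-26b") -> dict:
    """Single-message request: one text part followed by up to MAX_IMAGES images."""
    if len(image_parts) > MAX_IMAGES:
        raise ValueError(f"{model} accepts at most {MAX_IMAGES} images per request")
    content = [{"type": "text", "text": text}, *image_parts]
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```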

Errors

Error codes follow the standard scheme described in Errors. The most common are:

400 invalid_request_error: malformed request body, unknown role, etc.
401 invalid_api_key: see Authentication.
429 rate_limit_exceeded: see Rate limits. The response includes a Retry-After header.
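A 429 is the one error in this list that is worth retrying automatically, because 400 and 401 will fail the same way again. The sketch below waits for the number of seconds given in `Retry-After`. If the header is absent or not numeric (HTTP also allows a date there), it falls back to exponential backoff. `post_with_retry` is a hypothetical helper, and `send` is an assumed zero-argument transport that returns `(status, headers, body)`. `send` should replay a request with a fixed `X-Request-Id` so that retries stay idempotent for billing.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    return next((v for k, v in headers.items() if k.lower() == name.lower()), None)


def post_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    default_delay: float = 1.0) -> Response:
    """Retry only on 429; any other status (including 400/401) is returned immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        backoff = default_delay * 2 ** (attempt - 1)
        retry_after = _header(headers, "Retry-After")
        try:
            delay = float(retry_after) if retry_after is not None else backoff
        except ValueError:
            delay = backoff  # e.g. Retry-After given as an HTTP date
        sleep(delay)
    raise RuntimeError("unreachable")
```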
