siati.ai docs

API reference

Streaming (SSE)

How to consume token-by-token streaming responses.

Last updated: 2026-05-19

Add "stream": true to any chat completion request. The response is then delivered as Server-Sent Events (Content-Type: text/event-stream) in the OpenAI-compatible chunk format.

Wire format

Each event is a single line that starts with data: followed by a JSON payload, and events are separated by a blank line. The stream ends with the literal sentinel data: [DONE], which is not JSON.

text
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"Ciao"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":2}}

data: [DONE]
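
For clients that do not use an SDK, the format above can be parsed by hand. The following is a minimal sketch, assuming payloads shaped like the sample; iter_deltas is a hypothetical helper name, not part of any library, and the sample lines are abbreviated versions of the events shown above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines are skipped
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel; it is not JSON
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Abbreviated copies of the sample events above (assumed shape).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Ciao"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Ciao!"
```

The first event carries only the role, so it contributes no text; the helper stops as soon as it sees [DONE].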

curl

bash
curl -N https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apertus-70b-instruct",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [{"role": "user", "content": "Conta da 1 a 10."}]
  }'

The -N flag (--no-buffer) disables curl's output buffering, which is essential to see tokens as they arrive.

Python (openai SDK)

python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.siati.ai/v1",
    api_key=os.environ["SIATI_API_KEY"],
)

stream = client.chat.completions.create(
    model="apertus-70b-instruct",
    messages=[{"role": "user", "content": "Conta da 1 a 10."}],
    stream=True,
    stream_options={"include_usage": True},
)

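for chunk in stream:
    # Guard against chunks that carry no choices (for example, a usage-only chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)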
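
Since the request sets include_usage, it may be useful to capture the token counts along with the text. The sketch below assumes usage arrives on a chunk's usage attribute, as in the wire sample; consume and fake_chunk are hypothetical names, and the fake chunks stand in for SDK objects so the example runs without a network call.

```python
from types import SimpleNamespace


def consume(stream):
    """Return (text, usage) from an iterable of chat completion chunks."""
    parts, usage = [], None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # keep the most recent usage report
        if not chunk.choices:
            continue  # a chunk may carry usage only
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts), usage


def fake_chunk(content=None, usage=None):
    """Build a stand-in for an SDK chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=usage)


fake_stream = [
    fake_chunk(),
    fake_chunk("Ciao"),
    fake_chunk("!", usage=SimpleNamespace(prompt_tokens=15, completion_tokens=2)),
]
text, usage = consume(fake_stream)
print(text, usage.completion_tokens)  # prints "Ciao! 2"
```

With a real client, the stream object returned by client.chat.completions.create would be passed to consume in place of fake_stream.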

JavaScript / TypeScript (fetch + ReadableStream)

typescript
const res = await fetch("https://api.siati.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SIATI_API_KEY}`,
    "Content-Type":  "application/json",
  },
  body: JSON.stringify({
    model: "apertus-70b-instruct",
    stream: true,
    messages: [{ role: "user", content: "Conta da 1 a 10." }],
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;
while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  // The last element may be an incomplete line; keep it for the next read.
  buf = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6).trim();
    // Stop on the sentinel; a bare `return` is invalid outside a function.
    if (data === "[DONE]") { finished = true; break; }
    const chunk = JSON.parse(data);
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}

Reconnection

The SSE standard provides automatic reconnection: the browser's EventSource resends the last seen event ID in the Last-Event-ID header. The API does not currently resume a stream mid-completion (resumption is not part of the OpenAI spec either). On disconnect, the request is closed and you are billed only for the tokens that were actually generated up to that point.
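
Because a dropped stream cannot be resumed, a client that needs the full answer has to issue the request again from the start. The following is a minimal sketch of that pattern; collect_with_retry and flaky_stream are hypothetical names, and ConnectionError stands in for whatever connection exception a real HTTP client or SDK raises. Note that tokens generated during a failed attempt are still billed, so each retry costs extra.

```python
from typing import Callable, Iterable


def collect_with_retry(
    start_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
) -> str:
    """Return the full text of one completed stream, restarting on disconnect."""
    last_error = None
    for _ in range(max_attempts):
        parts = []  # partial output from a failed attempt is discarded
        try:
            for delta in start_stream():
                parts.append(delta)
            return "".join(parts)
        except ConnectionError as exc:  # assumed exception type
            last_error = exc
    raise RuntimeError(f"stream failed after {max_attempts} attempts") from last_error


attempts = {"count": 0}


def flaky_stream():
    """Simulated stream that disconnects once, then succeeds."""
    attempts["count"] += 1
    yield "Ci"
    if attempts["count"] == 1:
        raise ConnectionError("simulated disconnect")
    yield "ao!"


result = collect_with_retry(flaky_stream)
print(result)  # prints "Ciao!"
```

In practice start_stream would wrap a fresh API request and yield its content deltas; a backoff delay between attempts is a reasonable addition.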