Chat sessions (mobile / app contract)

Persistent conversations with auto-title, SSE streaming, and stop-on-demand. The contract used by the iOS/Android apps.

Last updated: 2026-05-24

Chat sessions API

The developer API at api.siati.ai (Bearer key, stateless /v1/chat/completions) is the OpenAI-compatible entry point. For mobile apps and persistent chat UIs, we expose a second, session-oriented API at my.siati.ai/api/v1/chat/sessions/* with:

  • Stateful conversations stored server-side (no need to ship history every request)
  • Server-Sent Events streaming with chunk-level delivery
  • Auto-titled conversations from the first user message
  • Per-message metrics (TTFT, latency, tok/s, prompt/completion tokens)
  • Stop-generation endpoint for interrupting long responses

Authentication is JWT Bearer (issued at app login), not API key.

Base URL & auth

text
Base URL:  https://my.siati.ai/api/v1
Auth:      Authorization: Bearer <jwt>

Obtain the JWT from POST /api/v1/auth/login with email and password; the returned token is valid for 30 days. The token is stateless, and the app should keep it in secure storage (Keychain on iOS, Keystore on Android).
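As a minimal sketch of the header set described above, the helper below builds the request headers from a JWT. The name `authHeaders` and its `accept` parameter are illustrative and not part of the API.

```typescript
// Hypothetical helper: builds the headers for a sessions API request.
// `accept` defaults to JSON; pass 'text/event-stream' for the streaming endpoint.
function authHeaders(
  jwt: string,
  accept: 'application/json' | 'text/event-stream' = 'application/json',
): Record<string, string> {
  if (jwt.trim() === '') {
    throw new Error('JWT is empty; log in again to obtain a new token');
  }
  return {
    Authorization: `Bearer ${jwt}`,
    'Content-Type': 'application/json',
    Accept: accept,
  };
}
```

Failing early on an empty token avoids a round trip that would only come back as a 401.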

Endpoints

List conversations

http
GET /chat/sessions

Returns the user's 200 most recent non-archived conversations. Only metadata is returned; the messages array is empty, as in the example below.

json
[
  {
    "id": "019e4c12-3a4b-...",
    "title": "Teorema di Pitagora",
    "model": "apertus-70b-instruct",
    "created_at": "2026-05-24T08:12:00+00:00",
    "updated_at": "2026-05-24T08:14:32+00:00",
    "messages": []
  }
]

Get conversation with messages

http
GET /chat/sessions/{id}

Returns the full conversation including all messages.

json
{
  "id": "019e4c12-3a4b-...",
  "title": "Teorema di Pitagora",
  "model": "apertus-70b-instruct",
  "created_at": "...",
  "updated_at": "...",
  "messages": [
    {
      "id": "...",
      "role": "user",
      "content": "Spiegami il teorema di Pitagora",
      "created_at": "...",
      "prompt_tokens": 0,
      "completion_tokens": 0,
      "latency_ms": 0
    },
    {
      "id": "...",
      "role": "assistant",
      "content": "Il teorema di Pitagora afferma…",
      "created_at": "...",
      "prompt_tokens": 24,
      "completion_tokens": 187,
      "latency_ms": 0
    }
  ]
}
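The response shape above can be modelled on the client roughly as follows. This is a sketch inferred from the example JSON only: the type names `ChatMessage` and `Conversation` and the helper `totalTokens` are illustrative, and the `role` union assumes that only the two roles shown in the example occur.

```typescript
// Shapes inferred from the example responses; field names match the JSON,
// type names are illustrative.
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant'; // assumption: no other roles are returned
  content: string;
  created_at: string;
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number;
}

interface Conversation {
  id: string;
  title: string;
  model: string;
  created_at: string;
  updated_at: string;
  messages: ChatMessage[];
}

// Sums the per-message token counters across a conversation.
function totalTokens(conv: Conversation): { prompt: number; completion: number } {
  return conv.messages.reduce(
    (acc, m) => ({
      prompt: acc.prompt + m.prompt_tokens,
      completion: acc.completion + m.completion_tokens,
    }),
    { prompt: 0, completion: 0 },
  );
}
```

Applied to the example conversation above, `totalTokens` would report 24 prompt tokens and 187 completion tokens.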

Create conversation

http
POST /chat/sessions
Content-Type: application/json

{
  "title": "My new chat",       // optional, defaults to "Nuova conversazione"
  "model": "apertus-70b-instruct" // optional, defaults to qwen2.5:1.5b
}

Returns the new conversation (201 Created). Use the returned id for subsequent messages.

Rename / delete

http
PATCH /chat/sessions/{id}
Content-Type: application/json

{
  "title": "Renamed conversation"
}

http
DELETE /chat/sessions/{id}

DELETE is a hard delete: the conversation and all of its messages are removed (FK cascade). To soft-archive instead, use the "Archivia" (Archive) action in the dashboard.

Send a message (the main one)

http
POST /chat/sessions/{id}/messages
Content-Type: application/json
Accept: text/event-stream

{
  "content": "Spiegami il teorema di Pitagora",
  "model":   "apertus-70b-instruct",
  "tier":    "fast",            // optional: slow|medium|fast|ludicrous
  "backend": "spark"            // optional: family name (mac-mini, l40-b, spark, bigguy, inference-vm)
}

Response: text/event-stream (SSE). Chunks arrive in order:

text
data: {"type":"delta","text":"Il teorema "}

data: {"type":"delta","text":"di Pitagora "}

data: {"type":"delta","text":"afferma che "}

...

data: {"type":"done","prompt_tokens":24,"completion_tokens":187,"latency_ms":4823,"outcome":"ok","title_job_queued":true}

Event types:

| type | Fields | Meaning |
|------|--------|---------|
| delta | text | Append text to the assistant message buffer. |
| done | prompt_tokens, completion_tokens, latency_ms, outcome, title_job_queued | Stream complete. Save the accumulated assistant message. |
| error | code, message | Upstream/inference error. outcome will be error on the next done. |
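The three event types map naturally onto a discriminated union. The sketch below is one way to type and parse a single SSE frame; the names `StreamEvent` and `parseFrame` are illustrative, and the type of `code` is an assumption because the reference does not say whether it is a string or a number.

```typescript
type StreamEvent =
  | { type: 'delta'; text: string }
  | {
      type: 'done';
      prompt_tokens: number;
      completion_tokens: number;
      latency_ms: number;
      outcome: string;
      title_job_queued: boolean;
    }
  | { type: 'error'; code: string | number; message: string }; // `code` type is assumed

// Parses one SSE frame (the text between blank lines). Returns null for frames
// without a `data:` line, for malformed JSON, and for unknown event types.
function parseFrame(frame: string): StreamEvent | null {
  const line = frame.split('\n').find((l) => l.startsWith('data:'));
  if (line === undefined) return null;
  try {
    const ev = JSON.parse(line.slice(5).trim());
    const known =
      ev !== null &&
      typeof ev === 'object' &&
      (ev.type === 'delta' || ev.type === 'done' || ev.type === 'error');
    return known ? (ev as StreamEvent) : null;
  } catch {
    return null;
  }
}
```

Returning null instead of throwing lets the read loop skip keep-alive comments or future event types without tearing down the stream.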

About title_job_queued

When the response is for the first user message of a conversation, the server queues a background job (GenerateConversationTitle) that uses Qwen 2.5 7B to write a concise title. The job typically completes in 1-2 seconds after the done event.

The flag title_job_queued: true in the done event tells you to poll GET /chat/sessions/{id} once after ~2 seconds to pick up the new title. After that, the title is stable.
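A single delayed poll can be sketched as below. The helper name `pollTitleOnce` is hypothetical, and `fetchTitle` is a caller-supplied function (for example, a wrapper around GET /chat/sessions/{id} that returns the title field), so the sketch makes no assumption about the HTTP client in use.

```typescript
// Waits `delayMs`, then fetches the title exactly once.
// Falls back to the current title if the poll fails or returns an empty string.
async function pollTitleOnce(
  fetchTitle: () => Promise<string>,
  currentTitle: string,
  delayMs = 2000,
): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  try {
    const fresh = await fetchTitle();
    return fresh.trim() !== '' ? fresh : currentTitle;
  } catch {
    // A failed poll is not fatal; the title can be picked up on a later refresh.
    return currentTitle;
  }
}
```

Keeping the old title on failure means a flaky network never blanks out the conversation header.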

About tier and backend

Both optional. If omitted, defaults are:

  • tier: inherited from the previous turn (or slow for a new conversation)
  • backend: the router's choice, which is the least-loaded family that serves the model

When you pass backend, you scope the routing to a hardware family (e.g. mac-mini = all Mac mini Ollama backends; the router still load-balances within that family).
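One way to respect these defaults on the client is to leave unset fields out of the request body entirely. The sketch below does that and validates tier against the four documented values; `buildMessageBody` and `SendMessageBody` are illustrative names, and treating model as required simply mirrors the request example rather than a documented rule.

```typescript
const TIERS = ['slow', 'medium', 'fast', 'ludicrous'] as const;
type Tier = (typeof TIERS)[number];

interface SendMessageBody {
  content: string;
  model: string;
  tier?: Tier;
  backend?: string;
}

// Builds the POST body, omitting tier/backend when unset so the server applies
// its own defaults (inherited tier, router-chosen backend).
function buildMessageBody(
  content: string,
  model: string,
  opts: { tier?: string; backend?: string } = {},
): SendMessageBody {
  if (content.trim() === '') {
    throw new Error('content must not be empty'); // the server would answer 422
  }
  const body: SendMessageBody = { content, model };
  if (opts.tier !== undefined) {
    if (!(TIERS as readonly string[]).includes(opts.tier)) {
      throw new Error(`Unknown tier: ${opts.tier}`);
    }
    body.tier = opts.tier as Tier;
  }
  if (opts.backend !== undefined) body.backend = opts.backend;
  return body;
}
```

Backend names are not validated here because the list of hardware families may change on the server side.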

Stop generation

http
POST /stop/{id}

This endpoint is served from the chat web app's domain (chat.siati.ai/stop/...) rather than from the API base URL. It is deliberately kept outside the main API surface so that it can be called fire-and-forget from a separate connection while a long-running messages POST is still streaming.

The server writes a Redis flag, chat:stop:{id}, which the streaming loop polls on every chunk. The model finishes the current chunk and then stops; the assistant message is saved with whatever was generated up to that point, suffixed with \n\n_⏹ Generazione interrotta dall'utente._ ("Generation stopped by the user").
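A client may want to build the stop URL and to recognise interrupted messages when reloading a conversation. The sketch below assumes that {id} is the conversation id and that the suffix is exactly the string quoted above; the helper names are hypothetical, and the suffix constant should be checked against a real stopped message before relying on it.

```typescript
// Literal suffix appended to a stopped message, copied from the text above.
const STOP_SUFFIX = "\n\n_⏹ Generazione interrotta dall'utente._";

// Assumption: {id} is the conversation id, served from chat.siati.ai.
function stopUrl(conversationId: string): string {
  return `https://chat.siati.ai/stop/${encodeURIComponent(conversationId)}`;
}

// True when a saved assistant message ends with the stop marker.
function wasStopped(content: string): boolean {
  return content.endsWith(STOP_SUFFIX);
}

// Returns the text without the marker, for clients that prefer to render
// their own "stopped" indicator.
function stripStopSuffix(content: string): string {
  return wasStopped(content) ? content.slice(0, -STOP_SUFFIX.length) : content;
}
```

Stripping the marker is optional; it is mainly useful when the app localises the "stopped" notice itself.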

Full client example (TypeScript)

typescript
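// `jwt` comes from POST /auth/login. updateUI, saveMessage, updateTitle, and
// showError are app-side helpers that you provide.
const BASE = 'https://my.siati.ai/api/v1';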
const headers = {
  'Authorization': `Bearer ${jwt}`,
  'Content-Type':  'application/json',
  'Accept':        'application/json',
};

// 1) Create a session
const conv = await fetch(`${BASE}/chat/sessions`, {
  method: 'POST', headers,
  body: JSON.stringify({ model: 'apertus-70b-instruct' }),
}).then(r => r.json());

// 2) Send a message and consume the SSE stream
const resp = await fetch(`${BASE}/chat/sessions/${conv.id}/messages`, {
  method: 'POST',
  headers: { ...headers, 'Accept': 'text/event-stream' },
  body: JSON.stringify({
    content: 'Spiegami il teorema di Pitagora con un esempio.',
    model:   'apertus-70b-instruct',
    tier:    'fast',
  }),
});

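// Failures before the stream starts (401, 404, 422, 429, 503) arrive as a plain HTTP error, not as SSE.
if (!resp.ok || !resp.body) throw new Error(`Send failed: HTTP ${resp.status}`);
const reader = resp.body.getReader();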
const decoder = new TextDecoder();
let buf = '';
let assistantText = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });

  // SSE frames: 'data: {…}\n\n'
  let frameEnd: number;
  while ((frameEnd = buf.indexOf('\n\n')) !== -1) {
    const frame = buf.slice(0, frameEnd);
    buf = buf.slice(frameEnd + 2);
    if (!frame.startsWith('data: ')) continue;
    const ev = JSON.parse(frame.slice(6));

    if (ev.type === 'delta') {
      assistantText += ev.text;
      updateUI(assistantText);
    }
    if (ev.type === 'done') {
      saveMessage(conv.id, assistantText, ev);
      if (ev.title_job_queued) {
        // Poll once after ~2s to pick up the auto-title
        setTimeout(async () => {
          const fresh = await fetch(`${BASE}/chat/sessions/${conv.id}`, { headers }).then(r => r.json());
          updateTitle(fresh.title);
        }, 2000);
      }
    }
    if (ev.type === 'error') {
      showError(ev.message);
    }
  }
}

Errors

| HTTP | When |
|------|------|
| 401 | JWT missing/expired/invalid |
| 404 | Conversation not found or not owned by you |
| 422 | Validation failed (e.g. content empty or too long) |
| 429 | Rate limit hit for your plan/tier (see Rate limits) |
| 503 | No backend healthy for the requested (model, tier) |

Errors during SSE streaming are emitted as data: {"type":"error", …} followed by data: {"type":"done", …, "outcome":"error"}. The HTTP status is still 200 (the response started successfully).
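On the client, the documented status codes can be folded into a small set of actions. The mapping below is only a sketch: the action names and the choice to treat 429 and 503 as retryable are illustrative decisions, not behaviour specified by the API.

```typescript
type ErrorAction = 'reauthenticate' | 'not_found' | 'fix_input' | 'retry_later' | 'unexpected';

// Maps an HTTP status from the sessions API to a client-side action.
function actionForStatus(status: number): ErrorAction {
  switch (status) {
    case 401:
      return 'reauthenticate'; // JWT missing, expired, or invalid
    case 404:
      return 'not_found'; // conversation missing or not owned by the user
    case 422:
      return 'fix_input'; // validation failed, e.g. empty or overlong content
    case 429: // rate limit hit
    case 503: // no healthy backend for the requested (model, tier)
      return 'retry_later';
    default:
      return 'unexpected';
  }
}
```

Errors that occur after the stream has started never reach this function, because the HTTP status is already 200; those are handled through the error and done events instead.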

Differences vs developer API

| | Developer API (api.siati.ai) | Sessions API (my.siati.ai/api/v1) |
|---|---|---|
| Auth | Bearer API key | Bearer JWT |
| State | Stateless | Server-side conversations |
| Format | OpenAI-compatible | Custom, simplified |
| Streaming | OpenAI SSE chunks | {type:delta,text} / {type:done,…} |
| Auto-title | No | Yes, via background job |
| Stop-on-demand | No (close connection) | POST /stop/{id} |
| History management | You manage | Server manages |
| Use case | Backend integration, OpenAI SDK migration | Mobile apps, web chat UIs |

For a chat app, use this sessions API. For backend code that already speaks OpenAI, use the developer API.