RAG quickstart

Upload a PDF, ask questions, get cited answers — end-to-end in 10 minutes.

Last updated: 2026-05-19

We'll build a tiny script that creates a Knowledge Base, uploads a PDF, and asks questions. End-to-end, ten minutes, no fluff.

Prerequisites

A siati.ai account (sign up if you don't have one yet)
A JWT, obtained from the login endpoint or from the dashboard's "Show token" button

JWT=$(curl -s https://my.siati.ai/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "..."}' | jq -r .token)

Step 1 — create the KB

KB=$(curl -s -X POST https://my.siati.ai/api/v1/rag/kb \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"name": "Quickstart KB"}')

SLUG=$(echo "$KB" | jq -r .slug)
echo "Created KB: $SLUG"

Step 2 — upload a PDF

curl -X POST https://my.siati.ai/api/v1/rag/kb/$SLUG/docs \
  -H "Authorization: Bearer $JWT" \
  -F file=@./your-document.pdf

The upload returns 202 immediately and indexing runs asynchronously. Poll the document list until the status is ready:

while true; do
  STATUS=$(curl -s https://my.siati.ai/api/v1/rag/kb/$SLUG/docs \
    -H "Authorization: Bearer $JWT" | jq -r '.documents[0].status')
  echo "[$(date +%T)] $STATUS"
  [ "$STATUS" = "ready" ] && break
  [ "$STATUS" = "failed" ] && { echo "ingestion failed"; exit 1; }
  sleep 3
done

A 50-page PDF typically takes 10–30 seconds.
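The shell loop above never gives up if a document stays stuck in a non-terminal state. Below is a minimal Python sketch of the same polling logic with a deadline added. The helper name `wait_until_ready`, its parameters, and the 120-second default are illustrative assumptions, not part of the siati.ai API; only the `ready` and `failed` status values come from this guide. The status source is passed in as a function so the example runs without network access.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll fetch_status() until it returns "ready".

    Raises RuntimeError on "failed" and TimeoutError once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("ingestion failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Offline demonstration: a scripted status sequence stands in for HTTP calls.
# "processing" is a made-up placeholder for any non-terminal status.
statuses = iter(["processing", "processing", "ready"])
wait_until_ready(lambda: next(statuses), sleep=lambda _: None)
print("document ready")
```

Against the real API, `fetch_status` would wrap the GET request from the loop above and return the first document's `status` field.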

Step 3 — ask questions

curl -X POST https://my.siati.ai/api/v1/rag/kb/$SLUG/chat \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What does the document say about article X?",
    "model":    "apertus-70b-instruct",
    "tier":     "medium",
    "top_k":    5
  }' | jq

The response includes the answer, the LLM's token usage, and the source chunks that were used, each with its score.
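To turn that response into a readable citation list, a small formatter is enough. The sketch below assumes the field names used by the full Python script in this guide (`answer`, `sources`, `document_filename`, `chunk_idx`, `score`). The function name `summarize_chat_response` and the sample payload are invented for illustration. The token usage field is left out because this guide does not name it.

```python
def summarize_chat_response(resp: dict) -> str:
    """Return the answer followed by its sources, highest score first."""
    lines = [resp["answer"], "", "Sources:"]
    ranked = sorted(resp.get("sources", []), key=lambda s: s["score"], reverse=True)
    for s in ranked:
        lines.append(
            f"  {s['document_filename']} chunk {s['chunk_idx']} (score {s['score']:.3f})"
        )
    return "\n".join(lines)


# Made-up payload for illustration only; real values come from the chat endpoint.
sample = {
    "answer": "Article X sets the notice period.",
    "sources": [
        {"document_filename": "your-document.pdf", "chunk_idx": 12, "score": 0.71},
        {"document_filename": "your-document.pdf", "chunk_idx": 3, "score": 0.84},
    ],
}
print(summarize_chat_response(sample))
```

Sorting by score is a presentation choice here; the order in which the API returns sources is not specified in this guide.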

Full Python script

import os
import time

import requests

BASE = "https://my.siati.ai/api/v1"
JWT  = os.environ["SIATI_JWT"]
H    = {"Authorization": f"Bearer {JWT}"}

# 1. Create KB
kb = requests.post(f"{BASE}/rag/kb", json={"name": "Quickstart KB"}, headers=H).json()
slug = kb["slug"]
print(f"KB created: {slug}")

# 2. Upload PDF (the API answers 202 and indexes in the background)
with open("your-document.pdf", "rb") as f:
    upload = requests.post(
        f"{BASE}/rag/kb/{slug}/docs",
        files={"file": f},
        headers=H,
    )
upload.raise_for_status()  # stop here if the upload was rejected

# 3. Wait for ingestion
while True:
    docs = requests.get(f"{BASE}/rag/kb/{slug}/docs", headers=H).json()["documents"]
    status = docs[0]["status"]
    print(f"  status: {status}")
    if status == "ready":
        break
    if status == "failed":
        raise RuntimeError(docs[0].get("error", "ingestion failed"))
    time.sleep(3)

# 4. Chat
resp = requests.post(
    f"{BASE}/rag/kb/{slug}/chat",
    json={
        "question": "What is the main point of the document?",
        "model":    "apertus-70b-instruct",
        "tier":     "medium",
    },
    headers=H,
).json()

print("\n--- ANSWER ---")
print(resp["answer"])
print("\n--- SOURCES ---")
for s in resp["sources"]:
    print(f"  {s['document_filename']} chunk {s['chunk_idx']} (score {s['score']:.3f})")

Common pitfalls

Cost estimate

For a typical user query on Apertus 70B at medium tier:

~1500 prompt tokens (system + 5 chunks of ~250 tokens each + question)
~150 completion tokens

Cost: (1500/2 + 150) / 1,000,000 × CHF 2.00 per million tokens ≈ CHF 0.0018 per query. Prompt tokens count at half weight in this formula.

Ingestion is billed for embedding tokens only — ~CHF 0.001 per 1000 chunks of 256 tokens each.
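The arithmetic above can be wrapped in two small functions to estimate costs for other token counts. This is a sketch that only reproduces the figures quoted in this section: the half weight on prompt tokens, the CHF 2.00 per million rate, and the CHF 0.001 per 1000 chunks ingestion figure are taken from the text, while the function and parameter names are made up. Actual billing may differ by model and tier.

```python
def query_cost_chf(
    prompt_tokens: int,
    completion_tokens: int,
    price_per_million: float = 2.00,  # CHF per million billable tokens (from the text)
    prompt_weight: float = 0.5,       # prompt tokens count at half weight (from the text)
) -> float:
    """Estimate the cost of one chat query in CHF."""
    billable = prompt_tokens * prompt_weight + completion_tokens
    return billable / 1_000_000 * price_per_million


def ingestion_cost_chf(num_chunks: int, price_per_1000_chunks: float = 0.001) -> float:
    """Estimate the embedding cost in CHF, assuming chunks of about 256 tokens."""
    return num_chunks / 1000 * price_per_1000_chunks


per_query = query_cost_chf(1500, 150)
print(f"per query:        CHF {per_query:.4f}")
print(f"per 1000 queries: CHF {1000 * per_query:.2f}")
print(f"ingest 5000 chunks: CHF {ingestion_cost_chf(5000):.3f}")
```

With the defaults, `query_cost_chf(1500, 150)` gives the CHF 0.0018 figure quoted above.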
