RAG quickstart
Upload a PDF, ask questions, get cited answers — end-to-end in 10 minutes.
Last updated: 2026-05-19
We'll build a tiny script that creates a Knowledge Base (KB), uploads a PDF to it, and asks questions against it. End-to-end, ten minutes, no fluff.
Prerequisites
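- A siati.ai account (sign up if you don't have one)
- curl and jq for the shell examples, or Python 3 with the requests library for the full script
- A JWT: get one from the login endpoint or use the dashboard's "Show token" button (the Python script reads it from the SIATI_JWT environment variable)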
JWT=$(curl -s https://my.siati.ai/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com", "password": "..."}' | jq -r .token)
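For Python users, the same login call can be sketched as below. It assumes, as the jq -r .token filter does, that the response is JSON with a top-level token field. The helper name get_jwt is hypothetical, and the session argument is any object with a requests-style post() (for example requests.Session()), which keeps the sketch usable without a live connection.

```python
BASE = "https://my.siati.ai/api/v1"


def get_jwt(session, email: str, password: str) -> str:
    """Log in and return the JWT (hypothetical helper, not part of the API).

    `session` is anything with a requests-style .post(), e.g. requests.Session().
    Assumes the login response is JSON with a top-level "token" field.
    """
    resp = session.post(
        f"{BASE}/auth/login",
        json={"email": email, "password": password},
        timeout=30,
    )
    resp.raise_for_status()  # surface bad credentials instead of returning None
    token = resp.json().get("token")
    if not token:
        raise RuntimeError("login response did not contain a token")
    return token
```

For example, call get_jwt(requests.Session(), "you@example.com", "...") and export the result as SIATI_JWT before running the full script.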
Step 1 — create the KB
KB=$(curl -s -X POST https://my.siati.ai/api/v1/rag/kb \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: application/json" \
-d '{"name": "Quickstart KB"}')
SLUG=$(echo "$KB" | jq -r .slug)
echo "Created KB: $SLUG"
Step 2 — upload a PDF
curl -X POST https://my.siati.ai/api/v1/rag/kb/$SLUG/docs \
-H "Authorization: Bearer $JWT" \
-F file=@./your-document.pdf
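The endpoint returns 202 Accepted immediately and indexes the document asynchronously. Poll until its status is ready (the loop checks the first document in the KB and exits if it reports failed):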
while true; do
  STATUS=$(curl -s https://my.siati.ai/api/v1/rag/kb/$SLUG/docs \
    -H "Authorization: Bearer $JWT" | jq -r '.documents[0].status')
  echo "[$(date +%T)] $STATUS"
  [ "$STATUS" = "ready" ] && break
  [ "$STATUS" = "failed" ] && { echo "ingestion failed"; exit 1; }
  sleep 3
done
A 50-page PDF typically takes 10–30 seconds.
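The shell loop keeps polling indefinitely if ingestion stalls. Here is a sketch of the same wait with a deadline. The 120-second default is an assumption chosen to leave headroom over the typical 10–30 seconds, and get_status is a hypothetical callable you supply (for instance, one that runs the GET from the loop and returns documents[0].status). Any status other than ready or failed is treated as still in progress.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,   # assumption: generous vs. the typical 10-30 s
    interval_s: float = 3.0,    # same poll interval as the shell loop
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns "ready"; raise on "failed" or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError("ingestion failed")
        # Any other value (e.g. a queued/processing state) means keep waiting.
        if clock() >= deadline:
            raise TimeoutError(f"document still {status!r} after {timeout_s:.0f}s")
        sleep(interval_s)
```

Injecting sleep and clock is only there so the wait can be exercised without real delays; in normal use the defaults apply.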
Step 3 — ask questions
curl -X POST https://my.siati.ai/api/v1/rag/kb/$SLUG/chat \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: application/json" \
-d '{
    "question": "What does the document say about article X?",
    "model": "apertus-70b-instruct",
    "tier": "medium",
    "top_k": 5
  }' | jq
The response contains the answer, the LLM's token usage, and the source chunks used to ground it, each with its retrieval score.
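To turn the sources into numbered citations, a small helper can sort and label them. It assumes the field names used in the full Python script (document_filename, chunk_idx, score) and that a higher score means a closer match. The name format_citations and the min_score cut-off are hypothetical conveniences, not API features.

```python
def format_citations(resp: dict, min_score: float = 0.0) -> list[str]:
    """Build citation lines from a chat response's "sources" list.

    Drops sources below min_score, orders the rest by descending score,
    and numbers them from 1.
    """
    sources = [s for s in resp.get("sources", []) if s["score"] >= min_score]
    sources.sort(key=lambda s: s["score"], reverse=True)
    return [
        f"[{i}] {s['document_filename']}, chunk {s['chunk_idx']} (score {s['score']:.3f})"
        for i, s in enumerate(sources, start=1)
    ]
```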
Full Python script
import os
import time
import requests
BASE = "https://my.siati.ai/api/v1"
JWT = os.environ["SIATI_JWT"]
H = {"Authorization": f"Bearer {JWT}"}
# 1. Create KB
r = requests.post(f"{BASE}/rag/kb", json={"name": "Quickstart KB"}, headers=H, timeout=30)
r.raise_for_status()  # fail fast on auth or validation errors
slug = r.json()["slug"]
print(f"KB created: {slug}")
# 2. Upload PDF (the endpoint answers 202 and indexes in the background)
with open("your-document.pdf", "rb") as f:
    r = requests.post(
        f"{BASE}/rag/kb/{slug}/docs",
        files={"file": f},
        headers=H,
        timeout=120,
    )
r.raise_for_status()
# 3. Wait for ingestion (checks the first document, like the shell loop)
while True:
    r = requests.get(f"{BASE}/rag/kb/{slug}/docs", headers=H, timeout=30)
    r.raise_for_status()
    doc = r.json()["documents"][0]
    print(f"  status: {doc['status']}")
    if doc["status"] == "ready":
        break
    if doc["status"] == "failed":
        raise RuntimeError(doc.get("error", "ingestion failed"))
    time.sleep(3)
# 4. Chat
r = requests.post(
    f"{BASE}/rag/kb/{slug}/chat",
    json={
        "question": "What is the main point of the document?",
        "model": "apertus-70b-instruct",
        "tier": "medium",
    },
    headers=H,
    timeout=120,
)
r.raise_for_status()
resp = r.json()
print("\n--- ANSWER ---")
print(resp["answer"])
print("\n--- SOURCES ---")
for s in resp["sources"]:
    print(f"  {s['document_filename']} chunk {s['chunk_idx']} (score {s['score']:.3f})")
Common pitfalls
Cost estimate
For a typical user query on Apertus 70B at medium tier:
- ~1500 prompt tokens (system + 5 chunks of ~250 tokens each + question)
- ~150 completion tokens
Cost: prompt tokens count at half weight, so 1500/2 + 150 = 900 billable tokens, and 900 / 1_000_000 × 2.00 CHF/M ≈ CHF 0.0018 per query.
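The same arithmetic as a function lets you try other top_k values or answer lengths. It reproduces the weighting in the estimate (prompt tokens at half weight, CHF 2.00 per million). The 250-token overhead for system prompt plus question is inferred from 1500 − 5 × 250 and is an assumption, as are the function names.

```python
def estimate_prompt_tokens(top_k: int, chunk_tokens: int = 250, overhead: int = 250) -> int:
    """Prompt size = retrieved chunks + system prompt/question overhead (assumed 250)."""
    return overhead + top_k * chunk_tokens


def query_cost_chf(prompt_tokens: int, completion_tokens: int,
                   chf_per_million: float = 2.00) -> float:
    """Per-query cost with prompt tokens at half weight, completion at full weight."""
    billable = prompt_tokens / 2 + completion_tokens
    return billable / 1_000_000 * chf_per_million
```

With top_k = 5 this gives 1500 prompt tokens and about CHF 0.0018; doubling top_k to 10 raises it to about CHF 0.00305, so retrieval depth is the main cost lever.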
Ingestion is billed for embedding tokens only — ~CHF 0.001 per 1000 chunks of 256 tokens each.
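Because ingestion is billed on embedding tokens, the quoted rate works out to CHF 0.001 per 256,000 tokens, about CHF 0.0039 per million. Here is a sketch of a per-document estimate, assuming every document token is embedded exactly once and chunks do not overlap (overlap, if the indexer uses it, would raise the count). The names are hypothetical.

```python
import math

# Derived from "~CHF 0.001 per 1000 chunks of 256 tokens each".
CHUNK_TOKENS = 256
CHF_PER_EMBED_TOKEN = 0.001 / (1000 * CHUNK_TOKENS)


def chunk_count(document_tokens: int, chunk_tokens: int = CHUNK_TOKENS) -> int:
    """Chunks produced by non-overlapping windows of chunk_tokens (assumption)."""
    return math.ceil(document_tokens / chunk_tokens)


def ingestion_cost_chf(document_tokens: int) -> float:
    """One-off embedding cost, assuming each token is embedded exactly once."""
    return document_tokens * CHF_PER_EMBED_TOKEN
```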