RAG endpoints
Upload documents, ask questions, and get answers with citations through a managed pipeline.
Last updated: 2026-05-19
Managed RAG: we run the vector store (Qdrant), the embedding model (BGE-M3), the LLM (your choice), and the orchestration. You upload documents; you ask questions; you get cited answers.
Base URL: https://my.siati.ai/api/v1/rag/. All endpoints require JWT authentication (see Authentication).
Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /kb | List your knowledge bases |
| POST | /kb | Create a new knowledge base |
| GET | /kb/{slug}/docs | List documents in a KB |
| POST | /kb/{slug}/docs | Upload a document (multipart) |
| POST | /kb/{slug}/chat | Ask a question against a KB |
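For readers who prefer Python to curl, the JSON endpoints in the table can be wrapped in a small helper. This is a minimal sketch using only the standard library; the names build_request and send are illustrative and not part of any official SDK. Multipart upload is left out because urllib has no built-in multipart encoder.

```python
import json
import urllib.request

BASE_URL = "https://my.siati.ai/api/v1/rag"


def build_request(method, path, token, body=None):
    """Return an unsent urllib Request for a RAG endpoint such as '/kb'."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON endpoints (create KB, chat) take a JSON-encoded body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_request("GET", "/kb", token)) would list your knowledge bases, and build_request("POST", "/kb", token, {"name": "Contracts 2026"}) prepares the create call.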
Create a knowledge base
curl https://my.siati.ai/api/v1/rag/kb \
-H "Authorization: Bearer $SIATI_JWT" \
-H "Content-Type: application/json" \
-d '{
"name": "Contracts 2026",
"description": "All client contracts signed in 2026"
}'
# → 201 { "id": "...", "slug": "contracts-2026-abc123", "name": "Contracts 2026" }
The slug is auto-generated (URL-safe + a 6-char suffix) and used in all subsequent calls.
Upload a document
curl https://my.siati.ai/api/v1/rag/kb/contracts-2026-abc123/docs \
-H "Authorization: Bearer $SIATI_JWT" \
-F file=@./contract-acme.pdf
# → 202 { "id": "...", "status": "pending", "original_filename": "contract-acme.pdf", "size_bytes": 423812 }
Supported formats: PDF, DOCX, MD, TXT. Maximum file size is 50 MB on Pro, Max, and PAYG, and 10 MB on Free (see Limits).
Indexing is asynchronous (a queued job). Poll GET /kb/{slug}/docs to watch the status move through pending → parsing → chunking → embedding → ready.
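The polling step could look like the sketch below. It takes any callable that returns the parsed body of the list-documents call, so it is not tied to a particular HTTP client. Two things are assumptions rather than documented behavior: that a non-empty error field signals a failed indexing job, and the default interval and timeout values.

```python
import time


def wait_until_ready(fetch_documents, doc_id, interval_s=2.0, timeout_s=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the document is 'ready'; raise on a reported error or timeout.

    fetch_documents is any callable returning the parsed body of
    GET /kb/{slug}/docs, i.e. a dict with a "documents" list.
    """
    deadline = clock() + timeout_s
    while True:
        documents = fetch_documents()["documents"]
        doc = next((d for d in documents if d["id"] == doc_id), None)
        if doc is None:
            raise LookupError(f"document {doc_id} not found in KB")
        if doc["status"] == "ready":
            return doc
        # Assumption: a non-empty "error" field means indexing failed.
        if doc.get("error"):
            raise RuntimeError(f"indexing failed: {doc['error']}")
        if clock() >= deadline:
            raise TimeoutError(f"document {doc_id} still {doc['status']!r}")
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the loop easy to exercise in tests without real waiting.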
List documents
curl https://my.siati.ai/api/v1/rag/kb/contracts-2026-abc123/docs \
-H "Authorization: Bearer $SIATI_JWT"
# → { "documents": [ {id, original_filename, mime_type, size_bytes, status, error, chunks_count, ingested_at}, ... ] }
Ask a question
curl https://my.siati.ai/api/v1/rag/kb/contracts-2026-abc123/chat \
-H "Authorization: Bearer $SIATI_JWT" \
-H "Content-Type: application/json" \
-d '{
"question": "Qual è il termine di preavviso nel contratto Acme?",
"model": "apertus-70b-instruct",
"tier": "medium",
"top_k": 5
}'
Response:
{
"answer": "Il termine di preavviso nel contratto Acme è di 3 mesi, come specificato all'articolo 12 [contract-acme.pdf: parte 4].",
"sources": [
{
"score": 0.847,
"text": "Articolo 12 — Recesso. Ciascuna parte ha facoltà di recedere dal presente contratto con preavviso di tre (3) mesi...",
"document_filename": "contract-acme.pdf",
"chunk_idx": 4
}
],
"model": "apertus-70b-instruct",
"tier": "medium",
"prompt_tokens": 1141,
"completion_tokens": 97
}
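On the client side you may want to link the bracketed citations in answer back to the entries in sources. The sketch below does that with a regular expression. It assumes that "parte N" in a citation corresponds to the source whose chunk_idx is N, which matches the example response but is not stated explicitly; the function names are illustrative.

```python
import re

# Matches citations such as "[contract-acme.pdf: parte 4]".
CITATION_RE = re.compile(r"\[([^\[\]:]+): parte (\d+)\]")


def cited_sources(response):
    """Return the sources that the answer text cites, in citation order."""
    by_key = {
        (s["document_filename"], s["chunk_idx"]): s for s in response["sources"]
    }
    cited = []
    for filename, part in CITATION_RE.findall(response["answer"]):
        # Assumption: "parte N" refers to the source with chunk_idx == N.
        source = by_key.get((filename.strip(), int(part)))
        if source is not None and source not in cited:
            cited.append(source)
    return cited


def token_usage(response):
    """Total tokens reported for the call: prompt plus completion."""
    return response["prompt_tokens"] + response["completion_tokens"]
```

Sources that were retrieved but never cited are simply left out of the result, which can be useful when deciding what to show next to the answer.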
How the model is prompted
The system prompt we inject is written in Italian. You don't see it as a developer, but it's worth knowing:
Sei un assistente che risponde basandosi esclusivamente sui documenti
forniti nel CONTESTO qui sotto. Se la risposta non è presente nei
documenti, dichiaralo onestamente. Cita sempre le fonti come
[filename: parte N] dove rilevante. Rispondi nella stessa lingua
della domanda.
=== CONTESTO ===
[#1 filename: contract-acme.pdf — parte 4 — score=0.847]
Articolo 12 — Recesso...
[#2 filename: contract-acme.pdf — parte 5 — score=0.812]
...
=== FINE CONTESTO ===
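This prompt instructs the model to:
- Cite sources in the [filename: parte N] format
- Answer only from the supplied context, and say so when the answer is not in the documents
- Reply in the same language as the question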
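To make the layout of the context block concrete, here is a small sketch that formats a list of sources the same way. It is an illustration of the format shown in the prompt, not the service's actual implementation; build_context is a hypothetical name, and the three-decimal score formatting is inferred from the example.

```python
def build_context(sources):
    """Format retrieved chunks like the CONTESTO block in the system prompt."""
    dash = "\u2014"  # the prompt separates header fields with this dash
    lines = ["=== CONTESTO ==="]
    for rank, source in enumerate(sources, start=1):
        lines.append(
            f"[#{rank} filename: {source['document_filename']} {dash} "
            f"parte {source['chunk_idx']} {dash} score={source['score']:.3f}]"
        )
        lines.append(source["text"])
    lines.append("=== FINE CONTESTO ===")
    return "\n".join(lines)
```

The input uses the same keys as the sources array in the chat response, so a response can be fed straight back in to inspect what the model was likely shown.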
The prompt cannot be overridden through the managed RAG endpoints. If you need custom prompting, use /v1/embeddings with your own retrieval and prompting logic.
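If you go the custom route, the retrieval step is yours to write. The sketch below shows one common approach, cosine-similarity ranking, starting from vectors you have already obtained; the request and response shape of /v1/embeddings is not covered on this page, so that call is deliberately left out. All function names and the prompt layout are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunks, k=5):
    """Return the k chunks most similar to the query as (score, text), best first.

    chunks is a list of (text, vector) pairs you embedded yourself.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(instructions, question, ranked):
    """Join your own instructions, the retrieved chunks, and the question."""
    context = "\n\n".join(
        f"[{i}] {text}" for i, (_, text) in enumerate(ranked, start=1)
    )
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"
```

A linear scan like this is fine for a few thousand chunks; beyond that a vector index is the usual choice.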
Limits
| Limit | Free | Pro | Max | PAYG |
|---|---|---|---|---|
| KBs per user | 1 | 5 | 25 | unlimited |
| Docs per KB | 50 | 500 | 5000 | unlimited |
| File size | 10 MB | 50 MB | 50 MB | 50 MB |
| Max top_k | 5 | 10 | 20 | 20 |
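These limits can be checked client-side before a request is sent. The sketch below copies the file-size and top_k values from the table and the supported formats from the upload section. The lowercase plan keys, the function names, and the choice of binary megabytes (1024 × 1024 bytes) are assumptions; the server remains the authority on what is accepted.

```python
import os

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}
MB = 1024 * 1024  # assumption: limits are counted in binary megabytes

# Values copied from the limits table; None means unlimited.
PLAN_LIMITS = {
    "free": {"kbs": 1, "docs_per_kb": 50, "file_mb": 10, "top_k": 5},
    "pro": {"kbs": 5, "docs_per_kb": 500, "file_mb": 50, "top_k": 10},
    "max": {"kbs": 25, "docs_per_kb": 5000, "file_mb": 50, "top_k": 20},
    "payg": {"kbs": None, "docs_per_kb": None, "file_mb": 50, "top_k": 20},
}


def check_upload(plan, filename, size_bytes):
    """Return a list of problems with an upload; an empty list means none found."""
    limits = PLAN_LIMITS[plan]
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > limits["file_mb"] * MB:
        problems.append(f"file exceeds {limits['file_mb']} MB for plan {plan}")
    return problems


def clamp_top_k(plan, requested):
    """Cap a requested top_k at the plan maximum (and at least 1)."""
    return max(1, min(requested, PLAN_LIMITS[plan]["top_k"]))
```

For instance, clamp_top_k("free", 20) yields 5, so a chat request built for a Free account stays within its limit.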