Embeddings
Vector embeddings for semantic search, clustering, RAG.
Last updated: 2026-05-19
POST https://api.siati.ai/v1/embeddings
Convert text into 1024-dimensional vectors using BGE-M3 (multilingual). Use for semantic search, clustering, or as the indexing side of a RAG pipeline.
Request
curl https://api.siati.ai/v1/embeddings \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": [
      "siati.ai è un provider svizzero di AI sovrana.",
      "Apertus è il modello LLM della Swiss AI Initiative."
    ]
  }'
Parameters
| Param | Type | Required | Description |
|---|---|---|---|
| model | string | ✓ | Only bge-m3 for now. |
| input | string \| array | ✓ | Up to 32 strings per request. |
| encoding_format | string | – | float (default) or base64. |
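Because `input` accepts at most 32 strings per request, a larger corpus has to be split on the client side. The following is a minimal sketch of that splitting; `batched_inputs` is a hypothetical helper name, not part of any SDK, and the limit of 32 is taken from the table above.

```python
from typing import Iterator, List

MAX_INPUTS_PER_REQUEST = 32  # limit stated in the Parameters table


def batched_inputs(
    texts: List[str], size: int = MAX_INPUTS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


if __name__ == "__main__":
    corpus = [f"document {i}" for i in range(70)]
    # Print the size of each batch that would be sent as one request.
    print([len(batch) for batch in batched_inputs(corpus)])
```

Each yielded batch can be passed as the `input` value of one request. Since every item in the response carries an `index`, the per-batch results can be concatenated in order.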
Response
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, ...] },
    { "object": "embedding", "index": 1, "embedding": [0.0234, -0.0567, ...] }
  ],
  "model": "bge-m3",
  "usage": { "prompt_tokens": 24, "total_tokens": 24 }
}
Each embedding is an array of 1024 floats, L2-normalised, so the dot product of two embeddings equals their cosine similarity. Use cosine distance for similarity.
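To illustrate what L2-normalisation means for similarity, here is a small sketch in plain Python. The helper names `l2_normalize` and `cosine_similarity` are illustrative assumptions, and the 2-dimensional vectors stand in for real 1024-dimensional embeddings. For unit-length vectors, the plain dot product and the full cosine formula give the same value.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity for vectors of any non-zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    a = l2_normalize([3.0, 4.0])
    b = l2_normalize([1.0, 1.0])
    dot = sum(x * y for x, y in zip(a, b))
    # For unit vectors the two values agree up to floating-point error.
    print(math.isclose(dot, cosine_similarity(a, b)))
    # Cosine distance is one minus cosine similarity.
    print(1.0 - dot)
```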
SDK example
import os

import numpy as np
from openai import OpenAI

# Point the OpenAI SDK at the siati.ai base URL.
client = OpenAI(
    base_url="https://api.siati.ai/v1",
    api_key=os.environ["SIATI_API_KEY"],
)

texts = ["Apertus is a Swiss LLM.", "Llama is a US model.", "Pasta is Italian."]
embs = client.embeddings.create(model="bge-m3", input=texts).data
vectors = np.array([e.embedding for e in embs])

# Cosine similarity matrix: the vectors are L2-normalised, so a dot product is enough.
sim = vectors @ vectors.T
print(sim)
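The example above compares every text with every other text. For semantic search, the more common pattern is to rank stored document vectors against a single query vector. A minimal sketch follows, assuming all vectors are already L2-normalised; `top_k` is a hypothetical helper, and the toy 2-dimensional vectors stand in for real embeddings so that the block runs without an API key.

```python
from typing import List, Sequence, Tuple


def top_k(
    query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest dot-product scores.

    Assumes every vector is L2-normalised, so the dot product equals
    cosine similarity.
    """
    scores = [
        (i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(docs)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    query = [0.8, 0.6]
    for index, score in top_k(query, docs, k=2):
        print(index, round(score, 2))
```

With NumPy arrays, as in the SDK example, the same scores could be computed as `vectors @ query`. The pure-Python version is used here only to keep the sketch dependency-free.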
Performance
| Setup | Latency (1 text, 100 tokens) | Throughput (batch 32, 100 tok each) |
|---|---|---|
| BGE-M3 on L40S | ~30 ms | ~800 embeds/s |
Embeddings are billed on input tokens only; there is no "completion" to bill. For cost, see Pricing.
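As a rough planning aid, the throughput figure in the table above can be turned into a back-of-the-envelope estimate for indexing a corpus. This sketch assumes texts of about 100 tokens sent in full batches of 32, which are the conditions the table reports; `estimate_indexing` is a hypothetical helper, and real numbers depend on text length, network latency, and load.

```python
import math
from typing import Tuple

EMBEDS_PER_SECOND = 800      # approximate figure from the Performance table
MAX_INPUTS_PER_REQUEST = 32  # batch size used for that figure


def estimate_indexing(n_texts: int) -> Tuple[int, float]:
    """Return (number of requests, estimated seconds) for embedding n_texts."""
    if n_texts < 0:
        raise ValueError("n_texts must be non-negative")
    requests = math.ceil(n_texts / MAX_INPUTS_PER_REQUEST)
    seconds = n_texts / EMBEDS_PER_SECOND
    return requests, seconds


if __name__ == "__main__":
    requests, seconds = estimate_indexing(100_000)
    print(f"{requests} requests, about {seconds:.0f} s at the quoted throughput")
```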
Tips
- Normalize before storing: BGE-M3 returns L2-normalized vectors already, but if you mix in vectors from other sources, normalize first.
- Chunk size matters: 256–512 tokens is the sweet spot for RAG. Longer chunks hurt retrieval precision.
- Cosine, not Euclidean: cosine distance is what BGE-M3 is trained for.
- Avoid mixing models: don't compare BGE-M3 vectors with OpenAI text-embedding-3 vectors. The two models embed into different vector spaces.
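The chunk-size tip can be sketched as a simple splitter. This version counts whitespace-separated words as a stand-in for tokens, which is an assumption: word counts generally under-estimate the model's token counts, so an exact limit would need the model's tokenizer. The `overlap` parameter is an optional extra that is not part of the tips above, and `chunk_by_words` is a hypothetical helper name.

```python
from typing import List


def chunk_by_words(text: str, max_words: int = 256, overlap: int = 32) -> List[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words. Words are only an
    approximation of tokens.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(600))
    # Print the number of words in each chunk.
    print([len(chunk.split()) for chunk in chunk_by_words(sample)])
```

Each resulting chunk can then be embedded as one `input` string and stored alongside its source position.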