Embeddings

Vector embeddings for semantic search, clustering, and RAG.

Last updated: 2026-05-19

POST https://api.siati.ai/v1/embeddings

Convert text into 1024-dimensional vectors using BGE-M3 (multilingual). Use for semantic search, clustering, or as the indexing side of a RAG pipeline.

Request

bash
curl https://api.siati.ai/v1/embeddings \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": [
      "siati.ai è un provider svizzero di AI sovrana.",
      "Apertus è il modello LLM della Swiss AI Initiative."
    ]
  }'

Parameters

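Param | Type | Required | Description
--- | --- | --- | ---
model | string | yes | Model ID. Only bge-m3 is available for now.
input | string or array of strings | yes | A single string, or an array of up to 32 strings per request.
encoding_format | string | no | float (default) or base64.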
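Because `input` accepts at most 32 strings per request, larger corpora have to be split on the client side. The sketch below shows one way to do that, assuming an OpenAI-compatible client like the one in the SDK example; `batched` and `embed_all` are hypothetical helper names, not part of any SDK.

```python
from collections.abc import Iterator, Sequence

MAX_INPUTS_PER_REQUEST = 32  # documented per-request limit on `input`


def batched(texts: Sequence[str], size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def embed_all(client, texts: Sequence[str], model: str = "bge-m3") -> list[list[float]]:
    """Embed any number of texts with one request per batch of at most 32.

    Assumption: `client` is OpenAI-compatible (e.g. openai.OpenAI with
    base_url="https://api.siati.ai/v1"); only client.embeddings.create is called.
    """
    vectors: list[list[float]] = []
    for batch in batched(texts):
        resp = client.embeddings.create(model=model, input=batch)
        # Sort by `index` so output order matches input order within the batch.
        vectors.extend(item.embedding for item in sorted(resp.data, key=lambda d: d.index))
    return vectors
```

With a client configured as in the SDK example, `embed_all(client, chunks)` should return one vector per chunk, in input order.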

Response

json
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, ...] },
    { "object": "embedding", "index": 1, "embedding": [0.0234, -0.0567, ...] }
  ],
  "model": "bge-m3",
  "usage": { "prompt_tokens": 24, "total_tokens": 24 }
}

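Each embedding is an array of 1024 floats, L2-normalized to unit length. Use cosine similarity (or cosine distance, 1 minus the similarity) to compare embeddings; because the vectors have unit length, cosine similarity is simply their dot product.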
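The response above uses the default `float` format. With `encoding_format` set to `base64`, each `embedding` is presumably returned as a base64 string instead. The sketch below assumes siati.ai follows the OpenAI convention of base64-encoding the raw little-endian float32 bytes; verify this against a real response before relying on it. `decode_embedding` and `l2_norm` are hypothetical helpers.

```python
import base64
import math
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 `embedding` field into a list of floats.

    Assumption: the payload is raw little-endian float32 values (OpenAI convention).
    """
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload is not a whole number of float32 values")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def l2_norm(vec: list[float]) -> float:
    """Euclidean length; close to 1.0 for an L2-normalized embedding."""
    return math.sqrt(sum(x * x for x in vec))


# Round-trip a toy 4-dimensional vector (a real BGE-M3 payload should decode to 1024 floats).
toy = [0.5, -0.5, 0.5, -0.5]
payload = base64.b64encode(struct.pack("<4f", *toy)).decode("ascii")
vec = decode_embedding(payload)
print(len(vec), l2_norm(vec))  # → 4 1.0
```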

SDK example

python
import os

import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siati.ai/v1",
    api_key=os.environ["SIATI_API_KEY"],
)

texts = ["Apertus is a Swiss LLM.", "Llama is a US model.", "Pasta is Italian."]
embs = client.embeddings.create(model="bge-m3", input=texts).data
vectors = np.array([e.embedding for e in embs])  # shape (3, 1024)

# BGE-M3 vectors are L2-normalized, so these dot products are cosine similarities
sim = vectors @ vectors.T
print(sim.round(3))
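The similarity matrix extends directly to semantic search: embed the query, score it against every stored document vector, and keep the best matches. The sketch below is a brute-force version in plain Python, assuming all vectors are L2-normalized as BGE-M3 output is; `top_k` is a hypothetical helper, and a vector index would normally replace it at scale.

```python
import heapq
from collections.abc import Sequence


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (doc_index, score) for the k most similar documents, best first.

    Assumes every vector is L2-normalized, so the dot product equals cosine similarity.
    """
    scores = ((i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(docs))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Toy 2-dimensional unit vectors standing in for 1024-dimensional embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(top_k([1.0, 0.0], docs, k=2))  # → [(0, 1.0), (2, 0.6)]
```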

Performance

Setup | Latency (1 text, 100 tokens) | Throughput (batch of 32, 100 tokens each)
--- | --- | ---
BGE-M3 on L40S | ~30 ms | ~800 embeddings/s
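For capacity planning, the throughput figure gives a rough job-duration estimate. The sketch below assumes inputs close to the benchmark's 100 tokens each and full batches of 32; longer chunks (such as the 256 to 512 tokens suggested under Tips) will likely run slower, so treat the result as optimistic. `estimate_job` is a hypothetical helper.

```python
import math


def estimate_job(n_texts: int, embeds_per_sec: float = 800.0, batch_size: int = 32) -> tuple[int, float]:
    """Return (number of requests, estimated seconds) to embed n_texts.

    Defaults mirror the benchmark above (batches of 32, ~100 tokens per text).
    """
    requests = math.ceil(n_texts / batch_size)
    seconds = n_texts / embeds_per_sec
    return requests, seconds


print(estimate_job(100_000))  # → (3125, 125.0)
```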

Embeddings are billed on input tokens only, since there is no completion. For rates, see Pricing.
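Because only input tokens are billed, the `usage` object in each response is enough to track consumption. The sketch below sums `usage.total_tokens` across responses and converts the total with a per-million-token rate that you take from the Pricing page; `billed_tokens`, `billed_cost`, and the `price_per_million_tokens` parameter are hypothetical, and no actual rate is assumed.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def billed_tokens(responses: Iterable) -> int:
    """Sum usage.total_tokens over embedding responses (input tokens only)."""
    return sum(r.usage.total_tokens for r in responses)


def billed_cost(responses: Iterable, price_per_million_tokens: float) -> float:
    """Convert summed tokens to cost; the rate must come from the Pricing page."""
    return billed_tokens(responses) * price_per_million_tokens / 1_000_000


# Stand-ins shaped like the response above (total_tokens == 24); use real responses in practice.
responses = [SimpleNamespace(usage=SimpleNamespace(total_tokens=24)) for _ in range(3)]
print(billed_tokens(responses))  # → 72
```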

Tips

  • Normalize before storing: BGE-M3 returns L2-normalized vectors already, but if you mix in vectors from other sources, normalize first.
  • Chunk size matters: 256–512 tokens is the sweet spot for RAG. Longer chunks hurt retrieval precision.
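  • Cosine, not Euclidean: BGE-M3 is trained with cosine similarity, so compare its vectors with cosine distance.
  • Avoid mixing models: don't compare BGE-M3 vectors with OpenAI text-embedding-3 vectors. They live in different vector spaces, so similarity scores between them are not meaningful.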
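The normalization and cosine tips can be sketched in plain Python: scale any vector that is not already unit-length before storing it, and compare vectors with cosine similarity (cosine distance is 1 minus this value). `l2_normalize` and `cosine_similarity` are hypothetical helper names.

```python
import math
from collections.abc import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale vec to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; works whether or not they are normalized."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


print(l2_normalize([3.0, 4.0]))  # → [0.6, 0.8]
```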