HTTP API
Two endpoints do the real work; everything else is convenience. If your client speaks OpenAI or Anthropic, it already speaks OdyssAI-X.
OdyssAI-X exposes a public inference surface (/v1/*, always open) and an admin
surface (/admin/*, open on a trusted LAN, token-gated when you choose). Telemak
serves the same /v1/* surface from a single Mac.
The two endpoints that matter
```
POST /v1/chat/completions    OpenAI dialect
POST /v1/messages            Anthropic dialect (+ /v1/messages/count_tokens)
```
Every engine in the stack speaks both. Clients use OpenAI by default and Anthropic for Claude-style flows (Claude Code, the Anthropic SDK). The schemas are the standard ones, so you can point an existing SDK at the base URL and it works:
```shell
export OPENAI_BASE_URL="http://<server>:8000/v1"
export OPENAI_API_KEY="dummy"   # no key required on a LAN
```
Listing models
```
GET /v1/models                          currently servable models
GET /v1/models?include_unloaded=true    + on-disk inventory (x_odyssai.ready=false)
```
By default only servable models are listed: loaded local pools, enrolled
Telemak clusters, published cloud aliases, and CoeOS when enabled. Each entry
carries an x_odyssai block with the real backend, the concrete model behind an
alias, and per-model capabilities.
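A client that wants to separate what it can call now from what is only on disk can branch on `x_odyssai.ready`. The sketch below uses only the Python standard library. It assumes the response follows the standard OpenAI list shape (`{"object": "list", "data": [...]}`) with an `x_odyssai` object on each entry. It reads only the `ready` key, because the other keys inside `x_odyssai` are not named here. It also assumes that an entry without `ready` is servable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def models_url(base_url: str, include_unloaded: bool = False) -> str:
    """Build the /v1/models URL from a base such as http://<server>:8000/v1."""
    url = base_url.rstrip("/") + "/models"
    if include_unloaded:
        url += "?" + urlencode({"include_unloaded": "true"})
    return url

def fetch_models(base_url: str, include_unloaded: bool = False) -> dict:
    """GET the model list and decode the JSON body (needs a reachable server)."""
    with urlopen(models_url(base_url, include_unloaded)) as resp:
        return json.load(resp)

def split_by_readiness(payload: dict) -> tuple:
    """Split an OpenAI-style model list into (servable ids, on-disk-only ids).

    Assumption: an entry whose x_odyssai block lacks "ready" is servable.
    """
    servable, on_disk = [], []
    for entry in payload.get("data", []):
        ready = (entry.get("x_odyssai") or {}).get("ready", True)
        (servable if ready else on_disk).append(entry["id"])
    return servable, on_disk
```

Typical use is `servable, on_disk = split_by_readiness(fetch_models("http://<server>:8000/v1", include_unloaded=True))`. The on-disk ids are candidates for a load through the admin surface.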
Three extensions
On top of the standard schemas, OdyssAI-X reads three optional fields:
| Field | Type | Effect |
|---|---|---|
| session_id | string | Scopes the KV prefix cache to a conversation. Reuse the same id across turns and the shared prefix is prefilled only once, a big win for time to first token (TTFT). Clients attach one per chat. |
| enable_thinking | bool | Toggles the reasoning block on reasoner models (defaults to true on models that ship one). |
| reasoning_effort | minimal / low / medium / high | Budget for the reasoning block. Setting minimal roughly halves completion tokens on always-think models. |
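The extensions are plain top-level fields in the request body. The sketch below builds (but does not send) an OpenAI-dialect request carrying them, using only the standard library. The `chat-` prefix on the session id is an arbitrary, hypothetical format, since the table only says the field is a string. The client-side check of `reasoning_effort` is also an assumption and not server behaviour documented here. With the official OpenAI Python SDK, the same non-standard fields can be passed through its `extra_body` argument.

```python
import json
import uuid
from typing import Optional
from urllib.request import Request

REASONING_EFFORTS = ("minimal", "low", "medium", "high")

def new_session_id() -> str:
    """Create one id per chat (hypothetical format); reuse it on every turn."""
    return f"chat-{uuid.uuid4()}"

def chat_with_extensions(base_url: str, model: str, messages: list, session_id: str,
                         enable_thinking: Optional[bool] = None,
                         reasoning_effort: Optional[str] = None) -> Request:
    """Build a POST /v1/chat/completions request that carries the OdyssAI-X extensions."""
    body = {"model": model, "messages": messages, "session_id": session_id}
    # Send only the optional fields the caller set, so server defaults apply otherwise.
    if enable_thinking is not None:
        body["enable_thinking"] = enable_thinking
    if reasoning_effort is not None:
        if reasoning_effort not in REASONING_EFFORTS:
            raise ValueError(f"reasoning_effort must be one of {REASONING_EFFORTS}")
        body["reasoning_effort"] = reasoning_effort
    return Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The design point is that `new_session_id()` runs once per conversation, not once per request. Every later turn passes the same id so that the prefix cache can be reused. Send the request with `urllib.request.urlopen`.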
The usage block in OpenAI-dialect responses adds
prompt_tokens_details.cached_tokens: the number of prompt tokens served from the
prefix cache. Clients surface it (Companion shows Cached: N tok (XX%)).
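A client can rebuild a Companion-style cache readout from the usage block alone. The sketch below assumes that the percentage is `cached_tokens` as a share of `prompt_tokens`, rounded to a whole number. Companion's exact formula is not stated here. Missing fields are treated as zero.

```python
def cache_summary(usage: dict) -> str:
    """Format an OpenAI-style usage block as 'Cached: N tok (XX%)'.

    Assumption: XX = round(100 * cached_tokens / prompt_tokens).
    """
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    # Guard against an empty prompt so we never divide by zero.
    pct = round(100 * cached / prompt_tokens) if prompt_tokens else 0
    return f"Cached: {cached} tok ({pct}%)"
```

For example, `cache_summary({"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 1500}})` returns `"Cached: 1500 tok (75%)"`. A second turn that reuses the same session_id is where this number should climb.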
Aliases and routing
The model field accepts several kinds of id, all resolved by the orchestrator:
| model value | Routes to |
|---|---|
| a cluster name (e.g. default, argo) | the loaded pool on that cluster |
| a Telemak cluster id (e.g. telecode) | the enrolled Telemak, over HTTP proxy |
| a cloud alias (e.g. or:claude-haiku) | the configured cloud provider |
| CoeOS | the benchmark-composed router → the best model per skill |
The caller doesn’t need to know the topology: it picks a published id and the orchestrator hides the rest.
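The routing table amounts to a lookup from one string to one destination. The toy resolver below is a hypothetical model of that idea, not the orchestrator's code. The registries are plain dicts and sets that a real deployment would fill from its own configuration. The precedence order (CoeOS, then clusters, then Telemaks, then cloud aliases) is an assumption. The sample URL and provider name used with it are also illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    kind: str    # "pool", "telemak", "cloud" or "coeos"
    target: str  # where the request would end up (illustrative only)

def resolve(model: str, clusters: set, telemaks: dict, cloud_aliases: dict,
            coeos_enabled: bool = True) -> Route:
    """Toy resolver mirroring the routing table; the precedence is an assumption."""
    if coeos_enabled and model == "CoeOS":
        return Route("coeos", "per-skill router")
    if model in clusters:
        return Route("pool", model)                   # loaded pool on that cluster
    if model in telemaks:
        return Route("telemak", telemaks[model])      # proxied over HTTP
    if model in cloud_aliases:
        return Route("cloud", cloud_aliases[model])   # configured cloud provider
    raise LookupError(f"unknown model id: {model!r}")
```

From the caller's side, only the `model` string changes between these routes. The request body and endpoint stay the same.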
Admin endpoints
/admin/* controls the cluster. A few you’ll actually use:
```
GET  /admin/clusters           # registered clusters + status
POST /admin/<cluster>/load     # load a model {"model": "...", "sharding"?: "pipeline"}
POST /admin/<cluster>/unload   # free the pool
GET  /admin/settings           # server-wide defaults (read)
PUT  /admin/settings           # server-wide defaults (write)
POST /admin/sync/rsync         # push a model from one node to others
GET  /health                   # {"status":"idle|busy", "version": "..."}
```
/admin/* is open by default (trusted LAN, single operator). Set
ODYSSAI_X_ADMIN_TOKEN to require Authorization: Bearer <token> if you expose
the engine beyond your LAN; /v1/* stays public regardless.
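Scripting a load or unload means POSTing to the per-cluster path and adding the bearer header only when a token is configured. The sketch below builds those requests with the standard library. It assumes the admin surface is reachable at the same `http://<server>:8000` origin as /v1. It also assumes the caller already knows the value set in ODYSSAI_X_ADMIN_TOKEN. The helper names and the model id in any call are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

def admin_request(server: str, method: str, path: str,
                  body: Optional[dict] = None, token: Optional[str] = None) -> Request:
    """Build an /admin/* request; attach a bearer token only if one is given."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        # Only needed when the server runs with ODYSSAI_X_ADMIN_TOKEN set.
        headers["Authorization"] = f"Bearer {token}"
    return Request(server.rstrip("/") + path, data=data, headers=headers, method=method)

def load_model(server: str, cluster: str, model: str,
               sharding: Optional[str] = None, token: Optional[str] = None) -> Request:
    """POST /admin/<cluster>/load with {"model": ..., "sharding"?: ...}."""
    body = {"model": model}
    if sharding is not None:
        body["sharding"] = sharding
    return admin_request(server, "POST", f"/admin/{quote(cluster, safe='')}/load", body, token)

def unload_pool(server: str, cluster: str, token: Optional[str] = None) -> Request:
    """POST /admin/<cluster>/unload to free the pool."""
    return admin_request(server, "POST", f"/admin/{quote(cluster, safe='')}/unload", None, token)
```

On a trusted LAN, leave `token` as `None` and no Authorization header is sent. The same code works unchanged once the server is locked down, as long as the token is supplied.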
Cloud passthrough
Cloud providers are first-class. Add one in the dashboard (Settings → Cloud
providers → paste an OpenRouter / Anthropic / OpenAI key) and aliases like
or:claude-haiku appear in /v1/models instantly, callable through the same
/v1/chat/completions. There is no LiteLLM to install; it survives only as a
legacy fallback rail.
Read next
- The capability contract: how a client discovers all of the above.
- CoeOS: the model: "CoeOS" router and its per-skill axes.
- Inference modes: the sharding option on load.
- Troubleshooting: empty /v1/models, thinking defaults, model-update verbs.