Skip to content

The capability contract

The contract is what makes the layers swappable. A client reads it and adapts — it never hard-codes which engine is behind the address.

Every engine in the stack — OdyssAI-X, Telemak, and even third parties like Ollama, LM Studio or vLLM behind a shim — advertises what it can do at a single well-known URL:

GET /.well-known/inference-engine.json

Companion reads this during pairing to learn which models support tools, vision, thinking and embeddings, and how to route requests. Because the discovery is data, not code, you can put a different engine behind the same address and the client adapts on the next probe.

{
"engine": "telemak",
"version": "0.6.x",
"capabilities": {
"stream": true,
"tools": true,
"vision": false,
"embeddings": true,
"max_context": 32768,
"session_cache": true,
"openai_compat": "v1",
"anthropic_compat": "v1"
},
"models": [ /* per-model namespace with backend, nodes, tools, vision flags */ ]
}
FieldMeaning
engineWhich runtime answers — odyssai-x, telemak, …
versionThe engine version. Some features gate on it (e.g. mixed-quant models need Telemak v0.6.33+).
capabilitiesThe engine-wide defaults (below).
modelsPer-model capability overrides — a model can disable tools or vision even when the engine supports them.
FlagWhat it tells the client
streamSSE streaming is available.
toolsTool / function calling is supported.
visionImage inputs are accepted.
embeddingsAn embeddings endpoint is served.
max_contextThe context window in tokens.
session_cacheThe engine keeps a KV prefix cache scoped by session_id.
openai_compat / anthropic_compatWhich API dialects, and at which version.

The engine-wide flags are defaults; the models array refines them. A cluster serving a coder MoE and a small VLM advertises vision: true only on the VLM entry, with each model’s backend (local pool, Telemak proxy, cloud alias) and node layout. This is how a client builds an accurate model picker without probing each model by hand.

  • Replaceable layers. Swap Telemak for an OdyssAI-X cluster behind the same address; the client re-reads the contract and adjusts. No client redeploy.
  • Honest routing. Companion won’t offer a tool call to a model whose contract says tools: false, and won’t toggle thinking on a model that doesn’t ship it.
  • One picker, many engines. Pair several engines; each contract feeds one unified catalog with correct per-model capabilities.