Skip to content

Architecture overview

The stack stays replaceable because the protocols stay standard. Every layer below speaks a documented public surface; you can swap any layer without disturbing the others.

Layer 01 — Clients IDE agents · Companion UI · OpenAI / Anthropic SDKs
Layer 02 — Experience Companion
Layer 03 — Engine OdyssAI-X · Telemak
Layer 04 — Ground Apple Silicon · local engines · cloud providers

Layer 01 — Clients. Anything that speaks OpenAI or Anthropic HTTP. Your IDE agent (Cline, Continue.dev, Claude Code, Codex). A cURL shell. The OpenAI Python SDK pointed at http://localhost:8000. This layer is not part of the stack — it is what the stack is built to serve.

Layer 02 — Experience. Companion. Web UI for chat, projects, memory, skills, MCP. The only part of the stack that talks directly to a human. Companion ships no model of its own; it pairs to a Layer 03 engine.

Layer 03 — Engine. Two siblings, same HTTP API, different runtime targets.

  • OdyssAI-X — the cluster engine. Python + mlx-distributed + Docker + SSH. Orchestrates inference across 1–5 Macs, optionally over RDMA Thunderbolt 5.
  • Telemak — the mono-Mac runtime. Native Swift on mlx-swift-lm. One Mac, one daemon, one .app bundle in the menu bar.

Both expose POST /v1/chat/completions, POST /v1/messages, GET /v1/models, and a capability contract at /.well-known/inference-engine.json. Companion does not know which one is behind the address until it probes.

Layer 04 — Ground. Apple Silicon. The hardware MLX was written for. Telemak and OdyssAI-X both target Metal and the Apple Neural Engine through mlx-swift-lm and mlx respectively. Cloud providers — OpenAI, Anthropic, OpenRouter — are also at this layer, behind the same HTTP surface, treated as first-class.

  • Companion does not replace your IDE. It exposes a runtime your IDE (or any HTTP client) can consume. If you want to drive a coding agent from your terminal, point it at an OdyssAI-X or Telemak endpoint.
  • OdyssAI-X is not an exo fork. It is a control plane on top of Apple MLX and mlx-distributed. The collective communication — JACCL over Thunderbolt 5 RDMA, ring over TCP — comes from Apple MLX directly.
  • Telemak is not a fork of OdyssAI-X. It is a sibling runtime: native Swift on mlx-swift-lm, mono-Mac, no Docker, no Python, no SSH. A Telemak can be enrolled in an OdyssAI-X cluster as a single-node provider.
  • The engine never owns memory. Memory lives in Companion. Personal, team, and project memory are Karpathy-style knowledge graphs compiled by an LLM worker, exposed to the engine only as a system-prompt prefix. The engine sees opaque tokens; the source of truth stays in Companion’s Postgres.
You haveReach forTypical modelsInfra cost
One Mac Studio (96–512 GB) or MacBook ProTelemak30 B–80 B dense, MoE up to ~80 B.app bundle, menu-bar, LaunchAgent
2–5 Mac Studios with TB5 meshOdyssAI-X200 B–700 B frontier MoEDocker orchestrator + SSH + MLX runners
BothBoth — Telemak nodes enrol in OdyssAI-Xmixed catalog under one CompanionTelemak + OdyssAI-X together
A cloud API key onlyCompanion paired to cloudwhatever the provider offersnone

The 80/20 case is one Mac + Telemak + Companion. The frontier case is a 4-node TB5 mesh + OdyssAI-X + Telemak stations + Companion. You can grow from one to the other without changing your chat window, your history, or your memory.

OdyssAI is local-first but not local-only. The default surface is local: prompts leave your LAN, weights stay on your SSD, conversations sit in your Postgres. Cloud providers are first-class citizens behind the same OpenAI/Anthropic-compatible surfaces — pair Companion to a cloud key on a Tuesday, back to a local cluster on a Wednesday, no migration, no data movement.

The choice belongs to the operator, not the framework.

Every engine — OdyssAI-X, Telemak, Ollama, LM Studio, vLLM, the cloud providers — advertises its capabilities at /.well-known/inference-engine.json. Companion reads this endpoint during pairing to know which models support tools, vision, thinking, embeddings, and how to route requests.

{
"engine": "telemak",
"version": "0.6.x",
"capabilities": {
"stream": true,
"tools": true,
"vision": false,
"embeddings": true,
"max_context": 32768,
"session_cache": true,
"openai_compat": "v1",
"anthropic_compat": "v1"
},
"models": [ /* per-model namespace with backend, nodes, tools, vision flags */ ]
}

This is the contract that makes the stack replaceable. Companion does not hard-code which engine is behind the address — it reads the contract and adapts. Full contract spec →.

POST /v1/chat/completions and POST /v1/messages are the two endpoints that matter. Every engine in the stack speaks both. Companion uses OpenAI by default and Anthropic for Claude-style reasoning flows.

Three extensions on top of the standard schemas:

  • session_id — string, optional. Used by the engine to scope the KV prefix cache to a conversation. Companion attaches one per chat.
  • enable_thinking — boolean, default true on reasoner models. Companion lets you toggle this per turn (cogwheel → Thinking).
  • reasoning_effortminimal / low / medium / high. Sets the budget for the reasoning block on models that ship it. minimal roughly halves the completion tokens on always-think models.

The usage block in OpenAI responses includes prompt_tokens_details.cached_tokens — the hit count on the prefix cache. Companion surfaces it as Cached: N tok (XX%).

Companion (Némo) :3100 React + Hono + Postgres + nemo-memory
│ HTTP (OpenAI / Anthropic)
OdyssAI-X orchestrator :8000 FastAPI + Docker
├── cluster "default" backend=ring → runner.py × N nodes
├── cluster "argo" backend=jaccl → runner.py × 4 nodes (TB5 RDMA)
├── cluster "telemak-max64" kind=telemak → Telemak single-Mac (:8003)
└── alias "or:*" cloud passthrough → OpenRouter / Anthropic / OpenAI

Companion sees one catalog. The orchestrator hides the topology.