Architecture overview
The stack stays replaceable because the protocols stay standard. Every layer below speaks a documented public surface; you can swap any layer without disturbing the others.
The four layers
Section titled “The four layers”Layer 01 — Clients IDE agents · Companion UI · OpenAI / Anthropic SDKs ↓Layer 02 — Experience Companion ↓Layer 03 — Engine OdyssAI-X · Telemak ↓Layer 04 — Ground Apple Silicon · local engines · cloud providersLayer 01 — Clients. Anything that speaks OpenAI or Anthropic HTTP. Your IDE agent (Cline, Continue.dev, Claude Code, Codex). A cURL shell. The OpenAI Python SDK pointed at http://localhost:8000. This layer is not part of the stack — it is what the stack is built to serve.
Layer 02 — Experience. Companion. Web UI for chat, projects, memory, skills, MCP. The only part of the stack that talks directly to a human. Companion ships no model of its own; it pairs to a Layer 03 engine.
Layer 03 — Engine. Two siblings, same HTTP API, different runtime targets.
- OdyssAI-X — the cluster engine. Python + mlx-distributed + Docker + SSH. Orchestrates inference across 1–5 Macs, optionally over RDMA Thunderbolt 5.
- Telemak — the mono-Mac runtime. Native Swift on
mlx-swift-lm. One Mac, one daemon, one.appbundle in the menu bar.
Both expose POST /v1/chat/completions, POST /v1/messages, GET /v1/models, and a capability contract at /.well-known/inference-engine.json. Companion does not know which one is behind the address until it probes.
Layer 04 — Ground. Apple Silicon. The hardware MLX was written for. Telemak and OdyssAI-X both target Metal and the Apple Neural Engine through mlx-swift-lm and mlx respectively. Cloud providers — OpenAI, Anthropic, OpenRouter — are also at this layer, behind the same HTTP surface, treated as first-class.
Boundaries
Section titled “Boundaries”- Companion does not replace your IDE. It exposes a runtime your IDE (or any HTTP client) can consume. If you want to drive a coding agent from your terminal, point it at an OdyssAI-X or Telemak endpoint.
- OdyssAI-X is not an exo fork. It is a control plane on top of Apple MLX and mlx-distributed. The collective communication — JACCL over Thunderbolt 5 RDMA, ring over TCP — comes from Apple MLX directly.
- Telemak is not a fork of OdyssAI-X. It is a sibling runtime: native Swift on
mlx-swift-lm, mono-Mac, no Docker, no Python, no SSH. A Telemak can be enrolled in an OdyssAI-X cluster as a single-node provider. - The engine never owns memory. Memory lives in Companion. Personal, team, and project memory are Karpathy-style knowledge graphs compiled by an LLM worker, exposed to the engine only as a system-prompt prefix. The engine sees opaque tokens; the source of truth stays in Companion’s Postgres.
When to reach for which engine
Section titled “When to reach for which engine”| You have | Reach for | Typical models | Infra cost |
|---|---|---|---|
| One Mac Studio (96–512 GB) or MacBook Pro | Telemak | 30 B–80 B dense, MoE up to ~80 B | .app bundle, menu-bar, LaunchAgent |
| 2–5 Mac Studios with TB5 mesh | OdyssAI-X | 200 B–700 B frontier MoE | Docker orchestrator + SSH + MLX runners |
| Both | Both — Telemak nodes enrol in OdyssAI-X | mixed catalog under one Companion | Telemak + OdyssAI-X together |
| A cloud API key only | Companion paired to cloud | whatever the provider offers | none |
The 80/20 case is one Mac + Telemak + Companion. The frontier case is a 4-node TB5 mesh + OdyssAI-X + Telemak stations + Companion. You can grow from one to the other without changing your chat window, your history, or your memory.
What is local, what is cloud
Section titled “What is local, what is cloud”OdyssAI is local-first but not local-only. The default surface is local: prompts leave your LAN, weights stay on your SSD, conversations sit in your Postgres. Cloud providers are first-class citizens behind the same OpenAI/Anthropic-compatible surfaces — pair Companion to a cloud key on a Tuesday, back to a local cluster on a Wednesday, no migration, no data movement.
The choice belongs to the operator, not the framework.
The capability contract
Section titled “The capability contract”Every engine — OdyssAI-X, Telemak, Ollama, LM Studio, vLLM, the cloud providers — advertises its capabilities at /.well-known/inference-engine.json. Companion reads this endpoint during pairing to know which models support tools, vision, thinking, embeddings, and how to route requests.
{ "engine": "telemak", "version": "0.6.x", "capabilities": { "stream": true, "tools": true, "vision": false, "embeddings": true, "max_context": 32768, "session_cache": true, "openai_compat": "v1", "anthropic_compat": "v1" }, "models": [ /* per-model namespace with backend, nodes, tools, vision flags */ ]}This is the contract that makes the stack replaceable. Companion does not hard-code which engine is behind the address — it reads the contract and adapts. Full contract spec →.
The HTTP surface
Section titled “The HTTP surface”POST /v1/chat/completions and POST /v1/messages are the two endpoints that matter. Every engine in the stack speaks both. Companion uses OpenAI by default and Anthropic for Claude-style reasoning flows.
Three extensions on top of the standard schemas:
session_id— string, optional. Used by the engine to scope the KV prefix cache to a conversation. Companion attaches one per chat.enable_thinking— boolean, defaulttrueon reasoner models. Companion lets you toggle this per turn (cogwheel → Thinking).reasoning_effort—minimal/low/medium/high. Sets the budget for the reasoning block on models that ship it.minimalroughly halves the completion tokens on always-think models.
The usage block in OpenAI responses includes prompt_tokens_details.cached_tokens — the hit count on the prefix cache. Companion surfaces it as Cached: N tok (XX%).
What runs where, in one screen
Section titled “What runs where, in one screen”Companion (Némo) :3100 React + Hono + Postgres + nemo-memory │ │ HTTP (OpenAI / Anthropic) ▼OdyssAI-X orchestrator :8000 FastAPI + Docker ├── cluster "default" backend=ring → runner.py × N nodes ├── cluster "argo" backend=jaccl → runner.py × 4 nodes (TB5 RDMA) ├── cluster "telemak-max64" kind=telemak → Telemak single-Mac (:8003) └── alias "or:*" cloud passthrough → OpenRouter / Anthropic / OpenAICompanion sees one catalog. The orchestrator hides the topology.
Read next
Section titled “Read next”- The cluster → — what distributed inference buys you, what it costs, and the JACCL queue-pair trade-off.
- Inference modes → — tensor-parallel vs pipeline-parallel, when to use which.
- HTTP API → · Capability contract → — the surface clients consume.
- CoeOS → — the benchmark-composed virtual model · OdyRAG → — the knowledge-graph layer.
- Troubleshooting → · Cluster health → · Deploy →.
- Telemak → — the mono-Mac sibling runtime · Companion → — the chat client and Némo.
- HTTP API → — full endpoint reference.