Architecture overview

The stack stays replaceable because the protocols stay standard. Every layer below speaks a documented public surface; you can swap any layer without disturbing the others.

The four layers

Layer 01 — Clients      IDE agents · Companion UI · OpenAI / Anthropic SDKs
                                  ↓
Layer 02 — Experience   Companion
                                  ↓
Layer 03 — Engine       OdyssAI-X  ·  Telemak
                                  ↓
Layer 04 — Ground       Apple Silicon · local engines · cloud providers

Layer 01 — Clients. Anything that speaks OpenAI or Anthropic HTTP. Your IDE agent (Cline, Continue.dev, Claude Code, Codex). A cURL shell. The OpenAI Python SDK pointed at http://localhost:8000. This layer is not part of the stack — it is what the stack is built to serve.

Layer 02 — Experience. Companion. Web UI for chat, projects, memory, skills, MCP. The only part of the stack that talks directly to a human. Companion ships no model of its own; it pairs to a Layer 03 engine.

Layer 03 — Engine. Two siblings, same HTTP API, different runtime targets.

OdyssAI-X — the cluster engine. Python + mlx-distributed + Docker + SSH. Orchestrates inference across 1–5 Macs, optionally over RDMA Thunderbolt 5.
Telemak — the mono-Mac runtime. Native Swift on mlx-swift-lm. One Mac, one daemon, one .app bundle in the menu bar.

Both expose POST /v1/chat/completions, POST /v1/messages, GET /v1/models, and a capability contract at /.well-known/inference-engine.json. Companion does not know which one is behind the address until it probes.

Layer 04 — Ground. Apple Silicon. The hardware MLX was written for. Telemak and OdyssAI-X both target Metal and the Apple Neural Engine through mlx-swift-lm and mlx respectively. Cloud providers — OpenAI, Anthropic, OpenRouter — are also at this layer, behind the same HTTP surface, treated as first-class.

Boundaries

Companion does not replace your IDE. It exposes a runtime your IDE (or any HTTP client) can consume. If you want to drive a coding agent from your terminal, point it at an OdyssAI-X or Telemak endpoint.
OdyssAI-X is not an exo fork. It is a control plane on top of Apple MLX and mlx-distributed. The collective communication — JACCL over Thunderbolt 5 RDMA, ring over TCP — comes from Apple MLX directly.
Telemak is not a fork of OdyssAI-X. It is a sibling runtime: native Swift on mlx-swift-lm, mono-Mac, no Docker, no Python, no SSH. A Telemak can be enrolled in an OdyssAI-X cluster as a single-node provider.
The engine never owns memory. Memory lives in Companion. Personal, team, and project memory are Karpathy-style knowledge graphs compiled by an LLM worker, exposed to the engine only as a system-prompt prefix. The engine sees opaque tokens; the source of truth stays in Companion’s Postgres.

When to reach for which engine

You have	Reach for	Typical models	Infra cost
One Mac Studio (96–512 GB) or MacBook Pro	Telemak	30 B–80 B dense, MoE up to ~80 B	`.app` bundle, menu-bar, LaunchAgent
2–5 Mac Studios with TB5 mesh	OdyssAI-X	200 B–700 B frontier MoE	Docker orchestrator + SSH + MLX runners
Both	Both — Telemak nodes enrol in OdyssAI-X	mixed catalog under one Companion	Telemak + OdyssAI-X together
A cloud API key only	Companion paired to cloud	whatever the provider offers	none

The 80/20 case is one Mac + Telemak + Companion. The frontier case is a 4-node TB5 mesh + OdyssAI-X + Telemak stations + Companion. You can grow from one to the other without changing your chat window, your history, or your memory.

What is local, what is cloud

OdyssAI is local-first but not local-only. The default surface is local: prompts leave your LAN, weights stay on your SSD, conversations sit in your Postgres. Cloud providers are first-class citizens behind the same OpenAI/Anthropic-compatible surfaces — pair Companion to a cloud key on a Tuesday, back to a local cluster on a Wednesday, no migration, no data movement.

The choice belongs to the operator, not the framework.

The capability contract

Every engine — OdyssAI-X, Telemak, Ollama, LM Studio, vLLM, the cloud providers — advertises its capabilities at /.well-known/inference-engine.json. Companion reads this endpoint during pairing to know which models support tools, vision, thinking, embeddings, and how to route requests.

{
  "engine": "telemak",
  "version": "0.6.x",
  "capabilities": {
    "stream": true,
    "tools": true,
    "vision": false,
    "embeddings": true,
    "max_context": 32768,
    "session_cache": true,
    "openai_compat": "v1",
    "anthropic_compat": "v1"
  },
  "models": [ /* per-model namespace with backend, nodes, tools, vision flags */ ]
}

This is the contract that makes the stack replaceable. Companion does not hard-code which engine is behind the address — it reads the contract and adapts. Full contract spec →.

The HTTP surface

POST /v1/chat/completions and POST /v1/messages are the two endpoints that matter. Every engine in the stack speaks both. Companion uses OpenAI by default and Anthropic for Claude-style reasoning flows.

Three extensions on top of the standard schemas:

session_id — string, optional. Used by the engine to scope the KV prefix cache to a conversation. Companion attaches one per chat.
enable_thinking — boolean, default true on reasoner models. Companion lets you toggle this per turn (cogwheel → Thinking).
reasoning_effort — minimal / low / medium / high. Sets the budget for the reasoning block on models that ship it. minimal roughly halves the completion tokens on always-think models.

The usage block in OpenAI responses includes prompt_tokens_details.cached_tokens — the hit count on the prefix cache. Companion surfaces it as Cached: N tok (XX%).

What runs where, in one screen

Companion (Némo)                        :3100   React + Hono + Postgres + nemo-memory
   │
   │  HTTP (OpenAI / Anthropic)
   ▼
OdyssAI-X orchestrator                  :8000   FastAPI + Docker
   ├── cluster "default"    backend=ring        → runner.py × N nodes
   ├── cluster "argo"       backend=jaccl       → runner.py × 4 nodes (TB5 RDMA)
   ├── cluster "telemak-max64"  kind=telemak    → Telemak single-Mac (:8003)
   └── alias "or:*"         cloud passthrough  → OpenRouter / Anthropic / OpenAI

Companion sees one catalog. The orchestrator hides the topology.