Welcome to Telemak

The son takes the helm when the cluster is at sea elsewhere.

Telemak is the mono-Mac runtime of the OdyssAI stack. A native Swift .app that lives in your menu bar, serves OpenAI and Anthropic APIs on http://localhost:8003, and keeps several MLX models co-loaded in wired memory.

If OdyssAI-X is the engine for 200 B+ models across a multi-Mac cluster, Telemak is the engine for 30–80 B models on a single Mac Studio or MacBook Pro. No Docker. No Python. No SSH. No orchestrator. One binary, one menu bar, one daemon.

What you get

Native Swift on mlx-swift-lm. No Python venv to maintain, no MLX to rebuild on every Xcode bump.
OpenAI + Anthropic side by side. /v1/chat/completions and /v1/messages both speak to the same loaded models.
Multi-model concurrent loading. A chat-MoE plus an embedder plus a small VLM can stay warm in wired memory at the same time. Telemak tracks per-model wired_limit_mb so the OS does not steal pages from under the GPU.
KV cache cross-turn. Conversations keep their KV prefix on disk (~/.telemak/sessions/) and LRU-evict when the budget is hit. The second turn of a 12 k-token conversation comes back roughly 8× faster than the first.
Menu-bar control. Start, stop, restart, live phase (prefill / decode / streaming / idle), tokens, tok/s, last error. No Dock icon, no notifications — Telemak stays out of your way.
LaunchAgent autostart. Survives reboots, restarts on crash, JSON logs daily-rotated under ~/.telemak/logs/.
Cluster enrolment. Declare a Telemak as kind=telemak in your OdyssAI-X dashboard and its models join the cluster catalog. The Telemak can also run standalone — one Mac, no orchestrator.

What Telemak is not

It is not a fork of OdyssAI-X. Same API surface, different runtime target.
It does not do distributed inference. One process, one machine.
It does not ship with a model catalog or a chat UI. You bring models (mlx-community/*), Companion or any other OpenAI/Anthropic client brings the chat.

Status

Shipping. Telemak runs in production on multiple M3 Ultra and M3 Max machines. Recorded reliability mark: 96 hours 44 minutes non-stop, 23 549 inferences, zero failed requests, with two 30 B models co-loaded (73.9 / 96 GB wired).

Mixed-quant models (body 6-bit + MoE gate 8-bit) require capability contract v0.6.33+.

Read first

Getting started → — install the .app, load your first model, hit /v1/chat/completions.
Architecture → — daemon + menu bar + LaunchAgent, and how they fit.
The menu bar → — every control, every status indicator.
Performance → — what to expect from each model class on real hardware.
Cluster enrolment → — pair Telemak with OdyssAI-X in 30 seconds.
Comparison → — Telemak vs Ollama, LM Studio, vLLM, exo, and the rest.