
Welcome to Telemak

The son takes the helm when the cluster is at sea elsewhere.

Telemak is the mono-Mac runtime of the OdyssAI stack: a native Swift .app that lives in your menu bar, serves the OpenAI and Anthropic APIs on http://localhost:8003, and keeps several MLX models co-loaded in wired memory.
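As a rough illustration of the two API surfaces, the Swift sketch below builds one request for each endpoint against the default address. Only the base URL and the paths `/v1/chat/completions` and `/v1/messages` come from this page. The request bodies follow the public OpenAI Chat Completions and Anthropic Messages formats. Whether Telemak expects extra headers (for example Anthropic's `anthropic-version`) is not covered here. `ChatMessage`, the request types, `makeTelemakRequest` and the model id are hypothetical client-side names, not part of Telemak.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Default Telemak address from the docs.
let telemakBaseURL = "http://localhost:8003"

struct ChatMessage: Codable {
    let role: String
    let content: String
}

// Body shape of the OpenAI Chat Completions API (POST /v1/chat/completions).
struct OpenAIChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

// Body shape of the Anthropic Messages API (POST /v1/messages); max_tokens is mandatory there.
struct AnthropicMessagesRequest: Codable {
    let model: String
    let maxTokens: Int
    let messages: [ChatMessage]

    enum CodingKeys: String, CodingKey {
        case model
        case maxTokens = "max_tokens"
        case messages
    }
}

// Builds a JSON POST request for a path on the local Telemak daemon (hypothetical helper).
func makeTelemakRequest<Body: Encodable>(path: String, body: Body) throws -> URLRequest {
    guard let url = URL(string: telemakBaseURL + path) else {
        throw URLError(.badURL)
    }
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

To actually send one, a client could pass the request to `URLSession.shared.data(for:)` (macOS 12+) and decode the reply in the matching API's response format. Because both endpoints speak to the same loaded models, the same model id can be used with either body type.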

If OdyssAI-X is the engine for 200 B+ models across a multi-Mac cluster, Telemak is the engine for 30–80 B models on a single Mac Studio or MacBook Pro. No Docker. No Python. No SSH. No orchestrator. One binary, one menu bar, one daemon.

  • Native Swift on mlx-swift-lm. No Python venv to maintain, no MLX to rebuild on every Xcode bump.
  • OpenAI + Anthropic side by side. /v1/chat/completions and /v1/messages both speak to the same loaded models.
  • Multi-model concurrent loading. A chat-MoE plus an embedder plus a small VLM can stay warm in wired memory at the same time. Telemak tracks per-model wired_limit_mb so the OS does not steal pages from under the GPU.
  • Cross-turn KV cache. Conversations keep their KV prefix on disk (~/.telemak/sessions/), and the least recently used sessions are evicted when the cache budget is reached. The second turn of a 12 k-token conversation comes back roughly 8× faster than the first.
  • Menu-bar control. Start, stop and restart the daemon; see the live phase (prefill / decode / streaming / idle), token count, tok/s and last error. No Dock icon, no notifications: Telemak stays out of your way.
  • LaunchAgent autostart. Survives reboots, restarts on crash, and writes JSON logs rotated daily under ~/.telemak/logs/.
  • Cluster enrolment. Declare a Telemak as kind=telemak in your OdyssAI-X dashboard and its models join the cluster catalog. A Telemak can also run standalone: one Mac, no orchestrator.

What Telemak is not:

  • It is not a fork of OdyssAI-X. Same API surface, different runtime target.
  • It does not do distributed inference. One process, one machine.
  • It does not ship with a model catalog or a chat UI. You bring the models (mlx-community/*); Companion or any other OpenAI/Anthropic client brings the chat.
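The KV cache bullet above describes least-recently-used eviction under a budget. A minimal in-memory sketch of that policy follows. It only tracks session sizes and recency and never touches ~/.telemak/sessions/; the type name, the byte budget and the rule that the session just written is never evicted are assumptions for illustration, not Telemak's implementation.

```swift
// Hypothetical index of on-disk KV prefixes: tracks sizes and recency only.
struct SessionCacheIndex {
    let budgetBytes: Int
    private(set) var sizes: [String: Int] = [:]  // session id -> cached prefix size in bytes
    private(set) var order: [String] = []        // least recently used first

    var totalBytes: Int { sizes.values.reduce(0, +) }

    init(budgetBytes: Int) {
        self.budgetBytes = budgetBytes
    }

    // Records that a session was written (new or resized), marks it most recent,
    // and evicts the least recently used sessions until the total fits the budget.
    // The session just touched is never evicted, even if it alone exceeds the budget.
    @discardableResult
    mutating func touch(_ id: String, bytes: Int) -> [String] {
        sizes[id] = bytes
        order.removeAll { $0 == id }
        order.append(id)
        var evicted: [String] = []
        while totalBytes > budgetBytes, order.count > 1 {
            let victim = order.removeFirst()
            sizes[victim] = nil
            evicted.append(victim)
        }
        return evicted
    }

    // A cache hit on the next turn refreshes recency; returns false on a miss.
    mutating func hit(_ id: String) -> Bool {
        guard sizes[id] != nil else { return false }
        order.removeAll { $0 == id }
        order.append(id)
        return true
    }
}
```

Keeping recency in an array makes each operation linear in the number of sessions, which is fine for a handful of conversations; a larger store would typically pair the dictionary with a doubly linked list.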

Shipping. Telemak runs in production on multiple M3 Ultra and M3 Max machines. Reliability record: 96 hours 44 minutes of continuous uptime, 23 549 inferences, zero failed requests, with two 30 B models co-loaded (73.9 of 96 GB wired).
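The wired-memory figure above can be read as a simple budget check. The sketch below sums per-model wired limits, in the spirit of `wired_limit_mb`, against a ceiling. The function names are hypothetical, it treats the 96 GB figure as the wired ceiling, and it uses decimal units (1 GB = 1000 MB), so the recorded run becomes 73 900 of 96 000 MB with 22 100 MB of headroom left for a further model.

```swift
// Hypothetical admission check in the spirit of per-model wired_limit_mb.

// Free wired memory left under the ceiling, given the limits of already-loaded models.
func wiredHeadroomMB(loadedMB: [Int], ceilingMB: Int) -> Int {
    ceilingMB - loadedMB.reduce(0, +)
}

// True if a candidate model's wired limit fits in the remaining headroom.
func canCoLoad(candidateMB: Int, loadedMB: [Int], ceilingMB: Int) -> Bool {
    candidateMB <= wiredHeadroomMB(loadedMB: loadedMB, ceilingMB: ceilingMB)
}

// Fraction of the ceiling currently wired.
func wiredUtilization(loadedMB: [Int], ceilingMB: Int) -> Double {
    Double(loadedMB.reduce(0, +)) / Double(ceilingMB)
}
```

With `loadedMB: [73_900]` and `ceilingMB: 96_000`, utilization comes out at about 0.77, and any candidate needing more than 22 100 MB would be refused.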

Mixed-quantization models (6-bit body with an 8-bit MoE gate) require capability contract v0.6.33 or later.
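A client or loader could gate mixed-quant models on that minimum version with a numeric, component-wise comparison rather than a string comparison (as strings, "0.6.100" would sort before "0.6.33"). This page does not say how the contract version is exposed; the sketch assumes it arrives as a dotted string such as `v0.6.33`, and all function names are hypothetical.

```swift
// Parses "v0.6.33" or "0.6.33" into [0, 6, 33]; returns nil for malformed input.
func parseContractVersion(_ text: String) -> [Int]? {
    let trimmed = text.hasPrefix("v") ? String(text.dropFirst()) : text
    let parts = trimmed.split(separator: ".", omittingEmptySubsequences: false)
    let numbers = parts.compactMap { Int($0) }
    guard numbers.count == parts.count else { return nil }
    return numbers
}

// Component-wise comparison; missing trailing components count as 0.
func isAtLeast(_ version: [Int], _ minimum: [Int]) -> Bool {
    let length = max(version.count, minimum.count)
    for i in 0..<length {
        let a = i < version.count ? version[i] : 0
        let b = i < minimum.count ? minimum[i] : 0
        if a != b { return a > b }
    }
    return true
}

// Hypothetical gate: mixed-quant models need capability contract v0.6.33 or later.
func supportsMixedQuant(contract: String) -> Bool {
    guard let version = parseContractVersion(contract) else { return false }
    return isAtLeast(version, [0, 6, 33])
}
```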