
Telemak — getting started

One .app, one menu-bar icon, one port. No Docker, no Python, no SSH.

From a fresh Mac to a first inference in about five minutes.

Download Telemak-<version>.dmg from the releases page, open it, and drag Telemak.app to Applications.

Telemak is distributed outside the App Store and is not signed with an Apple Developer ID, so Gatekeeper blocks the first launch. Unblock it once:

  • No terminal: open it, dismiss the warning, then System Settings → Privacy & Security → Open Anyway.
  • One line: xattr -dr com.apple.quarantine /Applications/Telemak.app

Open the app. A small icon appears in your menu bar; there is deliberately no Dock icon. Telemak installs a LaunchAgent that relaunches the daemon after a reboot or crash, and it starts serving on http://localhost:8003.

Click the menu-bar icon → Load model → pick one of the curated MLX models. The dialog tells you which models fit your RAM before you commit:

| Your Mac (RAM) | Comfortable model class |
| --- | --- |
| 64 GB | a 35 B MoE 8-bit (e.g. Qwen3.6-35B-A3B) |
| 96–128 GB | a 70–80 B MoE 8-bit |
| 256 GB | a 122 B MoE 8-bit |
| 512 GB | a 200 B+ MoE (mixed-quant) |
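The table above can be read as a simple threshold lookup. The following is a minimal sketch and not part of Telemak: `model_class_for_ram` is a hypothetical helper name, and treating in-between sizes (say 192 GB) as the next tier down is an assumption. The Load model dialog remains the authority on what actually fits.

```shell
# Hypothetical helper: map installed RAM (whole GB) to the model class
# listed in the table. In-between sizes fall to the tier below (assumption).
model_class_for_ram() {
  ram_gb=$1
  if [ "$ram_gb" -ge 512 ]; then
    echo "200 B+ MoE (mixed-quant)"
  elif [ "$ram_gb" -ge 256 ]; then
    echo "122 B MoE 8-bit"
  elif [ "$ram_gb" -ge 96 ]; then
    echo "70-80 B MoE 8-bit"
  elif [ "$ram_gb" -ge 64 ]; then
    echo "35 B MoE 8-bit"
  else
    echo "below the smallest tier in the table"
  fi
}
```

On macOS, `sysctl -n hw.memsize` reports installed memory in bytes, so `model_class_for_ram $(( $(sysctl -n hw.memsize) / 1073741824 ))` prints the row for the current machine.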

Loading takes 10–60 seconds depending on the model and your SSD. When the status flips from loading to idle, the model is warm. You can keep several models co-loaded (a chat MoE + an embedder + a small VLM) — Telemak tracks each model’s wired-memory budget so the OS doesn’t steal pages from the GPU.
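Scripts that run right after a reboot or a model load may prefer to wait until the server answers rather than watch the menu bar. This is a minimal sketch, assuming only the `/v1/models` endpoint on port 8003. `wait_for_telemak` is a hypothetical helper, and this page does not say whether a model is listed before it finishes loading, so the check only confirms that the server is reachable.

```shell
# Hypothetical helper: poll /v1/models until the server answers.
# Usage: wait_for_telemak [base_url] [max_attempts]
wait_for_telemak() {
  base_url=${1:-http://localhost:8003}
  max_attempts=${2:-60}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -f: treat HTTP errors as failure; -s: quiet; --max-time: never hang
    if curl -fs --max-time 2 "$base_url/v1/models" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Pause between attempts, but not after the last one
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep 1
    fi
  done
  return 1
}
```

A typical call is `wait_for_telemak && echo "Telemak is up"`. The default of 60 one-second attempts roughly matches the upper end of the loading time quoted above.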

curl http://localhost:8003/v1/models

You should see your loaded model in the response. Then send a first completion:

curl http://localhost:8003/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "<your-model-id>",
"messages": [{"role": "user", "content": "Hello, sea."}]
}'

Telemak speaks both OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) against the same loaded models, so Claude-style clients work too.
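The Anthropic-style endpoint can be exercised with the same kind of request. This is a minimal sketch, assuming Telemak accepts the standard Anthropic Messages request shape, in which `max_tokens` is required. Both helper names are hypothetical. Anthropic's hosted API requires the `anthropic-version` header; whether Telemak needs it is not stated here, so sending it is an assumption.

```shell
# Hypothetical helper: build a minimal Anthropic-style request body.
# No JSON escaping is done, so keep quotes and backslashes out of the arguments.
anthropic_body() {
  printf '{"model":"%s","max_tokens":%s,"messages":[{"role":"user","content":"%s"}]}' \
    "$1" "$2" "$3"
}

# Hypothetical helper: POST that body to Telemak's /v1/messages endpoint.
# Usage: telemak_message <model-id> <max-tokens> <prompt>
telemak_message() {
  anthropic_body "$1" "$2" "$3" |
    curl -s http://localhost:8003/v1/messages \
      -H "Content-Type: application/json" \
      -H "anthropic-version: 2023-06-01" \
      -d @-
}
```

With a model loaded, `telemak_message "<your-model-id>" 256 "Hello, sea."` sends the Anthropic-style counterpart of the OpenAI-style request shown earlier.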

Telemak is just the engine — it ships no chat UI. To get conversations, history, memory and projects on top, point Companion at it:

  1. Install Companion (Companion getting started).
  2. Companion → Settings → Infrastructure → Engine → add http://localhost:8003 (or http://<telemak-ip>:8003 from another machine) → Test endpoint.
  3. Companion reads Telemak’s capability contract and loads its model catalog. You now chat through the UI, with KV cache surviving across turns of the same conversation.

Any OpenAI- or Anthropic-compatible coding agent (Cline, Continue.dev, Claude Code, Codex) can drive Telemak directly:

export OPENAI_BASE_URL="http://localhost:8003/v1"
export OPENAI_API_KEY="dummy" # no key required on a LAN
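Before launching an agent, the same two variables can be fed to curl to confirm they point at a live server. This is a minimal sketch: `telemak_url` and `telemak_check` are hypothetical helpers, and the `Authorization: Bearer` header follows the OpenAI convention on the assumption that Telemak ignores the placeholder key.

```shell
# Hypothetical helper: join the configured base URL and an API path.
# Falls back to the default local endpoint when OPENAI_BASE_URL is unset.
telemak_url() {
  printf '%s/%s\n' "${OPENAI_BASE_URL:-http://localhost:8003/v1}" "$1"
}

# Hypothetical helper: list models using the same settings an agent would use.
telemak_check() {
  curl -fsS "$(telemak_url models)" \
    -H "Authorization: Bearer ${OPENAI_API_KEY:-dummy}"
}
```

Because the base URL already ends in `/v1`, clients append only the remainder of the path, for example `telemak_url chat/completions`. If `telemak_check` fails, the agent will fail in the same way.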

Telemak keeps the model warm in wired memory, the KV cache survives across turns, and the daemon restarts itself after a reboot or crash. From the menu bar you can watch the live phase (prefill / decode / streaming / idle), tokens/s, and the last error.

  • The menu bar — every control and status indicator.
  • Architecture — the daemon, the menu bar, the LaunchAgent.
  • Performance — what to expect from each model class on real hardware.
  • Cluster enrolment — add this Telemak to an OdyssAI-X cluster in 30 seconds.