Telemak — getting started

One .app, one menu-bar icon, one port. No Docker, no Python, no SSH.

From a fresh Mac to a first inference in about five minutes.

1. Install the app

Download Telemak-<version>.dmg from the releases page, open it, and drag Telemak.app to Applications.

Telemak is distributed outside the App Store and is not signed with an Apple Developer ID, so Gatekeeper blocks the first launch. Unblock it once, in either of two ways:

Without a terminal: open the app, dismiss the warning, then go to System Settings → Privacy & Security and click Open Anyway.
With one line in Terminal: xattr -dr com.apple.quarantine /Applications/Telemak.app
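
If you would rather check before changing anything, the one-liner can be wrapped in a small guard. This is only a sketch: the function name unquarantine_app is hypothetical, and it assumes the stock macOS xattr tool, whose -p flag prints an attribute and fails when the attribute is absent.

```shell
# Hypothetical helper: remove the Gatekeeper quarantine flag only if it is set.
# Defaults to the install path used in this guide.
unquarantine_app() {
  app="${1:-/Applications/Telemak.app}"
  if [ ! -e "$app" ]; then
    echo "not found: $app" >&2
    return 1
  fi
  # xattr -p fails when the attribute is missing, so this branch only runs
  # for a bundle that is actually quarantined.
  if xattr -p com.apple.quarantine "$app" >/dev/null 2>&1; then
    xattr -dr com.apple.quarantine "$app" && echo "quarantine flag removed: $app"
  else
    echo "no quarantine flag on: $app"
  fi
}
```

Run unquarantine_app with no arguments after dragging the app to Applications.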

Open the app. A small icon appears in your menu bar; there is no Dock icon, on purpose. Telemak installs a LaunchAgent that starts the daemon again after a reboot or a crash, and the daemon starts serving on http://localhost:8003.
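
To confirm from a terminal that the daemon is actually serving, you can poll the port until it answers. The sketch below is an assumption-laden convenience, not part of Telemak: wait_for_telemak is a hypothetical name, and it probes /v1/models because that is the endpoint used for the sanity check later in this guide.

```shell
# Hypothetical helper: poll a URL once per second until it answers with
# a 2xx status, or give up after a number of attempts.
wait_for_telemak() {
  url="${1:-http://localhost:8003/v1/models}"
  tries="${2:-30}"
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each probe.
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      echo "ready after $attempt attempt(s): $url"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "not reachable after $tries attempt(s): $url" >&2
  return 1
}
```

Calling wait_for_telemak with no arguments waits up to roughly 30 seconds for the local daemon.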

2. Load a model

Click the menu-bar icon → Load model → pick one of the curated MLX models. The dialog tells you which models fit your RAM before you commit:

Your Mac's memory (RAM)	Comfortable model class
64 GB	a 35B MoE, 8-bit (e.g. Qwen3.6-35B-A3B)
96–128 GB	a 70–80B MoE, 8-bit
256 GB	a 122B MoE, 8-bit
512 GB	a 200B+ MoE (mixed-quant)
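
The table can also be read as a lookup from a terminal. The function below is a sketch, not Telemak's own logic: telemak_model_class is a hypothetical name, and it assumes that memory sizes between two rows fall back to the smaller row. On macOS, sysctl -n hw.memsize reports installed memory in bytes.

```shell
# Hypothetical helper: map installed RAM (whole GB) to the model class
# from the table above. In-between sizes round down to the nearest row
# (an assumption, the table does not cover them).
telemak_model_class() {
  gb="$1"
  if   [ "$gb" -ge 512 ]; then echo "200B+ MoE (mixed-quant)"
  elif [ "$gb" -ge 256 ]; then echo "122B MoE 8-bit"
  elif [ "$gb" -ge 96 ];  then echo "70-80B MoE 8-bit"
  elif [ "$gb" -ge 64 ];  then echo "35B MoE 8-bit"
  else echo "below the smallest row of the table"
  fi
}

# On a Mac, feed it the real figure (hw.memsize is in bytes):
#   telemak_model_class "$(( $(sysctl -n hw.memsize) / 1073741824 ))"
```

The Load model dialog remains the authority on what fits; this only mirrors the table.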

Loading takes 10–60 seconds depending on the model and your SSD. When the status flips from loading to idle, the model is warm. You can keep several models co-loaded (a chat MoE + an embedder + a small VLM) — Telemak tracks each model’s wired-memory budget so the OS doesn’t steal pages from the GPU.

3. Sanity-check

curl http://localhost:8003/v1/models
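
If the raw JSON is hard to scan, a small filter can print just the model ids. This is a rough sketch that assumes the response follows the OpenAI list shape, {"object":"list","data":[{"id":"..."}]}; model_ids is a hypothetical helper, and the pattern matching is deliberately crude rather than a real JSON parser.

```shell
# Hypothetical helper: read a /v1/models response on stdin and print
# one model id per line. Assumes OpenAI-style "id" fields.
model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Usage against a running Telemak:
#   curl -s http://localhost:8003/v1/models | model_ids
```

If jq is installed, piping the same response through jq -r '.data[].id' is the more robust choice.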

You should see your loaded model. Then a first completion:

curl http://localhost:8003/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model-id>",
    "messages": [{"role": "user", "content": "Hello, sea."}]
  }'

Telemak serves both the OpenAI API (/v1/chat/completions) and the Anthropic API (/v1/messages) from the same loaded models, so Claude-style clients work too.
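
For comparison, the same greeting through the Anthropic-style endpoint might look like the sketch below. The helper names are hypothetical. The max_tokens field and the anthropic-version header come from Anthropic's own Messages API, where they are required; whether Telemak insists on them is an assumption here, so treat them as harmless defaults.

```shell
# Hypothetical helper: build a minimal Messages API body.
# Arguments: model id, prompt, optional max_tokens (default 256).
# The prompt is inserted verbatim, so it must not contain double quotes
# or backslashes.
messages_payload() {
  printf '{"model": "%s", "max_tokens": %s, "messages": [{"role": "user", "content": "%s"}]}' \
    "$1" "${3:-256}" "$2"
}

# Hypothetical helper: send that body to the local Telemak daemon.
telemak_messages() {
  curl -s http://localhost:8003/v1/messages \
    -H "Content-Type: application/json" \
    -H "anthropic-version: 2023-06-01" \
    -d "$(messages_payload "$1" "$2")"
}

# Usage: telemak_messages "<your-model-id>" "Hello, sea."
```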

4. Add a chat window (optional)

Telemak is just the engine — it ships no chat UI. To get conversations, history, memory and projects on top, point Companion at it:

Install Companion (see the Companion getting-started guide).
Companion → Settings → Infrastructure → Engine → add http://localhost:8003 (or http://<telemak-ip>:8003 from another machine) → Test endpoint.
Companion reads Telemak’s capability contract and loads its model catalog. You can now chat through the UI, and the KV cache survives across turns of the same conversation.

5. Point an IDE agent at it (optional)

Any OpenAI- or Anthropic-compatible coding agent (Cline, Continue.dev, Claude Code, Codex) can drive Telemak directly:

export OPENAI_BASE_URL="http://localhost:8003/v1"
export OPENAI_API_KEY="dummy"          # placeholder: no key is required on a LAN, but many clients refuse an empty value
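
For agents that use the Anthropic API instead, the equivalent settings would plausibly be the following. This is a sketch under assumptions: ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY are the variables read by Anthropic's SDKs and Claude Code, and the base URL omits /v1 on the assumption that the client appends /v1/messages itself. Check your agent's documentation if it uses different names.

```shell
# Assumed settings for an Anthropic-compatible agent (not confirmed by Telemak's docs).
export ANTHROPIC_BASE_URL="http://localhost:8003"   # no /v1 suffix: the client is assumed to add /v1/messages
export ANTHROPIC_API_KEY="dummy"                    # placeholder value, as with the OpenAI variables above
```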

You’re running

Telemak keeps the model warm in wired memory, the KV cache survives across turns, and the daemon restarts itself after a reboot or crash. From the menu bar you can watch the live phase (prefill / decode / streaming / idle), tokens/s, and the last error.