
Voice & Talk

Hold a key, speak, release. The transcript lands in the composer for you to check before it goes anywhere.

Companion has two voice surfaces: push-to-talk, which ships today, and auto-speak with a full-screen Talk surface, which is on the roadmap.

Hold Space anywhere outside a text input to dictate; release to drop the transcript into the composer.

  • Fully client-side. Transcription uses your browser’s Web Speech API — no audio ever leaves your device. The text drops into the input bar, not the conversation, so you can edit it before pressing Enter. (Mishears happen; a one-shot “you said this, sending it” flow is more frustrating than the half-second it saves.)
  • Visual cue. While Space is held, the input shows a red dot + “Listening…”.
  • Language follows your browser’s speech-recognition locale (usually your OS locale). Set the browser language to the one you speak.
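The hold-to-dictate flow above can be sketched roughly as follows. This is not Companion's actual code: the `PushToTalk` class, the `RecognizerLike` and `ComposerLike` shapes, and `attachPushToTalk` are hypothetical names made up for illustration. The only real APIs assumed are the Web Speech `SpeechRecognition` constructor (exposed as `webkitSpeechRecognition` in some browsers) with its `start()`, `stop()`, `lang`, `interimResults`, and `onresult` members, and the standard `keydown` / `keyup` events.

```typescript
// Minimal structural types so the sketch compiles without DOM speech typings.
interface SpeechResultEventLike {
  resultIndex: number;
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}
interface RecognizerLike {
  lang: string;
  interimResults: boolean;
  onresult: ((e: SpeechResultEventLike) => void) | null;
  start(): void;
  stop(): void;
}
interface ComposerLike { value: string; }

// Hypothetical controller: Space down starts listening, Space up stops,
// and the transcript is appended to the composer (never auto-sent).
class PushToTalk {
  state: "idle" | "listening" = "idle";
  rec: RecognizerLike;
  composer: ComposerLike;

  constructor(rec: RecognizerLike, composer: ComposerLike) {
    this.rec = rec;
    this.composer = composer;
    rec.interimResults = false; // deliver only final results
    rec.onresult = (e) => {
      let text = "";
      for (let i = e.resultIndex; i < e.results.length; i++) {
        text += e.results[i][0].transcript; // best alternative of each new result
      }
      this.composer.value = (this.composer.value + " " + text).trim();
    };
  }

  keyDown(key: string, repeat: boolean, typeableFocused: boolean): void {
    // Ignore auto-repeat, other keys, and presses made while typing in a field.
    if (key !== " " || repeat || typeableFocused || this.state === "listening") return;
    this.state = "listening";
    this.rec.start();
  }

  keyUp(key: string): void {
    if (key !== " " || this.state !== "listening") return;
    this.state = "idle";
    this.rec.stop(); // the final result arrives through onresult after stop()
  }
}

// Browser wiring (assumed, not executed here).
function attachPushToTalk(composer: ComposerLike): PushToTalk | null {
  const g = globalThis as any;
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  if (!Ctor) return null; // speech recognition unavailable in this browser
  const ptt = new PushToTalk(new Ctor(), composer);
  g.addEventListener("keydown", (e: any) => {
    const a = g.document.activeElement;
    const typing = !!a && (a.isContentEditable || a.tagName === "INPUT" || a.tagName === "TEXTAREA");
    ptt.keyDown(e.key, e.repeat, typing);
  });
  g.addEventListener("keyup", (e: any) => ptt.keyUp(e.key));
  return ptt;
}
```

Keeping the recognizer behind a small interface is a design choice for this sketch: it lets the key handling be exercised with a fake recognizer, and it keeps the "append to composer, do not send" rule in one place. A real page would likely also call `preventDefault()` on the Space keydown so the page does not scroll while recording.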

Space inserts a literal space instead of recording? Your focus is inside a text input — click anywhere on the message area first. The dot won’t appear while a typeable element is focused.
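One plausible way to decide whether a "typeable element" has focus is sketched below. The `isTypeable` helper and the `FocusedLike` shape are hypothetical, and the list of non-text input types is an assumption about which inputs should not block recording, not something Companion documents.

```typescript
// Minimal shape of the focused element; real DOM elements satisfy it structurally.
interface FocusedLike {
  tagName: string;
  isContentEditable?: boolean;
  type?: string; // present on <input> elements
}

// Input types where Space does not type a character (assumed list).
const NON_TEXT_INPUT_TYPES = new Set([
  "button", "checkbox", "radio", "submit", "reset",
  "file", "image", "range", "color", "hidden",
]);

// Returns true when Space pressed in this element should insert a literal space.
function isTypeable(el: FocusedLike | null): boolean {
  if (!el) return false;
  if (el.isContentEditable) return true;
  const tag = el.tagName.toUpperCase();
  if (tag === "TEXTAREA") return true;
  if (tag !== "INPUT") return false;
  return !NON_TEXT_INPUT_TYPES.has((el.type ?? "text").toLowerCase());
}
```

In a browser this would be called as `isTypeable(document.activeElement as FocusedLike | null)` inside the Space keydown handler; when it returns true, the handler lets the keypress through and does not start recording.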

Auto-speak — assistant replies streamed through TTS, plus a full-screen Talk surface for hands-free conversation — is on the roadmap. The speaker icon and the 🎙 Talk button are reserved; the per-message Listen / Stop / Save controls appear once the backend ships. Push-to-talk (above) works today and is independent of this.

There’s also a Voice Live add-on (full-duplex voice via Gemini Live, requires a Gemini key) — a separate, key-gated path from the always-on, browser-side push-to-talk.

Where it runs
Push-to-talk transcription: 100% in your browser; no audio leaves the device
Assistant text: through the inference engine, like any chat
  • Browser asks for the mic on every reload — some browsers don’t persist Web Speech permission. Pin Companion as a PWA, or grant the site permanent mic permission in browser settings.
  • Transcribes empty / wrong language — set the browser language to the one you’re speaking; Web Speech follows it.
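To check whether the mic permission has actually been persisted, a page can query the Permissions API. The sketch below assumes the browser accepts the `"microphone"` permission name, which not every browser does, so failures are reported as `"unknown"`. The `micPermissionState` function and the `PermissionsLike` shape are hypothetical names for illustration, and whether speech recognition shares the microphone permission is itself browser-dependent.

```typescript
type MicState = "granted" | "denied" | "prompt" | "unknown";

// Minimal shape of navigator.permissions, so the sketch also runs outside a browser.
interface PermissionsLike {
  query(desc: { name: string }): Promise<{ state: string }>;
}

// Resolves to the stored mic permission, or "unknown" when it cannot be determined.
async function micPermissionState(perms: PermissionsLike | undefined): Promise<MicState> {
  if (!perms) return "unknown"; // Permissions API not available
  try {
    const status = await perms.query({ name: "microphone" });
    switch (status.state) {
      case "granted": return "granted";
      case "denied": return "denied";
      case "prompt": return "prompt";
      default: return "unknown";
    }
  } catch {
    // Some browsers reject permission names they do not support.
    return "unknown";
  }
}
```

In a browser the call would look like `micPermissionState((navigator as any).permissions)`. A result of `"prompt"` after every reload suggests the grant is not being persisted, which matches the first symptom above; `navigator.language` shows the locale that recognition is likely to follow for the second.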