Voice I/O for the chat panel (deferred) #56

Open
opened 2026-06-14 19:52:25 +00:00 by james · 0 comments
Owner

Layer voice input and output onto the chat panel. Voice was deferred from the first cut of the agent-interaction epic so the text surface could stabilise first; this ticket picks it up afterward.

Scope (revisit when starting)

  • STT: browser-native (Web Speech API) for the first cut, with room to swap in server-side Whisper later as a self-host-friendly alternative.
  • TTS: browser-native voice for the first cut.
  • UX: push-to-talk button in the chat panel. Transcript shown as the user speaks; submission happens on release.
  • Settings: voice on/off per device.
  • Out of scope for the first cut: continuous always-listening, voice activity detection, anything that requires server-side audio processing.
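
A minimal sketch of the push-to-talk flow described in the scope, assuming the Web Speech API is available in the browser. The `PushToTalk` class, its callbacks, and the `RecognitionLike` interface are hypothetical names for illustration, not part of the existing codebase. The recognizer is passed in so the logic can be exercised without a browser.

```typescript
// Structural subset of the browser's SpeechRecognition that this sketch uses.
interface RecognitionEventLike {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognitionLike {
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEventLike) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical controller: press() starts listening, release() stops,
// and the transcript is submitted once recognition has ended.
class PushToTalk {
  private recognition: RecognitionLike;
  private transcript = "";
  private held = false;

  constructor(
    recognition: RecognitionLike,
    onTranscript: (text: string) => void, // live preview while speaking
    onSubmit: (text: string) => void, // sends the chat message
  ) {
    this.recognition = recognition;
    recognition.continuous = true; // keep listening while the button is held
    recognition.interimResults = true; // deliver partial results for the preview

    recognition.onresult = (event) => {
      // Rebuild the whole transcript from the top alternative of each result.
      let text = "";
      for (let i = 0; i < event.results.length; i++) {
        text += event.results[i][0].transcript;
      }
      this.transcript = text;
      onTranscript(text);
    };

    recognition.onend = () => {
      // Fires after stop(); a browser may also end a session on its own,
      // in which case this sketch submits whatever was captured.
      const text = this.transcript.trim();
      this.transcript = "";
      this.held = false;
      if (text) onSubmit(text);
    };
  }

  press(): void {
    if (this.held) return;
    this.held = true;
    this.transcript = "";
    this.recognition.start();
  }

  release(): void {
    if (!this.held) return;
    this.held = false;
    this.recognition.stop();
  }
}

// Browser wiring (assumption: Chromium-based browsers expose the prefixed constructor):
//   const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
//   const ptt = new PushToTalk(new Ctor(), showPreview, sendMessage);
//   micButton.addEventListener("pointerdown", () => ptt.press());
//   micButton.addEventListener("pointerup", () => ptt.release());
```

Submitting from `onend` rather than directly in `release()` gives the recognizer a chance to deliver its final result before the message is sent. `showPreview`, `sendMessage`, and `micButton` in the wiring comment are placeholders.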

Acceptance criteria

  • A user can press and hold the mic button, speak, and see their words become the next chat message.
  • The agent's reply is read aloud through the browser's default voice when TTS is enabled in settings.
  • Voice can be turned off entirely from settings without affecting text chat.
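
One way the second and third criteria could be met is sketched below, assuming the per-device setting lives in `localStorage` and speech goes through the browser's `speechSynthesis`. The storage key, the function names, and the off-by-default behaviour are assumptions for illustration; the ticket does not specify them. The store and speaker are injected so the gating logic stays independent of the text chat path.

```typescript
// Structural subsets of localStorage and speechSynthesis used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface SpeakerLike {
  speak(text: string): void;
  cancel(): void;
}

// Hypothetical storage key; per-device because localStorage is local to the browser profile.
const VOICE_ENABLED_KEY = "chat.voice.enabled";

// Assumption: voice is off until the user turns it on.
function isVoiceEnabled(store: KeyValueStore): boolean {
  return store.getItem(VOICE_ENABLED_KEY) === "true";
}

function setVoiceEnabled(store: KeyValueStore, speaker: SpeakerLike, enabled: boolean): void {
  store.setItem(VOICE_ENABLED_KEY, String(enabled));
  if (!enabled) speaker.cancel(); // stop any reply that is mid-sentence
}

// Returns true if the reply was spoken, false if voice is disabled.
function speakReply(store: KeyValueStore, speaker: SpeakerLike, text: string): boolean {
  if (!isVoiceEnabled(store)) return false;
  speaker.cancel(); // drop any queued speech so replies do not overlap
  speaker.speak(text);
  return true;
}

// Browser wiring: an utterance with no voice set uses the browser's default voice.
//   const speaker: SpeakerLike = {
//     speak: (text) => speechSynthesis.speak(new SpeechSynthesisUtterance(text)),
//     cancel: () => speechSynthesis.cancel(),
//   };
//   speakReply(localStorage, speaker, replyText);
```

Because `speakReply` only returns early when voice is disabled, the text reply can be rendered unconditionally before or after the call.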

Part of epic #47. Depends on the Built-in chat UI. Deferred — not blocking the epic's exit criteria.


Reference
james/carol#56