feat: voice I/O for the chat agent #361

Open
opened 2026-06-29 13:18:56 +00:00 by james · 0 comments
Owner

The deferred voice sub-ticket of epic #47 ("Text first; voice is a linked-but-deferred ticket inside this epic"). Layer speech input/output onto the now-shipped chat panel.

Scope (to be designed)

  • Voice input: dictate a message to the agent (speech-to-text) from the chat composer — a mic affordance that fills the existing text input, so it flows through the same streamConversationTurn path.
  • Voice output: optionally read assistant replies aloud (text-to-speech), respecting the <think> collapse (don't read reasoning).
  • Cross-platform: must work on the PWA (Web Speech API) and Android (Expo speech/STT) — Carol is one universal codebase. Pick an approach that degrades gracefully where a platform lacks support (the feature is additive; text always works).
  • Self-hoster/privacy note: prefer on-device/browser APIs over sending audio to a third party; document any external dependency.
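As a starting point for the ADR, the two platform-independent pieces of the scope above can be sketched in a few lines: removing `<think>` reasoning before text-to-speech, and feature-detecting the Web Speech API so the mic and read-aloud affordances are hidden where unsupported. The helper names `stripThinkBlocks`, `detectWebVoiceSupport`, and `VoiceSupport` are hypothetical and not taken from the Carol codebase. Only the browser globals (`SpeechRecognition`, `webkitSpeechRecognition`, `speechSynthesis`, `SpeechSynthesisUtterance`) are real Web Speech API names. The Android/Expo path is deliberately left out, since the approach there is still to be decided.

```typescript
// Sketch only: helper names are hypothetical, not existing Carol code.

/** Which voice features the current web runtime appears to offer. */
interface VoiceSupport {
  speechToText: boolean;
  textToSpeech: boolean;
}

/**
 * Remove <think>...</think> reasoning so text-to-speech reads only the
 * visible part of an assistant reply.
 */
function stripThinkBlocks(reply: string): string {
  return reply
    // Drop complete reasoning blocks (non-greedy, spanning newlines).
    .replace(/<think>[\s\S]*?<\/think>/gi, "")
    // Mid-stream, a block may be opened but not yet closed: drop it to the end.
    .replace(/<think>[\s\S]*$/i, "")
    // Collapse leftover whitespace; spacing does not matter for speech.
    .replace(/\s+/g, " ")
    .trim();
}

/**
 * Feature-detect the Web Speech API on a global-like object.
 * The scope is a parameter so the check can be exercised outside a browser.
 */
function detectWebVoiceSupport(scope: Record<string, unknown>): VoiceSupport {
  return {
    // Chromium-based browsers expose the prefixed constructor.
    speechToText:
      "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope,
    textToSpeech:
      "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope,
  };
}
```

In the PWA this would be called as `detectWebVoiceSupport(globalThis as unknown as Record<string, unknown>)`. When `speechToText` is false the composer simply omits the mic button, which keeps the feature additive: text input is never affected.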

Out of scope

  • Wake-word / always-listening.
  • Real-time duplex voice.

Design first (likely a short ADR on the STT/TTS approach + platform support matrix), then implement. Part of epic #47.

Reference
james/carol#361