feat: voice I/O for the chat agent #361


Open

opened 2026-06-29 13:18:56 +00:00 by james · 0 comments

james commented

2026-06-29 13:18:56 +00:00

Owner

The deferred voice sub-ticket of epic #47 ("Text first; voice is a linked-but-deferred ticket inside this epic"). Layer speech input/output onto the now-shipped chat panel.

## Scope (to be designed)

- **Voice input**: dictate a message to the agent (speech-to-text) from the chat composer: a mic affordance that fills the existing text input, so it flows through the same `streamConversationTurn` path.
- **Voice output**: optionally read assistant replies aloud (text-to-speech), respecting the `<think>` collapse (don't read reasoning).
- **Cross-platform**: must work on the PWA (Web Speech API) and Android (Expo speech/STT), since Carol is one universal codebase. Pick an approach that degrades gracefully where a platform lacks support (the feature is additive; text always works).
- Self-hoster/privacy note: prefer on-device/browser APIs over sending audio to a third party; document any external dependency.
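
To make the voice-output point concrete, here is a minimal sketch of stripping reasoning before handing a reply to any text-to-speech engine. The helper name `speakableText` is hypothetical and not part of the existing codebase, and the sketch assumes reasoning is delimited by literal `<think>` and `</think>` tags in the reply text.

```typescript
// Hypothetical helper (assumption, not existing code): returns only the part
// of an assistant reply that should be read aloud.
function speakableText(reply: string): string {
  return reply
    // Drop complete reasoning blocks (non-greedy, may span newlines).
    .replace(/<think>[\s\S]*?<\/think>/gi, "")
    // Drop an unterminated block, e.g. while a reply is still streaming.
    .replace(/<think>[\s\S]*$/i, "")
    // Collapse the whitespace left behind by the removed spans.
    .replace(/\s+/g, " ")
    .trim();
}
```

Keeping this as a pure string function would let the same filter sit in front of whichever TTS backend the ADR ends up choosing on each platform.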

## Out of scope

- Wake-word / always-listening. Real-time duplex voice.

Design first (likely a short ADR on the STT/TTS approach + platform support matrix), then implement. Part of epic #47.
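
As one possible input to the platform support matrix, the graceful-degradation requirement could be probed at runtime on the PWA side. The sketch below is an assumption, not a decided design: `VoiceSupport`, `SpeechGlobals`, and `detectWebVoiceSupport` are hypothetical names, and only the Web Speech API globals (`SpeechRecognition`, its `webkit`-prefixed variant, and `speechSynthesis`) are taken as given. The Android path is deliberately left out because the native STT approach is still to be designed.

```typescript
// Hypothetical shape for one row of a platform support matrix.
interface VoiceSupport {
  speechToText: boolean;
  textToSpeech: boolean;
}

// Minimal view of the browser globals exposed by the Web Speech API.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown; // prefixed name used by some browsers
  speechSynthesis?: unknown;
}

// Reports which voice features the given global object appears to offer.
// Taking the globals as a parameter keeps the check testable outside a browser.
function detectWebVoiceSupport(globals: SpeechGlobals): VoiceSupport {
  return {
    speechToText: Boolean(
      globals.SpeechRecognition ?? globals.webkitSpeechRecognition
    ),
    textToSpeech: Boolean(globals.speechSynthesis),
  };
}
```

In the PWA this might be called with `window`, hiding the mic affordance when `speechToText` is false and the read-aloud toggle when `textToSpeech` is false, so text input keeps working unchanged.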
