feat: echo STT transcripts to thread before agent reply #571

Open
dogzzdogzz wants to merge 1 commit into openabdev:main from dogzzdogzz:feat/stt-transcript-echo

Summary

When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Discord and Slack today; platform-agnostic helper means future adapters get it for free.

  • One thread message per user message, containing one `> 🎤 <transcript>` line per clip.
  • On failure, the clip's line reads `> 🎤 (transcription failed)` and a ⚠️ reaction is added to the user's original message.
  • Opt-out via [stt] echo_transcript = false (default true, mirrored as stt.echoTranscript in Helm values).
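
The message format described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `format_echo_message` name comes from the file list, but the `EchoEntry` variants (`Transcript`, `Failed`) and the function body are assumptions.

```rust
/// Hypothetical stand-in for the PR's `EchoEntry`; the real variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum EchoEntry {
    /// A clip that was transcribed successfully.
    Transcript(String),
    /// A clip whose transcription failed.
    Failed,
}

/// Render one blockquote line per clip and join them into a single message body.
/// Assumed behaviour: upload order is preserved and an empty slice yields "".
fn format_echo_message(entries: &[EchoEntry]) -> String {
    entries
        .iter()
        .map(|entry| match entry {
            EchoEntry::Transcript(text) => format!("> 🎤 {}", text.trim()),
            EchoEntry::Failed => "> 🎤 (transcription failed)".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling it with a successful clip followed by a failed one would produce two lines in one message body, which matches the "one thread message per user message" rule.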

Closes #570.

Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560

Architecture

`stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig)` is the platform-agnostic helper. Discord (`src/discord.rs`) and Slack (`src/slack.rs`) collect a `Vec` of `EchoEntry` values while iterating audio attachments and call the helper before the agent dispatch. Gateway-based platforms (LINE / Telegram / future Teams) are intentionally not wired today because their protocol carries text only. The helper signature does not need to change when audio plumbing lands there later.
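
To make the control flow concrete, here is a simplified, synchronous sketch of what a helper like `post_echo` might do. Everything except the `post_echo`, `ChatAdapter`, `EchoEntry`, and `SttConfig` names is an assumption: the trait methods, the string-typed channel and message references, and the `RecordingAdapter` are hypothetical, and the real helper is probably async with richer types.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified adapter trait; the real `ChatAdapter` likely differs.
trait ChatAdapter {
    fn send_message(&self, thread: &str, body: &str);
    fn add_reaction(&self, message_id: &str, emoji: &str);
}

/// Hypothetical entry type: one value per audio clip.
enum EchoEntry {
    Transcript(String),
    Failed,
}

/// Only the field relevant to this sketch.
struct SttConfig {
    echo_transcript: bool,
}

/// Post one thread message covering all clips, then flag failures with a reaction.
fn post_echo(
    adapter: &Arc<dyn ChatAdapter>,
    thread: &str,
    user_message_id: &str,
    entries: &[EchoEntry],
    cfg: &SttConfig,
) {
    // Nothing to do when the feature is disabled or no audio was attached.
    if !cfg.echo_transcript || entries.is_empty() {
        return;
    }
    let body = entries
        .iter()
        .map(|entry| match entry {
            EchoEntry::Transcript(text) => format!("> 🎤 {text}"),
            EchoEntry::Failed => "> 🎤 (transcription failed)".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n");
    adapter.send_message(thread, &body);
    // A single warning reaction on the user's message if any clip failed.
    if entries.iter().any(|entry| matches!(entry, EchoEntry::Failed)) {
        adapter.add_reaction(user_message_id, "⚠️");
    }
}

/// Hypothetical recording adapter, similar in spirit to a mock used in tests.
#[derive(Default)]
struct RecordingAdapter {
    calls: Mutex<Vec<String>>,
}

impl ChatAdapter for RecordingAdapter {
    fn send_message(&self, thread: &str, body: &str) {
        self.calls.lock().unwrap().push(format!("send {thread}: {body}"));
    }
    fn add_reaction(&self, message_id: &str, emoji: &str) {
        self.calls.lock().unwrap().push(format!("react {message_id}: {emoji}"));
    }
}
```

Because the helper only depends on the trait object, a new platform adapter would need to implement the two trait methods and nothing else in this sketch.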

Files changed

  • `src/config.rs` — `SttConfig.echo_transcript: bool` (default `true`).
  • `src/stt.rs` — `EchoEntry` enum, `format_echo_message`, `post_echo` with `MockAdapter`-driven tests.
  • `src/discord.rs`, `src/slack.rs` — wire echo into the audio attachment loop, call `post_echo` before `router.handle_message`.
  • `charts/openab/values.yaml`, `charts/openab/templates/configmap.yaml` — expose `echoTranscript` (default `true`, `hasKey` guard preserves the default while distinguishing unset vs. explicit `false`).
  • `docs/stt.md`, `docs/config-reference.md` — document `echo_transcript`.
  • `docs/superpowers/specs/` and `docs/superpowers/plans/` — design spec + TDD-style implementation plan that drove this work.
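
The `hasKey` guard in the chart exists because a "falsy means unset" fallback (such as the template `default` function, which treats `false` as empty) would silently discard an explicit `false`. The same distinction can be modelled in Rust with `Option<bool>`. Both function names below are hypothetical and exist only to illustrate the idea:

```rust
/// `raw` models the key as read from values or config:
/// `None` means the key is absent, `Some(v)` means it was set explicitly.
fn resolve_echo_transcript(raw: Option<bool>, default: bool) -> bool {
    // An explicit `false` is respected; only a missing key falls back.
    raw.unwrap_or(default)
}

/// The pitfall: treating any falsy value as "unset" loses an explicit `false`
/// whenever the default is `true`.
fn resolve_falsy_as_unset(raw: Option<bool>, default: bool) -> bool {
    match raw {
        Some(true) => true,
        _ => default,
    }
}
```

With a default of `true`, the first function returns `false` for an explicit `false`, while the second would return `true`, which is the behaviour the guard avoids.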

Test plan

  • `cargo test --bin openab` — 133/133 pass (10 in `stt::tests` cover format, post_echo success, failure, mixed, disabled config, empty entries).
  • `cargo clippy --all-targets -- -D warnings` — clean.
  • `helm lint charts/openab` — clean.
  • `helm template ...` with default values renders `echo_transcript = true`; with `--set agents.kiro.stt.echoTranscript=false` renders `echo_transcript = false`.
  • Manual smoke test: send a voice message in Discord and verify the bot posts `> 🎤 <transcript>` before the agent's reply.
  • Manual smoke test: same in Slack.
  • Manual smoke test: simulate STT failure (e.g. revoke API key briefly or attach an unsupported file) — verify the failure line + ⚠️ reaction.

Out of scope / follow-ups

  • LINE / Telegram / Teams via gateway — those need audio plumbing in the gateway protocol first. The helper signature accommodates them when that work lands.
  • Multi-clip ordering: `extra_blocks.insert(0, …)` reverses transcript order in the agent prompt while `echo_entries.push(…)` preserves upload order. Pre-existing in the agent-prompt path; out of scope for this PR.
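
The ordering mismatch noted in the multi-clip bullet follows directly from how `Vec::insert(0, …)` and `Vec::push(…)` behave. A minimal sketch, with hypothetical function names and plain strings standing in for the real block and entry types:

```rust
/// Mirrors the agent-prompt path: each new block is inserted at the front,
/// so the last clip processed ends up first.
fn prompt_order(clips: &[&str]) -> Vec<String> {
    let mut extra_blocks = Vec::new();
    for clip in clips {
        extra_blocks.insert(0, clip.to_string());
    }
    extra_blocks
}

/// Mirrors the echo path: entries are appended, so upload order is preserved.
fn echo_order(clips: &[&str]) -> Vec<String> {
    let mut echo_entries = Vec::new();
    for clip in clips {
        echo_entries.push(clip.to_string());
    }
    echo_entries
}
```

For clips uploaded as `a`, `b`, `c`, the prompt path yields `c`, `b`, `a` while the echo path yields `a`, `b`, `c`, so the echoed transcript and the agent prompt can disagree on order for multi-clip messages.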

🤖 Generated with Claude Code

@dogzzdogzz dogzzdogzz requested a review from thepagent as a code owner April 26, 2026 03:18
@github-actions github-actions Bot added the pending-screening PR awaiting automated screening label Apr 26, 2026
@dogzzdogzz dogzzdogzz force-pushed the feat/stt-transcript-echo branch 2 times, most recently from ee10184 to 7f74166 Compare April 26, 2026 03:34
When STT transcribes a voice message, optionally post the transcript back
to the thread (no mentions) before the agent reply so users can verify what
was heard. Default is OFF — opt in via [stt] echo_transcript = true.

- New config: [stt] echo_transcript (default false, opt-in)
- New helper: stt::post_echo with platform-agnostic ChatAdapter handle —
  future LINE/Telegram/Teams adapters get echo for free
- Format: > 🎤 <transcript> per clip, all in one thread message
- Failure: > 🎤 (transcription failed) line + ⚠️ reaction on the user msg
- Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap
- Docs: docs/stt.md and docs/config-reference.md updated

Rebased on top of openabdev#567 (gateway config rendering).

Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.
@dogzzdogzz dogzzdogzz force-pushed the feat/stt-transcript-echo branch from 7f74166 to 6bc70f6 Compare April 26, 2026 03:42