Vibe Speech (WIP)

Cross-platform, local-first voice helper that listens to the mic, runs Whisper locally, and feeds the transcript to a local LLM assistant. The assistant replies in your configured personality, and the response is spoken aloud via xsst2. Designed for macOS and Windows, with a fallback automation layer that works anywhere pyautogui does.

Current status

Push-to-talk hotkey with tail padding; buffers while held, then transcribes once and routes the transcript to an LLM assistant.
Faster-Whisper backend (configurable model; defaults to large-v3-turbo). Beam size, language, and an optional initial prompt are configurable.
Optional rewriter (Ollama or llama.cpp) for grammar polish before the assistant sees the text.
Assistant replies are generated by a local/Ollama LLM with a customizable personality and spoken through xsst2.
Assistant prompts can include a configurable window of recent conversation history (assistant.history_length).
Spinner/colored timing logs so you can see when transcription/rewriting/assistant work is running.
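The push-to-talk behaviour described above (buffer while the hotkey is held, keep a short tail after release, then transcribe once) can be sketched as a small state machine. This is a hypothetical illustration, not the code in runtime.py: the class name PushToTalkBuffer, the tail_ms parameter, and representing audio as lists of float samples are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PushToTalkBuffer:
    """Hypothetical push-to-talk buffer: records while the hotkey is held, keeps a
    short tail after release, then emits one clip for a single transcription."""

    sample_rate: int = 16_000
    tail_ms: int = 300  # illustrative value; the real config key/default may differ
    _chunks: list[list[float]] = field(default_factory=list, init=False, repr=False)
    _held: bool = field(default=False, init=False)
    _tail_left: int = field(default=0, init=False)  # samples still to capture after release

    def press(self) -> None:
        # Hotkey down: start a fresh utterance.
        self._chunks.clear()
        self._held = True
        self._tail_left = 0

    def release(self) -> Optional[list[float]]:
        # Hotkey up: keep recording for tail_ms so the last word is not clipped.
        self._held = False
        self._tail_left = self.sample_rate * self.tail_ms // 1000
        return self._flush() if self._tail_left == 0 else None

    def feed(self, chunk: list[float]) -> Optional[list[float]]:
        """Feed one audio chunk; returns the whole clip once the tail is complete."""
        if self._held:
            self._chunks.append(chunk)
        elif self._tail_left > 0:
            take = chunk[: self._tail_left]
            self._chunks.append(take)
            self._tail_left -= len(take)
            if self._tail_left == 0:
                return self._flush()
        return None  # still recording, or idle (chunk ignored)

    def _flush(self) -> list[float]:
        clip = [sample for chunk in self._chunks for sample in chunk]
        self._chunks.clear()
        return clip
```

Emitting one clip only after the tail completes is what makes this "transcribe once" rather than streaming; a tail of 0 ms returns the clip directly from release().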

Architecture (plain text)

Mic -> AudioCapture (chunk/tail) -> WhisperEngine (transcribe)
    -> Processor (cleanup/correct + optional Rewriter)
    -> Assistant (LLM; uses conversation history window)
    -> SpeechSynthesizer (TTS) + Automation (typing)
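The data flow in the diagram can be expressed as a chain of plain callables. The sketch below is hypothetical: the Pipeline class, its field names, and the idea that every output sink receives the assistant's reply are assumptions for illustration, not the project's actual classes. It does show the two ordering details the diagram implies: the optional rewriter runs after the processor, and only the last history_length turns reach the assistant.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical wiring of the stages in the diagram; each stage is a plain callable."""

    transcribe: Callable[[bytes], str]  # stand-in for WhisperEngine
    process: Callable[[str], str]  # stand-in for Processor (cleanup/correct)
    reply: Callable[[str, list[tuple[str, str]]], str]  # stand-in for Assistant
    outputs: list[Callable[[str], None]]  # stand-ins for SpeechSynthesizer / Automation
    rewrite: Optional[Callable[[str], str]] = None  # optional Rewriter
    history_length: int = 4  # illustrative default, mirrors assistant.history_length
    history: list[tuple[str, str]] = field(default_factory=list)

    def handle_utterance(self, audio: bytes) -> str:
        text = self.process(self.transcribe(audio))
        if self.rewrite is not None:
            text = self.rewrite(text)
        # Only the most recent (user, assistant) turns are passed to the LLM.
        recent = self.history[-self.history_length:] if self.history_length > 0 else []
        answer = self.reply(text, recent)
        self.history.append((text, answer))
        for sink in self.outputs:
            sink(answer)
        return answer
```

The explicit `history_length > 0` check matters because `history[-0:]` in Python returns the whole list, not an empty one.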

Project layout

src/vibe_speech/cli.py – entrypoint (vibe-speech script) with serve and doctor subcommands.
src/vibe_speech/config.py – config models and loader.
src/vibe_speech/runtime.py – audio capture, hotkey handling, buffering, transcription, output, and logging.
src/vibe_speech/whisper_engine.py – Whisper wrapper (faster-whisper).
src/vibe_speech/automation.py – text output automation (pyautogui).
src/vibe_speech/processor.py – processing modes (raw, cleanup, optional correction/rewriter).
src/vibe_speech/rewriter.py – optional grammar rewriter (Ollama/local llama.cpp).
config.sample.yaml – defaults for local development.

Quick start

Requirements: Python 3.11+ and ffmpeg on PATH.
Install (macOS/Linux shell; Windows activation commands are listed below): python -m venv .venv && source .venv/bin/activate && pip install -e .
Copy and edit config: cp config.sample.yaml config.yaml
- Set audio.device_name to your mic.
- Choose a Whisper model (whisper.model_size), a beam size, and, if desired, an initial_prompt.
- Set whisper.remote_url to offload transcription to your remote /transcribe service (leave empty to use local Whisper).
- Set the assistant provider/model/personality/history length in assistant.*; adjust speech.* if your xsst2 path or args differ.
- Enable/disable the optional rewriter as needed.
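To find a value for audio.device_name, you can ask sounddevice which devices accept input. The quickest check is `python -c "import sounddevice as sd; print(sd.query_devices())"`. The helper below filters that list down to microphones; the function names are hypothetical, and whether vibe-speech matches the configured name exactly or by substring is not stated here, so copying the full name is the safe choice.

```python
from __future__ import annotations

from typing import Any, Iterable


def input_device_names(devices: Iterable[dict[str, Any]]) -> list[str]:
    """Names of devices with at least one input channel (i.e. usable microphones)."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_microphones() -> list[str]:
    """Query PortAudio through sounddevice and return candidate audio.device_name values."""
    import sounddevice as sd  # imported lazily so the filter above works without PortAudio

    return input_device_names(sd.query_devices())
```

Calling `list_microphones()` returns names such as a built-in microphone or a USB headset; output-only devices (speakers) are skipped because they report zero input channels.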

Prompt shape (with history)

assistant.system_prompt
Personality: <assistant.personality>

[last N user/assistant turns, up to assistant.history_length]
User: <previous user>
Assistant: <previous reply>

User: <current user text>
Assistant:
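A minimal sketch of assembling the prompt shape above, assuming assistant.history_length counts user/assistant pairs (it could instead count individual messages) and that history is kept as a list of (user, assistant) tuples. The function name build_prompt is hypothetical.

```python
from __future__ import annotations


def build_prompt(
    system_prompt: str,
    personality: str,
    history: list[tuple[str, str]],
    user_text: str,
    history_length: int,
) -> str:
    """Render the prompt layout described above (hypothetical helper)."""
    lines = [system_prompt, f"Personality: {personality}", ""]
    # Keep only the last N turns; history[-0:] would return everything, so guard N == 0.
    recent = history[-history_length:] if history_length > 0 else []
    for user, assistant in recent:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    if recent:
        lines.append("")  # blank line between history and the current turn
    lines.append(f"User: {user_text}")
    lines.append("Assistant:")  # left open for the model to complete
    return "\n".join(lines)
```

For example, with two stored turns and history_length set to 1, only the second turn appears between the personality line and the current "User:" line.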

Run: vibe-speech --config config.yaml serve (use --dry-run to log without speaking).
Hold ctrl+shift+space (the default hotkey) while speaking; release to transcribe, send the text to the assistant, and hear the reply. A spinner shows work in progress; logs include timing and the raw/final text.

Virtualenv activation (if you created .venv above):

macOS/Linux (bash/zsh): source .venv/bin/activate
Windows PowerShell: .venv\Scripts\Activate.ps1
Windows cmd: .venv\Scripts\activate.bat

Debug logging

CLI flags: vibe-speech --log-level DEBUG --config config.yaml serve (or python -m vibe_speech.cli --log-level DEBUG --config config.yaml serve).
Config file: set log_level: DEBUG in config.yaml and run normally.
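The command shape above puts global options before the subcommand. The sketch below mirrors that shape with argparse and resolves the effective level. It is hypothetical and not the contents of cli.py; in particular, the precedence (CLI flag over config log_level, falling back to INFO) is an assumption.

```python
from __future__ import annotations

import argparse
import logging
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Global options come before the subcommand, matching
    # "vibe-speech --log-level DEBUG --config config.yaml serve".
    parser = argparse.ArgumentParser(prog="vibe-speech")
    parser.add_argument("--config")
    parser.add_argument("--log-level", default=None)
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("serve")
    sub.add_parser("doctor")
    return parser


def resolve_log_level(cli_level: Optional[str], config_level: Optional[str]) -> int:
    """Assumed precedence: CLI flag, then config log_level, then INFO."""
    name = (cli_level or config_level or "INFO").upper()
    level = logging.getLevelName(name)  # "DEBUG" -> 10; unknown names -> "Level X"
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name!r}")
    return level


def configure_logging(cli_level: Optional[str], config_level: Optional[str]) -> None:
    logging.basicConfig(
        level=resolve_log_level(cli_level, config_level),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
```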

Platform notes

macOS: grant Accessibility permission to your terminal/editor so typing works, and Microphone permission to the terminal. Tail padding helps avoid clipping the end of an utterance; adjust it in config.
Windows/Linux: typing relies on pyautogui; focus targeting is not yet implemented.

Troubleshooting

Mic errors (e.g. macOS AUHAL error -50): set audio.device_name to a valid input device listed by sounddevice.query_devices(), and make sure the terminal has microphone permission.
Model downloads: set whisper.offline: false on the first run so the model is downloaded and cached, then set it to true for offline use.
Accuracy vs speed: use smaller models and beam size 1–3 for speed; use larger models and beam size 5 for accuracy.
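The two tuning knobs above correspond to real faster-whisper parameters: `WhisperModel(..., local_files_only=...)` refuses to download and uses only the local cache, and `model.transcribe(..., beam_size=..., language=..., initial_prompt=...)` controls decoding. The sketch assumes, without confirmation from the source, that whisper.offline maps to local_files_only and that the config keys are named beam_size and language; the helper names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def whisper_settings(cfg: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a hypothetical `whisper` config section into WhisperModel(...) and
    model.transcribe(...) keyword arguments."""
    model_kwargs = {
        "model_size_or_path": cfg.get("model_size", "large-v3-turbo"),
        # Assumption: whisper.offline == true means "never download, cache only".
        "local_files_only": bool(cfg.get("offline", False)),
    }
    transcribe_kwargs = {
        "beam_size": int(cfg.get("beam_size", 5)),  # 1-3 faster, 5 more accurate
        "language": cfg.get("language"),  # None lets Whisper auto-detect
        "initial_prompt": cfg.get("initial_prompt"),
    }
    return model_kwargs, transcribe_kwargs


def transcribe_file(path: str, cfg: dict[str, Any]) -> str:
    from faster_whisper import WhisperModel  # lazy import: heavy dependency

    model_kwargs, transcribe_kwargs = whisper_settings(cfg)
    model = WhisperModel(**model_kwargs)
    segments, _info = model.transcribe(path, **transcribe_kwargs)  # segments is a generator
    return " ".join(seg.text.strip() for seg in segments)
```

With offline set to true before the model has ever been cached, faster-whisper will fail to load it, which is why the first run needs offline: false.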

Notes

Streaming and partial transcripts are not implemented; the app buffers audio until the hotkey is released (plus optional tail capture).
The rewriter can change phrasing; set processing.mode: raw and rewriter.enabled: false for unaltered Whisper output.
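Why both settings are needed for unaltered output can be seen in a small sketch where the mode and the rewriter are independent steps. This is hypothetical: the function name is invented, and "cleanup" here only collapses whitespace, whereas the project's cleanup mode may do more.

```python
from __future__ import annotations

from typing import Callable, Optional


def process_text(
    text: str,
    mode: str = "cleanup",
    rewriter: Optional[Callable[[str], str]] = None,
) -> str:
    """Hypothetical processing step: 'raw' passes Whisper output through untouched,
    'cleanup' only normalises whitespace here; an enabled rewriter runs afterwards."""
    if mode == "raw":
        result = text
    elif mode == "cleanup":
        result = " ".join(text.split())  # collapse runs of spaces/newlines
    else:
        raise ValueError(f"unknown processing mode: {mode!r}")
    if rewriter is not None:  # e.g. an LLM grammar pass; may rephrase the text
        result = rewriter(result)
    return result
```

Because the rewriter runs regardless of mode, raw mode alone still lets an enabled rewriter rephrase the text; only raw mode with no rewriter returns Whisper's output byte for byte.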
