Cross-platform, local-first voice helper that listens to the mic, runs Whisper locally, and feeds the transcript to a local LLM assistant. The assistant replies with your configured personality and the response is spoken aloud via xsst2. Designed for macOS and Windows, with a fallback automation layer that works anywhere pyautogui does.
- Push-to-talk hotkey with tail padding; buffers while held, then transcribes once and routes the transcript to an LLM assistant.
- Faster-Whisper backend (configurable model; defaults to `large-v3-turbo` unless you change it). Beam size, language, and an optional initial prompt are configurable.
- Optional rewriter (Ollama or llama.cpp) for grammar polish before the assistant sees the text.
- Assistant replies are generated by a local/Ollama LLM with a customizable personality and spoken through `xsst2`.
- Assistant prompts can include a configurable window of recent conversation history (`assistant.history_length`).
- Spinner/colored timing logs so you can see when transcription/rewriting/assistant work is running.
Mic -> AudioCapture (chunk/tail) -> WhisperEngine (transcribe)
-> Processor (cleanup/correct + optional Rewriter)
-> Assistant (LLM; uses conversation history window)
-> SpeechSynthesizer (TTS) + Automation (typing)
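
As a rough illustration of the flow above, the stages can be modelled as plain callables chained in order. This is only a sketch: the `Pipeline` class and its field names are hypothetical and are not the project's real classes, and the Automation (typing) branch is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical stand-in for the stage wiring shown in the diagram."""

    transcribe: Callable[[bytes], str]  # WhisperEngine stage
    process: Callable[[str], str]       # Processor stage (cleanup/correct)
    assist: Callable[[str], str]        # Assistant stage (LLM reply)
    speak: Callable[[str], None]        # SpeechSynthesizer stage
    rewrite: Optional[Callable[[str], str]] = None  # optional Rewriter

    def run(self, audio: bytes) -> str:
        # Transcribe once, then clean up the transcript.
        text = self.process(self.transcribe(audio))
        # The rewriter only runs when one is configured.
        if self.rewrite is not None:
            text = self.rewrite(text)
        reply = self.assist(text)
        self.speak(reply)
        return reply


# Usage with trivial stand-in stages:
spoken = []
pipe = Pipeline(
    transcribe=lambda audio: audio.decode(),
    process=str.strip,
    assist=lambda text: f"You said: {text}",
    speak=spoken.append,
)
reply = pipe.run(b"  hello  ")
```

Keeping each stage behind a simple callable makes it easy to swap the local Whisper engine for a remote one, or to drop the rewriter, without touching the rest of the chain.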
- `src/vibe_speech/cli.py` – entrypoint (`vibe-speech` script) with `serve` and `doctor`.
- `src/vibe_speech/config.py` – config models and loader.
- `src/vibe_speech/runtime.py` – audio capture, hotkey handling, buffering, transcription, output, and logging.
- `src/vibe_speech/whisper_engine.py` – Whisper wrapper (faster-whisper).
- `src/vibe_speech/automation.py` – text output automation (pyautogui).
- `src/vibe_speech/processor.py` – processing modes (raw, cleanup, optional correction/rewriter).
- `src/vibe_speech/rewriter.py` – optional grammar rewriter (Ollama/local llama.cpp).
- `config.sample.yaml` – defaults for local development.
- Python 3.11+, `ffmpeg` on PATH.
- Install: `python -m venv .venv && source .venv/bin/activate && pip install -e .`
- Copy and edit config: `cp config.sample.yaml config.yaml`
- Set `audio.device_name` to your mic.
- Choose a Whisper model (`whisper.model_size`), beam size, and `initial_prompt` if desired.
- Set `whisper.remote_url` to offload transcription to your remote `/transcribe` service (leave empty to use local Whisper).
- Set the assistant provider/model/personality/history length in `assistant.*`; adjust `speech.*` if your `xsst2` path or args differ.
- Enable/disable the optional rewriter as needed.
- Set `assistant.system_prompt` as needed. The prompt sent to the assistant has this layout:
Personality: <assistant.personality>
[last N user/assistant turns, up to assistant.history_length]
User: <previous user>
Assistant: <previous reply>
User: <current user text>
Assistant:
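
A minimal sketch of how a prompt in this layout could be assembled. The function name `build_prompt` is hypothetical, and two details are assumptions rather than documented behavior: that the system prompt is placed first, and that `history_length` counts user/assistant turn pairs rather than individual messages.

```python
from typing import List, Tuple


def build_prompt(
    system_prompt: str,
    personality: str,
    history: List[Tuple[str, str]],
    user_text: str,
    history_length: int,
) -> str:
    """Assemble a prompt in the layout shown above.

    `history` holds (user, assistant) turns, oldest first.
    Assumption: history_length counts turns, not single messages.
    """
    # Assumption: the system prompt leads, followed by the personality line.
    lines = [system_prompt, f"Personality: {personality}"]
    # Guard against history[-0:], which would return the whole list.
    recent = history[-history_length:] if history_length > 0 else []
    for user, assistant in recent:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    lines.append(f"User: {user_text}")
    lines.append("Assistant:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_prompt(
    "Be brief.",
    "cheerful",
    [("hi", "hello!"), ("how are you?", "great!")],
    "what time is it?",
    history_length=1,
)
```

With `history_length=1`, only the most recent exchange is included, which keeps the prompt short for small local models.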
- Run: `vibe-speech --config config.yaml serve` (use `--dry-run` to log without speaking).
- Hold `ctrl+shift+space` (default) while speaking; release to transcribe, send to the assistant, and hear the reply. Spinner shows work in progress; logs include timing and raw/final text.
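
The hold-to-record behavior with tail padding can be pictured roughly as below. This is a sketch under stated assumptions, not the code in `runtime.py`: the class name, method names, and the `tail_seconds` parameter (with its 0.3 s default) are all hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from typing import List, Optional


class PushToTalkBuffer:
    """Collects audio chunks while the hotkey is held, plus a short tail."""

    def __init__(self, tail_seconds: float = 0.3) -> None:
        self.tail_seconds = tail_seconds  # hypothetical default
        self._chunks: List[bytes] = []
        self._held = False
        self._released_at: Optional[float] = None

    def press(self) -> None:
        """Hotkey went down: start a fresh utterance."""
        self._chunks.clear()
        self._held = True
        self._released_at = None

    def release(self, now: float) -> None:
        """Hotkey went up: remember when, so the tail window can be timed."""
        self._held = False
        self._released_at = now

    def feed(self, chunk: bytes, now: float) -> None:
        """Keep a chunk if the key is held or we are inside the tail window."""
        in_tail = (
            self._released_at is not None
            and now - self._released_at <= self.tail_seconds
        )
        if self._held or in_tail:
            self._chunks.append(chunk)

    def ready(self, now: float) -> bool:
        """True once the key was released and the tail window has elapsed."""
        return (
            self._released_at is not None
            and now - self._released_at > self.tail_seconds
        )

    def take(self) -> bytes:
        """Return the whole utterance for a single transcription pass."""
        audio = b"".join(self._chunks)
        self._chunks.clear()
        self._released_at = None
        return audio
```

The tail window is what keeps the last syllable from being clipped when the key is released slightly early; audio arriving after the window closes is dropped.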
Virtualenv activation (if you created .venv above):
- macOS/Linux (bash/zsh): `source .venv/bin/activate`
- Windows PowerShell: `.venv\Scripts\Activate.ps1`
- Windows cmd: `.venv\Scripts\activate.bat`
- CLI flags: `vibe-speech --log-level DEBUG --config config.yaml serve` (or `python -m vibe_speech.cli --log-level DEBUG --config config.yaml serve`).
- Config file: set `log_level: DEBUG` in `config.yaml` and run normally.
- macOS: grant Accessibility permission to your terminal/editor so typing works, and microphone permission to the terminal. Tail padding helps avoid clipping the end of an utterance; adjust it in the config.
- Windows/Linux: relies on `pyautogui` for typing; focus targeting is not yet implemented.
- Mic errors (AUHAL -50, etc.): set `audio.device_name` to a valid input from `sounddevice.query_devices()`, and ensure mic permission is granted.
- Model downloads: set `whisper.offline: false` for the first run to cache the model; then flip it to `true` for offline use.
- Accuracy vs speed: use smaller models with beam=1–3 for speed; larger models with beam=5 for accuracy.
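
To find a valid value for `audio.device_name` when chasing mic errors, something like the following can list the input-capable devices. The helper names are hypothetical; the only library call used is `sounddevice.query_devices()`, whose entries are dicts with `name` and `max_input_channels` keys. Whether the app matches the device name exactly or by substring is not stated here, so copy the name as printed.

```python
from typing import Any, Iterable, List, Mapping


def input_device_names(devices: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return names of devices that expose at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_microphones() -> List[str]:
    """Query the real devices; needs the sounddevice package installed."""
    import sounddevice as sd  # imported lazily so the helper above stays usable without it

    return input_device_names(sd.query_devices())
```

Calling `list_microphones()` from the activated virtualenv returns the candidate names; output-only devices such as speakers are filtered out.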
- Streaming/partials are not implemented; the app buffers until hotkey release (with optional tail capture).
- The rewriter can change phrasing; set `processing.mode: raw` and `rewriter.enabled: false` for unaltered Whisper output.