hackjutsu/vibe-speech

Vibe Speech (WIP)

Cross-platform, local-first voice helper that listens to the mic, runs Whisper locally, and feeds the transcript to a local LLM assistant. The assistant replies with your configured personality and the response is spoken aloud via xsst2. Designed for macOS and Windows, with a fallback automation layer that works anywhere pyautogui does.

Current status

  • Push-to-talk hotkey with tail padding; buffers while held, then transcribes once and routes the transcript to an LLM assistant.
  • Faster-Whisper backend (configurable model; defaults to large-v3-turbo unless you change it). Beam size, language, and an optional initial prompt are configurable.
  • Optional rewriter (Ollama or llama.cpp) for grammar polish before the assistant sees the text.
  • Assistant replies are generated by a local/Ollama LLM with a customizable personality and spoken through xsst2.
  • Assistant prompts can include a configurable window of recent conversation history (assistant.history_length).
  • Spinner/colored timing logs so you can see when transcription/rewriting/assistant work is running.
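The push-to-talk behaviour in the first bullet (buffer while held, keep a short tail after release, then transcribe once) could be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name `PushToTalkBuffer`, the 16 kHz sample rate, and the 0.3 s tail are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PushToTalkBuffer:
    """Hypothetical buffer: collects chunks while the hotkey is held, plus a short tail."""

    sample_rate: int = 16_000     # samples per second (assumed value)
    tail_seconds: float = 0.3     # extra audio kept after release (assumed value)
    _chunks: list = field(default_factory=list)
    _recording: bool = False
    _tail_remaining: int = 0      # samples still to capture after release

    def press(self) -> None:
        """Hotkey pressed: start a fresh buffer."""
        self._chunks = []
        self._recording = True
        self._tail_remaining = 0

    def release(self) -> bool:
        """Hotkey released: open the tail window. Returns True if no tail is needed."""
        self._recording = False
        self._tail_remaining = int(self.sample_rate * self.tail_seconds)
        return self._tail_remaining == 0

    def feed(self, chunk: list) -> bool:
        """Offer one chunk of samples; returns True once the utterance is complete."""
        if self._recording:
            self._chunks.append(chunk)
            return False
        if self._tail_remaining > 0:
            kept = chunk[: self._tail_remaining]   # never keep more than the tail
            self._chunks.append(kept)
            self._tail_remaining -= len(kept)
            return self._tail_remaining == 0
        return False                               # idle: ignore audio

    def samples(self) -> list:
        """Return everything captured as one flat list of samples."""
        return [s for chunk in self._chunks for s in chunk]
```

In this sketch the audio callback would call `feed()` for every chunk, and the caller would hand `samples()` to the transcriber the first time `feed()` (or `release()`) returns True.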

Architecture (plain text)

Mic -> AudioCapture (chunk/tail) -> WhisperEngine (transcribe)
    -> Processor (cleanup/correct + optional Rewriter)
    -> Assistant (LLM; uses conversation history window)
    -> SpeechSynthesizer (TTS) + Automation (typing)
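
As a rough illustration of the flow in the diagram, the stages can be thought of as plain callables chained in order. The function name `run_pipeline` and its parameters are hypothetical, and the sketch assumes that the Automation stage types the assistant's reply; the real classes in `runtime.py` may be wired differently.

```python
from typing import Callable, Optional


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    process: Callable[[str], str],
    assist: Callable[[str], str],
    speak: Callable[[str], None],
    type_text: Optional[Callable[[str], None]] = None,
) -> str:
    """Run one buffered utterance through the stages in diagram order."""
    raw = transcribe(audio)       # WhisperEngine stage
    cleaned = process(raw)        # Processor stage (cleanup + optional rewriter)
    if not cleaned.strip():
        return ""                 # nothing usable was said; skip the LLM and TTS
    reply = assist(cleaned)       # Assistant stage (LLM)
    speak(reply)                  # SpeechSynthesizer stage (TTS)
    if type_text is not None:
        type_text(reply)          # Automation stage (assumed to type the reply)
    return reply
```

Passing the stages in as callables keeps the sketch testable with stubs, for example `run_pipeline(b"...", fake_transcribe, str.strip, fake_llm, print)`, where the `fake_*` names are stand-ins you would supply.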

Project layout

  • src/vibe_speech/cli.py – entrypoint (vibe-speech script) with the serve and doctor commands.
  • src/vibe_speech/config.py – config models and loader.
  • src/vibe_speech/runtime.py – audio capture, hotkey handling, buffering, transcription, output, and logging.
  • src/vibe_speech/whisper_engine.py – Whisper wrapper (faster-whisper).
  • src/vibe_speech/automation.py – text output automation (pyautogui).
  • src/vibe_speech/processor.py – processing modes (raw, cleanup, optional correction/rewriter).
  • src/vibe_speech/rewriter.py – optional grammar rewriter (Ollama/local llama.cpp).
  • config.sample.yaml – defaults for local development.

Quick start

  1. Prerequisites: Python 3.11+ and ffmpeg on PATH.
  2. Install: python -m venv .venv && source .venv/bin/activate && pip install -e .
  3. Copy and edit config: cp config.sample.yaml config.yaml
    • Set audio.device_name to your mic.
    • Choose a Whisper model (whisper.model_size), beam size, initial_prompt if desired.
    • Set whisper.remote_url to offload transcription to your remote /transcribe service (leave empty to use local Whisper).
    • Set the assistant provider/model/personality/history length in assistant.*; adjust speech.* if your xsst2 path or args differ.
    • Enable/disable the optional rewriter as needed.

  4. Run: vibe-speech --config config.yaml serve (use --dry-run to log without speaking).
  5. Hold ctrl+shift+space (default) while speaking; release to transcribe, send to the assistant, and hear the reply. Spinner shows work in progress; logs include timing and raw/final text.

Prompt shape (with history)

assistant.system_prompt
Personality: <assistant.personality>

[last N user/assistant turns, up to assistant.history_length]
User: <previous user>
Assistant: <previous reply>

User: <current user text>
Assistant:
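
The layout in "Prompt shape (with history)" can be assembled with a few lines of string handling. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, and it assumes that `history_length` counts user/assistant pairs rather than individual messages.

```python
def build_prompt(system_prompt, personality, history, user_text, history_length):
    """Assemble the prompt text.

    history is a list of (user, assistant) tuples, oldest first.
    history_length is assumed to count pairs, not single messages.
    """
    lines = [system_prompt, f"Personality: {personality}", ""]
    # Guard against history[-0:], which would return the whole list.
    recent = history[-history_length:] if history_length > 0 else []
    for user, assistant in recent:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    if recent:
        lines.append("")          # blank line between history and the new turn
    lines.append(f"User: {user_text}")
    lines.append("Assistant:")    # left open for the model to complete
    return "\n".join(lines)
```

Ending the prompt with a bare `Assistant:` line invites the model to continue in the assistant's voice, which matches the shape shown in that section.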

Virtualenv activation (if you created .venv above):

  • macOS/Linux (bash/zsh): source .venv/bin/activate
  • Windows PowerShell: .venv\Scripts\Activate.ps1
  • Windows cmd: .venv\Scripts\activate.bat

Debug logging

  • CLI flags: vibe-speech --log-level DEBUG --config config.yaml serve (or python -m vibe_speech.cli --log-level DEBUG --config config.yaml serve).
  • Config file: set log_level: DEBUG in config.yaml and run normally.
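
For readers curious how the two options might interact, here is a small argparse sketch that accepts the invocations shown above. It is not the project's `cli.py`: the function names are hypothetical, and two behaviours are assumptions made for the example, namely that `--dry-run` belongs to the `serve` command and that the CLI flag takes precedence over `log_level` in the config file.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting: vibe-speech [--log-level L] [--config PATH] {serve,doctor}."""
    parser = argparse.ArgumentParser(prog="vibe-speech")
    parser.add_argument("--config", default="config.yaml", help="path to the YAML config")
    parser.add_argument("--log-level", default=None, help="e.g. DEBUG, INFO, WARNING")
    commands = parser.add_subparsers(dest="command", required=True)
    serve = commands.add_parser("serve", help="listen for the hotkey and respond")
    # Assumption: --dry-run is an option of the serve command.
    serve.add_argument("--dry-run", action="store_true", help="log without speaking")
    commands.add_parser("doctor", help="run environment checks")
    return parser


def resolve_log_level(cli_level, config_level, default="INFO"):
    """Assumed precedence: CLI flag first, then config file, then the default."""
    return (cli_level or config_level or default).upper()
```

With this sketch, `build_parser().parse_args(["--log-level", "DEBUG", "--config", "config.yaml", "serve"])` yields `command="serve"` and `log_level="DEBUG"`.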

Platform notes

  • macOS: grant Accessibility permission to your terminal/editor so typing works, and microphone permission to the terminal. Tail padding helps avoid clipping the end of an utterance; adjust it in the config.
  • Windows/Linux: relies on pyautogui for typing; focus targeting is not yet implemented.

Troubleshooting

  • Mic errors (AUHAL -50, etc.): set audio.device_name to a valid input device listed by sounddevice.query_devices(), and ensure mic permission is granted.
  • Model downloads: set whisper.offline: false for the first run so the model is downloaded and cached; then set it to true for offline use.
  • Accuracy vs speed: use a smaller model with a beam size of 1–3 for speed, or a larger model with a beam size of 5 for accuracy.
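
To find a valid value for audio.device_name, it can help to list only the devices that can actually record. The sketch below relies on `sounddevice.query_devices()` returning dict-like entries with `name` and `max_input_channels` keys; the helper names `input_device_names` and `print_input_devices` are hypothetical.

```python
def input_device_names(devices) -> list:
    """Return the names of devices that have at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def print_input_devices() -> None:
    """Print candidate values for audio.device_name on this machine."""
    import sounddevice as sd  # third-party; imported lazily so the helper above stays standalone

    for name in input_device_names(sd.query_devices()):
        print(name)
```

Running `print_input_devices()` inside the project's virtualenv prints one name per line; copy the one for your microphone into the config.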

Notes

  • Streaming/partials are not implemented; the app buffers until hotkey release (with optional tail capture).
  • The rewriter can change phrasing; set processing.mode: raw and rewriter.enabled: false for unaltered Whisper output.
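
The second note implies that the processing mode and the rewriter are independent switches, so both must be off for untouched output. A minimal sketch of that idea follows; `process` here is a hypothetical stand-in, and the whitespace-collapsing cleanup is an assumed placeholder for whatever `processor.py` really does.

```python
import re


def process(text, mode="cleanup", rewriter=None):
    """Apply the selected processing mode, then the rewriter if one is enabled."""
    if mode == "cleanup":
        # Placeholder cleanup (assumption): collapse runs of whitespace and trim.
        text = re.sub(r"\s+", " ", text).strip()
    elif mode != "raw":
        raise ValueError(f"unknown processing mode: {mode!r}")
    if rewriter is not None:
        text = rewriter(text)     # may change phrasing, even in raw mode
    return text
```

In this sketch only `process(text, mode="raw", rewriter=None)` returns the transcript unchanged, mirroring the combination of settings named in the note.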

About

Local near real-time voice AI assistant
