Wispr Flow alternative for hyprland
Find a file
github-actions[bot] 59e7f92f10
chore: release v0.3.8 (#60)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-01-07 00:18:51 -05:00
.github/workflows fix: stuff 2026-01-06 23:50:03 -05:00
.opencode feat: opencode config and actions 2025-12-07 12:07:07 -05:00
assets Add remote transcription providers (#10) 2025-10-17 23:17:42 -04:00
config fix: pipe walker history selection to wl-copy (#56) 2025-12-12 00:20:18 -05:00
scripts feat: service install and waybar module stuff (#47) 2025-12-04 17:07:39 -05:00
src fix: max speech secionds config serialization issue & add tests (#61) 2026-01-07 00:15:39 -05:00
tests fix: max speech secionds config serialization issue & add tests (#61) 2026-01-07 00:15:39 -05:00
.gitignore Feat/ci (#12) 2025-10-18 01:00:28 -04:00
AGENTS.md feat: add AGENTS.md 2025-10-12 21:53:35 -04:00
Cargo.lock chore: release v0.3.8 (#60) 2026-01-07 00:18:51 -05:00
Cargo.toml chore: release v0.3.8 (#60) 2026-01-07 00:18:51 -05:00
CHANGELOG.md chore: release v0.3.8 (#60) 2026-01-07 00:18:51 -05:00
git-cliff.toml feat: add release-plz, update docs & ci (#19) 2025-10-21 11:12:14 -04:00
LICENSE feat: add license 2025-10-12 18:58:14 -04:00
README.md fix: max speech secionds config serialization issue & add tests (#61) 2026-01-07 00:15:39 -05:00
release-plz.toml feat: add release-plz, update docs & ci (#19) 2025-10-21 11:12:14 -04:00
RELEASING.md feat: add release-plz, update docs & ci (#19) 2025-10-21 11:12:14 -04:00

hyprwhspr-rs logo

hyprwhspr-rs

Rust implementation of hyprwhspr | Blazing fast, native speech-to-text voice dictation for Hyprland and Omarchy | Waybar, Walker, and Elephant integrations (optional)


cargo install hyprwhspr-rs

https://github.com/user-attachments/assets/bbbaa1c3-1a7e-4165-ad3d-27b7465e201a

Requirements

  • whisper.cpp (GitHub, AUR)
  • Groq or Gemini API key (optional)
    • Groq with whisper is cheap (~$0.10 USD/month) and fast as hell. [Data Controls]
    • Comparatively, Gemini is very slow but offers better output formatting.
  • Parakeet TDT (optional) - NVIDIA's local ASR model via ONNX
    • Run ./scripts/download-parakeet-tdt.sh to download model files (~1.2GB)
    • Very fast, but not as accurate as whisper or Gemini

Features

  • Fast speech-to-text
  • Intuitive configuration
  • Optional fast VAD trims (fast_vad.enabled) audio files, reducing inferences costs while increasing output speed

Built for Hyprland

  • Detects Hyprland via HYPRLAND_INSTANCE_SIGNATURE and opens the IPC socket at $XDG_RUNTIME_DIR/hypr/<signature>/.socket.sock.
  • Execs dispatch sendshortcut commands against the active window to paste dictated text, inspecting activewindow to decide when Shift is required for a hardcoded list of programs.
  • Falls back to a Wayland virtual keyboard client or a simulated keypress paste if IPC communication fails.

Installation

From crates.io

  1. Install the latest release from crates.io
cargo install hyprwhspr-rs
  1. Install systemd service and Waybar module (optionally, with a WIP elephant/walker menu using --with-elephant flag)
# Interactive install
hyprwhspr-rs install

# Optionally, install specific components (systemd, waybar, elephant)
hyprwhspr-rs install {--all| --service | --waybar | --elephant} {--force | -f}

From source

  1. git clone https://github.com/better-slop/hyprwhispr-rs.git
  2. cd hyprwhspr-rs
  3. cargo build --release
  4. sudo cp target/release/hyprwhspr-rs /usr/local/bin/

Waybar Integration

./scripts/install-waybar.sh

Installs systemd service, Waybar module, and CSS styles. Shows mic status in your bar.

Development

  1. git clone https://github.com/better-slop/hyprwhispr-rs.git
  2. cd hyprwhspr-rs
  3. cargo build --release
  4. Run using:
    • pretty logs: RUST_LOG=debug ./target/release/hyprwhspr-rs
    • production release: ./target/release/hyprwhspr-rs
Example config

Configure in ~/.config/hyprwhspr-rs/config.jsonc

{
  "shortcuts": {
    "press": "SUPER+ALT+D",
    "hold": "SUPER+ALT+CTRL",
  },
  "word_overrides": {
    "under score": "_",
    "em dash": "—",
    "equal": "=",
    "at sign": "@",
    "pound": "#",
    "hashtag": "#",
    "hash tag": "#",
    "newline": "\n",
    "Omarkey": "Omarchy",
    "dot": ".",
    "Hyperland": "hyprland",
    "hyperland": "hyprland",
  },
  "audio_feedback": true, // Play start/stop sounds while recording
  "start_sound_volume": 0.1, // 0.1 - 1.0
  "stop_sound_volume": 0.1, // 0.1 - 1.0
  "start_sound_path": null, // Optional custom audio asset overrides
  "stop_sound_path": null, // Optional custom audio asset overrides
  "auto_copy_clipboard": true, // Automatically copy the final transcription to the clipboard
  "shift_paste": false, // Whether to force shift paste
  "global_paste_shortcut": false, // Enable the compositor-level paste shortcut (Omarchy's addition)
  "paste_hints": {
    "shift": [
      // Optional list of Hyprland window classes that should always paste with Ctrl+Shift+V
    ],
  },
  "audio_device": null, // Force a specific input device index (null uses system default)
  "fast_vad": {
    "enabled": false, // Enable Earshot fast VAD trimming
    "profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive (lowercase only, serde-enforced; default aggressive)
    "min_speech_ms": 120, // Minimum detected speech before keeping a segment
    "silence_timeout_ms": 500, // Drop silence longer than this (ms)
    "pre_roll_ms": 120, // Audio to keep before speech to avoid clipping words
    "post_roll_ms": 150, // Audio to keep after speech before trimming
    "volatility_window": 24, // Frames observed for adaptive aggressiveness (30 ms per frame, matches FRAME_MS in src/audio/vad.rs)
    "volatility_increase_threshold": 0.35, // Bump profile when toggles exceed this ratio
    "volatility_decrease_threshold": 0.12, // Relax profile when toggles stay below this ratio
  },
  "transcription": {
    "provider": "whisper_cpp", // whisper_cpp | groq | gemini | parakeet
    "request_timeout_secs": 45,
    "max_retries": 2,
    "whisper_cpp": {
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
      "model": "large-v3-turbo-q8_0", // Whisper model to use (must exist in specified directories)
      "threads": 4, // CPU threads dedicated to whisper.cpp
      "gpu_layers": 999, // Number of layers to keep on GPU (999 = auto/GPU preferred)
      "fallback_cli": false, // Fallback to whisper-cli (uses CPU)
      "no_speech_threshold": 0.6, // Whisper's "no speech" confidence gate
      "models_dirs": ["~/.config/hyprwhspr-rs/models"], // Directories to search for models
      "vad": {
        "enabled": false, // Toggle whisper-cli's native Silero VAD
        "model": "ggml-silero-v5.1.2.bin", // Path or filename for the ggml Silero VAD model
        // Probability threshold for deciding a frame is speech. Higher = fewer false positives, but may miss quiet speech.
        "threshold": 0.5,
        // Minimum contiguous speech duration (ms) to accept. Increase to ignore quick clicks/taps.
        "min_speech_ms": 250,
        // Minimum silence gap (ms) required to end a speech segment. Raise if mid-sentence pauses are being split.
        "min_silence_ms": 120,
        // Maximum speech duration (seconds) before forcing a cut. Use null (or omit) to leave unlimited.
        "max_speech_s": 15.0,
        // Extra padding (ms) added before/after detected speech so words aren't clipped.
        "speech_pad_ms": 80,
        // Overlap ratio between segments. Higher overlap helps smooth transitions at the cost of a little extra decode time.
        "samples_overlap": 0.1,
      },
    },
    "groq": {
      "model": "whisper-large-v3-turbo",
      "endpoint": "https://api.groq.com/openai/v1/audio/transcriptions",
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "gemini": {
      "model": "gemini-2.5-flash-preview-09-2025",
      "endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
      "temperature": 0.0,
      "max_output_tokens": 1024,
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "parakeet": {
      "model_dir": "models/parakeet/parakeet-tdt-0.6b-v3-onnx", // Relative to $XDG_DATA_HOME/hyprwhspr-rs (or ~/.local/share/hyprwhspr-rs)
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
  },
}
Earshot VAD trimming (optional)

The default build ships with the earshot VoiceActivityDetector baked in. Toggle fast_vad.enabled in your config to trim silence before any provider (whisper.cpp, Groq, Gemini) sees the audio. Extremely useful for lowering costs and increasing speed.

  • Operates on the 16 kHz PCM emitted by the capture layer and shares the trimmed buffer across all providers.
  • Drops silent stretches longer than the configured timeout while keeping configurable pre-roll and post-roll padding so word edges remain intact.
  • Adapts Earshots aggressiveness based on recent speech/silence volatility—fewer uploads when the room is noisy.
  • If an entire recording is silent, the app short-circuits the upload path instead of dispatching an empty request.

All other fields in the fast_vad block map directly to the trimmers behaviour, so you can tune aggressiveness without recompiling.

Release process

Runtime builds rely on a local whisper.cpp installation, so validate that dependency before shipping a tagged version.

  1. Use Conventional Commits fix: bumps patch, feat: bumps minor, and type!: indicates a breaking change (major bump).
  2. On every push to main, the release-plz workflow runs release-pr to open or refresh a release-plz-* pull request. Review the proposed version and changelog there.
  3. When the release PR looks good, merge it. The same workflow runs release-plz release, tagging (vX.Y.Z) and publishing the crate to crates.io if its a stable tag.
  4. The tag triggers the release workflow, which builds the Linux binary, uploads the tarball + checksum, and publishes the GitHub release.

Define CRATES_IO_TOKEN in the repository secrets with publish-only permissions so the workflow can push stable releases to crates.io.

To Do

  • Slop review/clean up
  • Ship waybar integration
  • Release on Cargo
  • Release on AUR
  • Add support for other operating systems/setups
  • Refine paste layer
  • Investigate formatting model