hyprwhspr-rs

mirror of https://github.com/better-slop/hyprwhspr-rs.git synced 2026-05-04 02:10:46 +00:00

Wispr Flow alternative for hyprland

hyprland rust whisper

Rust 91.7%
Shell 7.7%
Lua 0.5%
CSS 0.1%

Find a file

zero 8242d806f5 chore: release v0.3.26 (#126 )		2026-04-25 14:25:08 -04:00
.github/workflows	feat: update actions (#96 )	2026-02-10 15:43:43 -05:00
assets	fix: input device add/remove failing (#62 )	2026-01-21 15:08:23 -05:00
config	refac: paths (#93 )	2026-02-10 15:06:32 -05:00
docs/examples	feat: add recorder IPC commands and Hyprland capture-first binds (#119 )	2026-04-16 15:40:10 -04:00
scripts	refac: paths (#93 )	2026-02-10 15:06:32 -05:00
src	feat: add recorder IPC commands and Hyprland capture-first binds (#119 )	2026-04-16 15:40:10 -04:00
tests	fix(config,injector): add shift_insert object to paste_hints config (#109 )	2026-03-04 03:47:57 -05:00
.gitignore	refac: paths (#93 )	2026-02-10 15:06:32 -05:00
AGENTS.md	feat: add AGENTS.md	2025-10-12 21:53:35 -04:00
Cargo.lock	chore: release v0.3.26 (#126 )	2026-04-25 14:25:08 -04:00
Cargo.toml	chore: release v0.3.26 (#126 )	2026-04-25 14:25:08 -04:00
CHANGELOG.md	chore: release v0.3.26 (#126 )	2026-04-25 14:25:08 -04:00
CONTRIBUTING.md	chore(docs): update integration examples & adds contributing.md (#90 )	2026-02-03 18:47:42 -05:00
git-cliff.toml	feat: add release-plz, update docs & ci (#19 )	2025-10-21 11:12:14 -04:00
LICENSE	feat: add license	2025-10-12 18:58:14 -04:00
README.md	feat: add recorder IPC commands and Hyprland capture-first binds (#119 )	2026-04-16 15:40:10 -04:00
release-plz.toml	feat: add release-plz, update docs & ci (#19 )	2025-10-21 11:12:14 -04:00
RELEASING.md	ci: drop musl release	2026-01-22 02:20:59 -05:00
rust-toolchain.toml	feat: update actions (#96 )	2026-02-10 15:43:43 -05:00

README.md

hyprwhspr-rs

Rust implementation of hyprwhspr | Blazing fast, native speech-to-text voice dictation for Hyprland and Omarchy | Waybar, Walker, and Elephant integrations (optional)

cargo install hyprwhspr-rs

GitHub Release

https://github.com/user-attachments/assets/bbbaa1c3-1a7e-4165-ad3d-27b7465e201a

Requirements

whisper.cpp (GitHub, AUR)
- Ensure whisper-cli is available on your PATH (or use the managed build locations).
libudev + pkg-config (required for hotplug detection; libudev-dev on Debian/Ubuntu)
GNU-only binaries (no musl releases)
Groq or Gemini API key (optional)
- Use GROQ_API_KEY for provider groq
- Use GEMINI_API_KEY for provider gemini
- Groq with whisper is cheap (~$0.10 USD/month) and fast as hell. [Data Controls]
- Comparatively, Gemini is very slow but offers better output formatting.
Parakeet TDT (optional) - NVIDIA's local ASR model via ONNX
- Run ./scripts/download-parakeet-tdt.sh to download model files (~1.2GB)
- Very fast, but not as accurate as whisper or Gemini

Features

Fast speech-to-text
Intuitive configuration
- word overrides (many are already baked in)
- multi provider support
- hot reloading during runtime
Optional fast VAD trims (fast_vad.enabled) audio files, reducing inferences costs while increasing output speed

Built for Hyprland

Detects Hyprland via HYPRLAND_INSTANCE_SIGNATURE and opens the IPC socket at $XDG_RUNTIME_DIR/hypr/<signature>/.socket.sock.
Execs dispatch sendshortcut commands against the active window to paste dictated text, inspecting activewindow to decide when Shift is required for a hardcoded list of programs.
Falls back to a Wayland virtual keyboard client or a simulated keypress paste if IPC communication fails.
Supports daemon control commands via hyprwhspr-rs record {start|stop|toggle|status} so Hyprland can own shortcut capture with bind / bindr.
See the example docs for additional integration paths outside of Waybar and Walker/Elephant.

Hyprland capture-first binds

The Linux Kernel treats input grabbing as an exclusive operation. If hyprwhspr-rs were to grab a keyboard device directly with EVIOCGRAB, it would become the sole recipient of that device's events until the grab was released. That is the wrong layer for push-to-talk dictation because it would require re-injecting every non-shortcut keypress through a virtual keyboard just to preserve normal typing.

For Hyprland, the cleaner approach is to let the compositor own shortcut capture and have hyprwhspr-rs expose recorder controls over a local socket. In practice, that means keeping the daemon running in the background and binding record start / record stop / record toggle directly in Hyprland:

bind = ALT, grave, exec, hyprwhspr-rs record start
bindr = ALT, grave, exec, hyprwhspr-rs record stop
bind = ALT, SPACE, exec, hyprwhspr-rs record toggle

Installation

From crates.io

Install the latest release from crates.io

cargo install hyprwhspr-rs

Omit parakeet backend:

cargo install hyprwhspr-rs --no-default-features

Install systemd service and Waybar module (optionally, with a WIP elephant/walker menu using --with-elephant flag)

# Interactive install
hyprwhspr-rs install

# Optionally, install specific components (systemd, waybar, elephant)
hyprwhspr-rs install {--all| --service | --waybar | --elephant} {--force | -f}

Notes:

The installer writes the systemd unit with an absolute ExecStart= pointing at the hyprwhspr-rs binary you ran hyprwhspr-rs install with. If you copy the unit template manually, ensure hyprwhspr-rs is resolvable by systemd (PATH / drop-in override).
If audio start/stop sounds are missing in your packaging setup, you can point the app at an installed assets directory with HYPRWHSPR_ASSETS_DIR=/path/to/assets.

Using Nix

You can install the hyprwhspr-rs package from nixpkgs.

With NixOS:

{
  # required to listen for keyboard shortcuts
  users.users.<username>.extraGroups = [ "input" ];
  
  # have it auto start as a systemd unit with
  services.hyprwhspr-rs.enable = true; 
  # or just add it to your systemPackages
  environment.systemPackages = [ pkgs.hyprwhspr-rs ];
  
  # optional: to enable cuda (for AMD do `rocmSupport` instead of `cudaSupport`)
  # cuda is unfree so not in the default nixos build caches
  # I highly recommend adding the cuda build cache to your nixconfig https://discourse.nixos.org/t/cuda-cache-for-nix-community/56038
  services.hyprwhspr-rs = {
    enable = true;
    package = pkgs.hyprwhspr-rs.override {
      # to optimize build time you can skip enabling cudaSupport for one of these two
      # for whisper do whisper-cpp, for NVIDIA Parakeet do onnxruntime
      whispercpp = pkgs.whisper-cpp.override { cudaSupport = true; }; 
      onnxruntime = pkgs.onnxruntime.override { cudaSupport = true; };
    };
  };
  # you can also enable cuda/rocm globally, but this will increase the build time for your entire system if you dont add the cuda build cache
  nixpkgs.config.cudaSupport = true;
  
  # if you use groq or gemini for transcription, you can autoload their keys with
  services.hyprwhspr-rs = {
    enable = true;
    # put `GROQ_API_KEY=...` or `GEMINI_API_KEY=...` in the file you put at this path
    environmentFile = "/path/to/hyprwhspr_secret_file";
  };
}

From source

git clone https://github.com/better-slop/hyprwhispr-rs.git
cd hyprwhspr-rs
cargo build --release
sudo cp target/release/hyprwhspr-rs /usr/local/bin/

Waybar Integration

./scripts/install-waybar.sh

Configuration

Example hyprland bindings config

Configure in, e.g., ~/.config/hypr/hyprland.conf

# hold
bind = ALT, GRAVE, exec, hyprwhspr-rs record start
bindr = ALT, GRAVE, exec, hyprwhspr-rs record stop

# tap
bind = ALT, SPACE, exec, hyprwhspr-rs record toggle

Example hyprwhspr-rs config

Configure in ~/.config/hyprwhspr-rs/config.jsonc

{
  "shortcuts": {
    "press": "SUPER+ALT+D",
    "hold": "SUPER+ALT+CTRL",
  },
  "word_overrides": {
    "under score": "_",
    "em dash": "—",
    "equal": "=",
    "at sign": "@",
    "pound": "#",
    "hashtag": "#",
    "hash tag": "#",
    "newline": "\n",
    "Omarkey": "Omarchy",
    "dot": ".",
    "Hyperland": "hyprland",
    "hyperland": "hyprland",
  },
  "audio_feedback": true, // Play start/stop sounds while recording
  "start_sound_volume": 0.1, // 0.1 - 1.0
  "stop_sound_volume": 0.1, // 0.1 - 1.0
  "start_sound_path": null, // Optional custom audio asset overrides
  "stop_sound_path": null, // Optional custom audio asset overrides
  "auto_copy_clipboard": true, // Automatically copy the final transcription to the clipboard
  "shift_paste": false, // Whether to force shift paste
  "global_paste_shortcut": false, // Enable compositor-level paste; uses Hyprland sendshortcut with Shift+Insert for all pastes
  "paste_hints": {
    "shift": [
      // List of window classes that will always paste with Ctrl+Shift+V
    ],
    "shift_insert": [
      // List of window classes that will always paste with Shift+Insert
    ],
  },
  "audio_device": null, // Force a specific input device index (null uses system default)
  "fast_vad": {
    "enabled": false, // Enable Earshot fast VAD trimming
    "profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive (lowercase only, serde-enforced; default aggressive)
    "min_speech_ms": 120, // Minimum detected speech before keeping a segment
    "silence_timeout_ms": 500, // Drop silence longer than this (ms)
    "pre_roll_ms": 120, // Audio to keep before speech to avoid clipping words
    "post_roll_ms": 150, // Audio to keep after speech before trimming
    "volatility_window": 24, // Frames observed for adaptive aggressiveness (30 ms per frame, matches FRAME_MS in src/audio/vad.rs)
    "volatility_increase_threshold": 0.35, // Bump profile when toggles exceed this ratio
    "volatility_decrease_threshold": 0.12, // Relax profile when toggles stay below this ratio
  },
  "transcription": {
    "provider": "whisper_cpp", // whisper_cpp | groq | gemini | parakeet
    "request_timeout_secs": 45,
    "max_retries": 2,
    "whisper_cpp": {
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
      "model": "large-v3-turbo-q8_0", // Whisper model to use (must exist in specified directories)
      "threads": 4, // CPU threads dedicated to whisper.cpp
      "gpu_layers": 999, // Number of layers to keep on GPU (999 = auto/GPU preferred)
      "fallback_cli": false, // Fallback to whisper-cli (uses CPU)
      "no_speech_threshold": 0.6, // Whisper's "no speech" confidence gate
      "models_dirs": ["~/.local/share/hyprwhspr-rs/models"], // Directories to search for models
      "vad": {
        "enabled": false, // Toggle whisper-cli's native Silero VAD
        "model": "ggml-silero-v5.1.2.bin", // Path or filename for the ggml Silero VAD model
        // Probability threshold for deciding a frame is speech. Higher = fewer false positives, but may miss quiet speech.
        "threshold": 0.5,
        // Minimum contiguous speech duration (ms) to accept. Increase to ignore quick clicks/taps.
        "min_speech_ms": 250,
        // Minimum silence gap (ms) required to end a speech segment. Raise if mid-sentence pauses are being split.
        "min_silence_ms": 120,
        // Maximum speech duration (seconds) before forcing a cut. Use null (or omit) to leave unlimited.
        "max_speech_s": 15.0,
        // Extra padding (ms) added before/after detected speech so words aren't clipped.
        "speech_pad_ms": 80,
        // Overlap ratio between segments. Higher overlap helps smooth transitions at the cost of a little extra decode time.
        "samples_overlap": 0.1,
      },
    },
    "groq": {
      "model": "whisper-large-v3-turbo",
      "endpoint": "https://api.groq.com/openai/v1/audio/transcriptions",
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "gemini": {
      "model": "gemini-2.5-flash-preview-09-2025",
      "endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
      "temperature": 0.0,
      "max_output_tokens": 1024,
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "parakeet": {
      "model_dir": "models/parakeet/parakeet-tdt-0.6b-v3-onnx", // Relative to $XDG_DATA_HOME/hyprwhspr-rs (or ~/.local/share/hyprwhspr-rs)
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
  },
}

Environment Variables

Configuring providers and other overrides.

Use transcription.provider in ~/.config/hyprwhspr-rs/config.jsonc to pick the backend.

Provider API key environment variables

groq provider requires: GROQ_API_KEY
gemini provider requires: GEMINI_API_KEY
whisper_cpp (whisper-cli) does not require an API key; the binary is discovered via PATH and managed locations under $XDG_DATA_HOME / $HOME

Recommended setup (systemd user service)

hyprwhspr-rs install installs a user unit with:

EnvironmentFile=-%h/.config/hyprwhspr-rs/env

Otherwise, set the env vars in your shell.

Extra env vars that affect provider behavior

For whisper_cpp / whisper-cli discovery, the app also consults:
- PATH (searches for whisper-cli, optional fallback names)
- XDG_DATA_HOME / HOME (managed whisper.cpp locations)
For asset overrides (start/stop sounds): HYPRWHSPR_ASSETS_DIR
Resolution logic lives in:
- src/config.rs (discover_whisper_binary_candidates, find_binaries_on_path, discover_assets_dir)

Earshot VAD trimming (recommended)

The default build ships with the impressive and lightweight earshot VoiceActivityDetector baked in. Toggle fast_vad.enabled in your config to trim silence before any provider (whisper.cpp, Groq, Gemini) sees the audio. Extremely useful for lowering costs and increasing speed.

Configuring `fast_vad`

"fast_vad": {
  "enabled": false,
  "profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive
  "min_speech_ms": 120, // minimum speech chunk to keep
  "silence_timeout_ms": 500, // silence length that ends a segment
  "pre_roll_ms": 120, // speech-leading padding
  "post_roll_ms": 150, // speech-trailing padding
  "volatility_window": 24, // decision history window
  "volatility_increase_threshold": 0.35, // become more aggressive above this
  "volatility_decrease_threshold": 0.12 // relax aggressiveness below this
}

About `earshot`

Works well for silence, not as accurate at speech compared to other models.
Operates on the 16 kHz PCM emitted by the capture layer and shares the trimmed buffer across all providers.
Drops silent stretches longer than the configured timeout while keeping configurable pre-roll and post-roll padding so word edges remain intact.
Adapts Earshot’s aggressiveness based on recent speech/silence volatility—fewer uploads when the room is noisy.
If an entire recording is silent, the app attempts to short-circuit the upload path instead of dispatching an empty request.

All other fields in the fast_vad block map directly to the trimmer’s behaviour, so you can tune aggressiveness without recompiling.

Development

git clone https://github.com/better-slop/hyprwhispr-rs.git
cd hyprwhspr-rs
cargo build --release
- Faster build (skips Parakeet backend): cargo build --release --no-default-features
Run using:
- pretty logs: RUST_LOG=debug ./target/release/hyprwhspr-rs
- production release: ./target/release/hyprwhspr-rs

Release process

Runtime builds rely on a local whisper.cpp installation, so validate that dependency before shipping a tagged version.

Use Conventional Commits – fix: bumps patch, feat: bumps minor, and type!: indicates a breaking change (major bump).
On every push to main, the release-plz workflow runs release-pr to open or refresh a release-plz-* pull request. Review the proposed version and changelog there.
When the release PR looks good, merge it. The same workflow runs release-plz release, tagging (vX.Y.Z) and publishing the crate to crates.io if it’s a stable tag.
The tag triggers the release workflow, which builds the Linux GNU binary, uploads the tarball + checksum, and publishes the GitHub release with the full commit list (plus PR links when available).

Define CARGO_REGISTRY_TOKEN in the repository secrets with publish-only permissions so the workflow can push stable releases to crates.io.

To Do

Slop review/clean up
Ship waybar integration
Release on Cargo
Release on AUR
Add support for other operating systems/setups
Refine paste layer
Investigate formatting model

README.md Unescape Escape

hyprwhspr-rs

Requirements

Features

Built for Hyprland

Hyprland capture-first binds

Installation

From crates.io

Using Nix

From source

Waybar Integration

Configuration

Provider API key environment variables

Recommended setup (systemd user service)

Extra env vars that affect provider behavior

Configuring fast_vad

About earshot

Development

To Do

README.md

Configuring `fast_vad`

About `earshot`