hyprwhspr-rs
Rust implementation of hyprwhspr | Blazing fast, native speech-to-text voice dictation for Hyprland and Omarchy | Waybar, Walker, and Elephant integrations (optional)
cargo install hyprwhspr-rs
https://github.com/user-attachments/assets/bbbaa1c3-1a7e-4165-ad3d-27b7465e201a
Requirements
- whisper.cpp (GitHub, AUR)
  - Ensure whisper-cli is available on your PATH (or use the managed build locations).
- libudev + pkg-config (required for hotplug detection; libudev-dev on Debian/Ubuntu)
- GNU-only binaries (no musl releases)
- Groq or Gemini API key (optional)
  - Use GROQ_API_KEY for provider groq
  - Use GEMINI_API_KEY for provider gemini
  - Groq with whisper is cheap (~$0.10 USD/month) and fast as hell. [Data Controls]
  - Comparatively, Gemini is very slow but offers better output formatting.
- Parakeet TDT (optional) - NVIDIA's local ASR model via ONNX
  - Run ./scripts/download-parakeet-tdt.sh to download model files (~1.2GB)
  - Very fast, but not as accurate as whisper or Gemini
Features
- Fast speech-to-text
- Intuitive configuration
- word overrides (many are already baked in)
- multi provider support
- hot reloading during runtime
- Optional fast VAD (fast_vad.enabled) trims audio files, reducing inference costs while increasing output speed
Built for Hyprland
- Detects Hyprland via HYPRLAND_INSTANCE_SIGNATURE and opens the IPC socket at $XDG_RUNTIME_DIR/hypr/<signature>/.socket.sock.
- Execs dispatch sendshortcut commands against the active window to paste dictated text, inspecting activewindow to decide when Shift is required for a hardcoded list of programs.
- Falls back to a Wayland virtual keyboard client or a simulated keypress paste if IPC communication fails.
- See the example docs for additional integration paths outside of Waybar and Walker/Elephant.
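The socket lookup described above can be sketched in a few lines. This is a minimal illustration, not the crate's actual code: the function names hyprland_socket_path and detect_hyprland_socket are hypothetical, and treating an empty variable as "not running under Hyprland" is an assumption.

```rust
use std::path::PathBuf;

/// Builds the Hyprland IPC socket path from the two environment values.
/// Returns None when either value is missing or empty.
/// Hypothetical helper for illustration; not the crate's real function.
fn hyprland_socket_path(runtime_dir: Option<&str>, signature: Option<&str>) -> Option<PathBuf> {
    let runtime_dir = runtime_dir.filter(|s| !s.is_empty())?;
    let signature = signature.filter(|s| !s.is_empty())?;
    Some(
        PathBuf::from(runtime_dir)
            .join("hypr")
            .join(signature)
            .join(".socket.sock"),
    )
}

/// Reads the real environment and resolves the socket path, if any.
fn detect_hyprland_socket() -> Option<PathBuf> {
    let runtime_dir = std::env::var("XDG_RUNTIME_DIR").ok();
    let signature = std::env::var("HYPRLAND_INSTANCE_SIGNATURE").ok();
    hyprland_socket_path(runtime_dir.as_deref(), signature.as_deref())
}
```

Keeping the path construction separate from the environment read makes the logic testable without touching process-wide state. A caller would treat a None result as the signal to use one of the fallback paste paths.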
Installation
From crates.io
- Install the latest release from crates.io:
  cargo install hyprwhspr-rs
  Omit the parakeet backend:
  cargo install hyprwhspr-rs --no-default-features
- Install the systemd service and Waybar module (optionally, with a WIP elephant/walker menu using the --with-elephant flag):
  # Interactive install
  hyprwhspr-rs install
  # Optionally, install specific components (systemd, waybar, elephant)
  hyprwhspr-rs install {--all | --service | --waybar | --elephant} {--force | -f}
Notes:
- The installer writes the systemd unit with an absolute ExecStart= pointing at the hyprwhspr-rs binary you ran hyprwhspr-rs install with. If you copy the unit template manually, ensure hyprwhspr-rs is resolvable by systemd (PATH / drop-in override).
- If audio start/stop sounds are missing in your packaging setup, you can point the app at an installed assets directory with HYPRWHSPR_ASSETS_DIR=/path/to/assets.
Using Nix
You can install the hyprwhspr-rs package from nixpkgs.
With NixOS:
{
  # required to listen for keyboard shortcuts
  users.users.<username>.extraGroups = [ "input" ];
  # have it auto start as a systemd unit with
  services.hyprwhspr-rs.enable = true;
  # or just add it to your systemPackages
  environment.systemPackages = [ pkgs.hyprwhspr-rs ];
  # optional: to enable cuda (for AMD do `rocmSupport` instead of `cudaSupport`)
  # cuda is unfree so not in the default nixos build caches
  # I highly recommend adding the cuda build cache to your nixconfig https://discourse.nixos.org/t/cuda-cache-for-nix-community/56038
  services.hyprwhspr-rs = {
    enable = true;
    package = pkgs.hyprwhspr-rs.override {
      # to optimize build time you can skip enabling cudaSupport for one of these two
      # for whisper do whisper-cpp, for NVIDIA Parakeet do onnxruntime
      whispercpp = pkgs.whisper-cpp.override { cudaSupport = true; };
      onnxruntime = pkgs.onnxruntime.override { cudaSupport = true; };
    };
  };
  # you can also enable cuda/rocm globally, but this will increase the build time for your entire system if you don't add the cuda build cache
  nixpkgs.config.cudaSupport = true;
  # if you use groq or gemini for transcription, you can autoload their keys with
  services.hyprwhspr-rs = {
    enable = true;
    # put `GROQ_API_KEY=...` or `GEMINI_API_KEY=...` in the file you put at this path
    environmentFile = "/path/to/hyprwhspr_secret_file";
  };
}
From source
git clone https://github.com/better-slop/hyprwhispr-rs.git
cd hyprwhspr-rs
cargo build --release
sudo cp target/release/hyprwhspr-rs /usr/local/bin/
Waybar Integration
./scripts/install-waybar.sh
Configuration
Example config
Configure in ~/.config/hyprwhspr-rs/config.jsonc
{
  "shortcuts": {
    "press": "SUPER+ALT+D",
    "hold": "SUPER+ALT+CTRL",
  },
  "word_overrides": {
    "under score": "_",
    "em dash": "—",
    "equal": "=",
    "at sign": "@",
    "pound": "#",
    "hashtag": "#",
    "hash tag": "#",
    "newline": "\n",
    "Omarkey": "Omarchy",
    "dot": ".",
    "Hyperland": "hyprland",
    "hyperland": "hyprland",
  },
  "audio_feedback": true, // Play start/stop sounds while recording
  "start_sound_volume": 0.1, // 0.1 - 1.0
  "stop_sound_volume": 0.1, // 0.1 - 1.0
  "start_sound_path": null, // Optional custom audio asset overrides
  "stop_sound_path": null, // Optional custom audio asset overrides
  "auto_copy_clipboard": true, // Automatically copy the final transcription to the clipboard
  "shift_paste": false, // Whether to force shift paste
  "global_paste_shortcut": false, // Enable compositor-level paste; uses Hyprland sendshortcut with Shift+Insert for all pastes
  "paste_hints": {
    "shift": [
      // List of window classes that will always paste with Ctrl+Shift+V
    ],
    "shift_insert": [
      // List of window classes that will always paste with Shift+Insert
    ],
  },
  "audio_device": null, // Force a specific input device index (null uses system default)
  "fast_vad": {
    "enabled": false, // Enable Earshot fast VAD trimming
    "profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive (lowercase only, serde-enforced; default aggressive)
    "min_speech_ms": 120, // Minimum detected speech before keeping a segment
    "silence_timeout_ms": 500, // Drop silence longer than this (ms)
    "pre_roll_ms": 120, // Audio to keep before speech to avoid clipping words
    "post_roll_ms": 150, // Audio to keep after speech before trimming
    "volatility_window": 24, // Frames observed for adaptive aggressiveness (30 ms per frame, matches FRAME_MS in src/audio/vad.rs)
    "volatility_increase_threshold": 0.35, // Bump profile when toggles exceed this ratio
    "volatility_decrease_threshold": 0.12, // Relax profile when toggles stay below this ratio
  },
  "transcription": {
    "provider": "whisper_cpp", // whisper_cpp | groq | gemini | parakeet
    "request_timeout_secs": 45,
    "max_retries": 2,
    "whisper_cpp": {
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
      "model": "large-v3-turbo-q8_0", // Whisper model to use (must exist in specified directories)
      "threads": 4, // CPU threads dedicated to whisper.cpp
      "gpu_layers": 999, // Number of layers to keep on GPU (999 = auto/GPU preferred)
      "fallback_cli": false, // Fallback to whisper-cli (uses CPU)
      "no_speech_threshold": 0.6, // Whisper's "no speech" confidence gate
      "models_dirs": ["~/.local/share/hyprwhspr-rs/models"], // Directories to search for models
      "vad": {
        "enabled": false, // Toggle whisper-cli's native Silero VAD
        "model": "ggml-silero-v5.1.2.bin", // Path or filename for the ggml Silero VAD model
        // Probability threshold for deciding a frame is speech. Higher = fewer false positives, but may miss quiet speech.
        "threshold": 0.5,
        // Minimum contiguous speech duration (ms) to accept. Increase to ignore quick clicks/taps.
        "min_speech_ms": 250,
        // Minimum silence gap (ms) required to end a speech segment. Raise if mid-sentence pauses are being split.
        "min_silence_ms": 120,
        // Maximum speech duration (seconds) before forcing a cut. Use null (or omit) to leave unlimited.
        "max_speech_s": 15.0,
        // Extra padding (ms) added before/after detected speech so words aren't clipped.
        "speech_pad_ms": 80,
        // Overlap ratio between segments. Higher overlap helps smooth transitions at the cost of a little extra decode time.
        "samples_overlap": 0.1,
      },
    },
    "groq": {
      "model": "whisper-large-v3-turbo",
      "endpoint": "https://api.groq.com/openai/v1/audio/transcriptions",
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "gemini": {
      "model": "gemini-2.5-flash-preview-09-2025",
      "endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
      "temperature": 0.0,
      "max_output_tokens": 1024,
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "parakeet": {
      "model_dir": "models/parakeet/parakeet-tdt-0.6b-v3-onnx", // Relative to $XDG_DATA_HOME/hyprwhspr-rs (or ~/.local/share/hyprwhspr-rs)
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
  },
}
Environment Variables
Configuring providers and other overrides.
Use transcription.provider in ~/.config/hyprwhspr-rs/config.jsonc to pick the backend.
Provider API key environment variables
- groq provider requires: GROQ_API_KEY
- gemini provider requires: GEMINI_API_KEY
- whisper_cpp (whisper-cli) does not require an API key; the binary is discovered via PATH and managed locations under $XDG_DATA_HOME/$HOME
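The provider-to-key mapping above lends itself to a small pre-flight check. The sketch below is illustrative only: required_api_key_var and check_provider_env are hypothetical names, and treating parakeet (and any unknown provider) as needing no key is an assumption based on it being a local model.

```rust
/// Returns the API key variable a provider needs, or None for local backends.
/// Hypothetical helper; the names are not taken from the crate.
fn required_api_key_var(provider: &str) -> Option<&'static str> {
    match provider {
        "groq" => Some("GROQ_API_KEY"),
        "gemini" => Some("GEMINI_API_KEY"),
        // whisper_cpp needs no key; parakeet is assumed to behave the same
        // because it runs locally. Unknown names also fall through here.
        _ => None,
    }
}

/// Checks that the key for `provider` is present and non-empty.
/// `lookup` abstracts the environment so the check can be tested in isolation.
fn check_provider_env<F>(provider: &str, lookup: F) -> Result<(), String>
where
    F: Fn(&str) -> Option<String>,
{
    match required_api_key_var(provider) {
        None => Ok(()),
        Some(var) => match lookup(var) {
            Some(value) if !value.trim().is_empty() => Ok(()),
            _ => Err(format!("provider `{provider}` requires {var} to be set")),
        },
    }
}
```

Against the real environment, the call would look like check_provider_env("groq", |key| std::env::var(key).ok()).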
Recommended setup (systemd user service)
hyprwhspr-rs install installs a user unit with:
EnvironmentFile=-%h/.config/hyprwhspr-rs/env
Otherwise, set the env vars in your shell.
Extra env vars that affect provider behavior
- For whisper_cpp/whisper-cli discovery, the app also consults:
  - PATH (searches for whisper-cli, optional fallback names)
  - XDG_DATA_HOME/HOME (managed whisper.cpp locations)
- For asset overrides (start/stop sounds): HYPRWHSPR_ASSETS_DIR
- Resolution logic lives in: src/config.rs (discover_whisper_binary_candidates, find_binaries_on_path, discover_assets_dir)
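For readers who want to see what a PATH search of this kind looks like, here is a minimal sketch using only the standard library. It is not the implementation in src/config.rs: find_on_path is a hypothetical name, and the sketch only checks that a regular file exists, not that it is executable.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns every regular file named `name` found in the directories listed
/// in `path_var`, in PATH order. Hypothetical helper for illustration.
fn find_on_path(name: &str, path_var: &OsStr) -> Vec<PathBuf> {
    std::env::split_paths(path_var)
        .map(|dir| dir.join(name))
        // Keep only candidates that exist as regular files.
        .filter(|candidate| candidate.is_file())
        .collect()
}
```

A caller could pass std::env::var_os("PATH") and take the first hit, then fall back to other candidate names or managed locations when the list is empty.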
Earshot VAD trimming (recommended)
The default build ships with the impressive and lightweight earshot VoiceActivityDetector baked in. Toggle fast_vad.enabled in your config to trim silence before any provider (whisper.cpp, Groq, Gemini) sees the audio. Extremely useful for lowering costs and increasing speed.
Configuring fast_vad
"fast_vad": {
  "enabled": false,
  "profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive
  "min_speech_ms": 120, // minimum speech chunk to keep
  "silence_timeout_ms": 500, // silence length that ends a segment
  "pre_roll_ms": 120, // speech-leading padding
  "post_roll_ms": 150, // speech-trailing padding
  "volatility_window": 24, // decision history window
  "volatility_increase_threshold": 0.35, // become more aggressive above this
  "volatility_decrease_threshold": 0.12 // relax aggressiveness below this
}
About earshot
- Detects silence well, but is less accurate at detecting speech than other models.
- Operates on the 16 kHz PCM emitted by the capture layer and shares the trimmed buffer across all providers.
- Drops silent stretches longer than the configured timeout while keeping configurable pre-roll and post-roll padding so word edges remain intact.
- Adapts Earshot’s aggressiveness based on recent speech/silence volatility—fewer uploads when the room is noisy.
- If an entire recording is silent, the app attempts to short-circuit the upload path instead of dispatching an empty request.
All other fields in the fast_vad block map directly to the trimmer’s behaviour, so you can tune aggressiveness without
recompiling.
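To make the volatility settings more concrete, here is one way such an adaptation step could work. This is a sketch under stated assumptions, not the code in src/audio/vad.rs: the Profile enum, toggle_ratio, and adapt_profile are hypothetical names, the "toggle ratio" is assumed to be the fraction of adjacent frames whose speech/silence decision flips, and the profile is assumed to move one step at a time.

```rust
/// VAD profiles ordered from least to most aggressive (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Profile {
    Quality,
    LowBitrate,
    Aggressive,
    VeryAggressive,
}

impl Profile {
    /// One step towards VeryAggressive, saturating at the top.
    fn more_aggressive(self) -> Self {
        match self {
            Profile::Quality => Profile::LowBitrate,
            Profile::LowBitrate => Profile::Aggressive,
            _ => Profile::VeryAggressive,
        }
    }

    /// One step towards Quality, saturating at the bottom.
    fn less_aggressive(self) -> Self {
        match self {
            Profile::VeryAggressive => Profile::Aggressive,
            Profile::Aggressive => Profile::LowBitrate,
            _ => Profile::Quality,
        }
    }
}

/// Fraction of adjacent frame pairs whose speech/silence decision differs.
fn toggle_ratio(decisions: &[bool]) -> f32 {
    if decisions.len() < 2 {
        return 0.0;
    }
    let toggles = decisions.windows(2).filter(|pair| pair[0] != pair[1]).count();
    toggles as f32 / (decisions.len() - 1) as f32
}

/// Bumps or relaxes the profile based on the recent decision window.
fn adapt_profile(current: Profile, decisions: &[bool], increase: f32, decrease: f32) -> Profile {
    let ratio = toggle_ratio(decisions);
    if ratio > increase {
        current.more_aggressive()
    } else if ratio < decrease {
        current.less_aggressive()
    } else {
        current
    }
}
```

With the defaults shown earlier, decisions would hold the last 24 frame results (volatility_window) and the thresholds would be 0.35 and 0.12. The gap between the two thresholds acts as a dead band, so the profile does not oscillate on borderline input.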
Development
- git clone https://github.com/better-slop/hyprwhispr-rs.git
- cd hyprwhspr-rs
- cargo build --release
  - Faster build (skips Parakeet backend): cargo build --release --no-default-features
- Run using:
  - pretty logs: RUST_LOG=debug ./target/release/hyprwhspr-rs
  - production release: ./target/release/hyprwhspr-rs
Release process
Runtime builds rely on a local whisper.cpp installation, so validate that dependency before shipping a tagged version.
- Use Conventional Commits – fix: bumps patch, feat: bumps minor, and type!: indicates a breaking change (major bump).
- On every push to main, the release-plz workflow runs release-pr to open or refresh a release-plz-* pull request. Review the proposed version and changelog there.
- When the release PR looks good, merge it. The same workflow runs release-plz release, tagging (vX.Y.Z) and publishing the crate to crates.io if it’s a stable tag.
- The tag triggers the release workflow, which builds the Linux GNU binary, uploads the tarball + checksum, and publishes the GitHub release with the full commit list (plus PR links when available).
Define CARGO_REGISTRY_TOKEN in the repository secrets with publish-only permissions so the workflow can push stable releases to crates.io.
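The commit-prefix rule at the top of this list can be illustrated with a small classifier. This is a simplified sketch, not how release-plz itself decides versions: Bump, bump_for_subject, and bump_for_commits are hypothetical names, only the subject line is inspected (BREAKING CHANGE footers are ignored), and any pre-1.0 special cases are out of scope.

```rust
/// Version bump levels, ordered so that `max` picks the largest one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Bump {
    NoRelease,
    Patch,
    Minor,
    Major,
}

/// Classifies a single commit subject such as "feat(vad): add profile".
fn bump_for_subject(subject: &str) -> Bump {
    // Everything before the first ':' is the "type(scope)!" prefix.
    let prefix = match subject.split_once(':') {
        Some((prefix, _)) => prefix.trim(),
        None => return Bump::NoRelease,
    };
    // A trailing '!' marks a breaking change regardless of the type.
    if prefix.ends_with('!') {
        return Bump::Major;
    }
    // Drop an optional "(scope)" suffix to get the bare type.
    let kind = prefix.split('(').next().unwrap_or(prefix);
    match kind {
        "feat" => Bump::Minor,
        "fix" => Bump::Patch,
        _ => Bump::NoRelease,
    }
}

/// The bump for a release is the largest bump among its commits.
fn bump_for_commits(subjects: &[&str]) -> Bump {
    subjects
        .iter()
        .map(|subject| bump_for_subject(subject))
        .max()
        .unwrap_or(Bump::NoRelease)
}
```

For example, a batch containing one docs: commit, one fix: commit, and one feat: commit would resolve to a minor bump under this sketch.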
To Do
- Slop review/clean up
- Ship waybar integration
- Release on Cargo
- Release on AUR
- Add support for other operating systems/setups
- Refine paste layer
- Investigate formatting model