Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> |
||
|---|---|---|
| .github/workflows | ||
| .opencode | ||
| assets | ||
| config | ||
| scripts | ||
| src | ||
| tests | ||
| .gitignore | ||
| AGENTS.md | ||
| Cargo.lock | ||
| Cargo.toml | ||
| CHANGELOG.md | ||
| git-cliff.toml | ||
| LICENSE | ||
| README.md | ||
| release-plz.toml | ||
| RELEASING.md | ||
hyprwhspr-rs
Rust implementation of hyprwhspr | Blazing fast, native speech-to-text voice dictation for Hyprland and Omarchy | Waybar, Walker, and Elephant integrations (optional)
cargo install hyprwhspr-rs
https://github.com/user-attachments/assets/bbbaa1c3-1a7e-4165-ad3d-27b7465e201a
Requirements
- whisper.cpp (GitHub, AUR)
- Groq or Gemini API key (optional)
- Groq with whisper is cheap (~$0.10 USD/month) and fast as hell. [Data Controls]
- Comparatively, Gemini is very slow but offers better output formatting.
- Parakeet TDT (optional) - NVIDIA's local ASR model via ONNX
- Run
./scripts/download-parakeet-tdt.shto download model files (~1.2GB) - Very fast, but not as accurate as whisper or Gemini
- Run
Features
- Fast speech-to-text
- Intuitive configuration
- word overrides (many are already baked in)
- multi provider support
- hot reloading during runtime
- Optional fast VAD trims (
fast_vad.enabled) audio files, reducing inferences costs while increasing output speed
Built for Hyprland
- Detects Hyprland via
HYPRLAND_INSTANCE_SIGNATUREand opens the IPC socket at$XDG_RUNTIME_DIR/hypr/<signature>/.socket.sock. - Execs
dispatch sendshortcutcommands against the active window to paste dictated text, inspectingactivewindowto decide whenShiftis required for a hardcoded list of programs. - Falls back to a Wayland virtual keyboard client or a simulated keypress paste if IPC communication fails.
Installation
From crates.io
- Install the latest release from crates.io
cargo install hyprwhspr-rs
- Install systemd service and Waybar module (optionally, with a WIP elephant/walker menu using
--with-elephantflag)
# Interactive install
hyprwhspr-rs install
# Optionally, install specific components (systemd, waybar, elephant)
hyprwhspr-rs install {--all| --service | --waybar | --elephant} {--force | -f}
From source
git clone https://github.com/better-slop/hyprwhispr-rs.gitcd hyprwhspr-rscargo build --releasesudo cp target/release/hyprwhspr-rs /usr/local/bin/
Waybar Integration
./scripts/install-waybar.sh
Installs systemd service, Waybar module, and CSS styles. Shows mic status in your bar.
Development
git clone https://github.com/better-slop/hyprwhispr-rs.gitcd hyprwhspr-rscargo build --release- Run using:
- pretty logs:
RUST_LOG=debug ./target/release/hyprwhspr-rs - production release:
./target/release/hyprwhspr-rs
- pretty logs:
Example config
Configure in ~/.config/hyprwhspr-rs/config.jsonc
{
"shortcuts": {
"press": "SUPER+ALT+D",
"hold": "SUPER+ALT+CTRL",
},
"word_overrides": {
"under score": "_",
"em dash": "—",
"equal": "=",
"at sign": "@",
"pound": "#",
"hashtag": "#",
"hash tag": "#",
"newline": "\n",
"Omarkey": "Omarchy",
"dot": ".",
"Hyperland": "hyprland",
"hyperland": "hyprland",
},
"audio_feedback": true, // Play start/stop sounds while recording
"start_sound_volume": 0.1, // 0.1 - 1.0
"stop_sound_volume": 0.1, // 0.1 - 1.0
"start_sound_path": null, // Optional custom audio asset overrides
"stop_sound_path": null, // Optional custom audio asset overrides
"auto_copy_clipboard": true, // Automatically copy the final transcription to the clipboard
"shift_paste": false, // Whether to force shift paste
"global_paste_shortcut": false, // Enable the compositor-level paste shortcut (Omarchy's addition)
"paste_hints": {
"shift": [
// Optional list of Hyprland window classes that should always paste with Ctrl+Shift+V
],
},
"audio_device": null, // Force a specific input device index (null uses system default)
"fast_vad": {
"enabled": false, // Enable Earshot fast VAD trimming
"profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive (lowercase only, serde-enforced; default aggressive)
"min_speech_ms": 120, // Minimum detected speech before keeping a segment
"silence_timeout_ms": 500, // Drop silence longer than this (ms)
"pre_roll_ms": 120, // Audio to keep before speech to avoid clipping words
"post_roll_ms": 150, // Audio to keep after speech before trimming
"volatility_window": 24, // Frames observed for adaptive aggressiveness (30 ms per frame, matches FRAME_MS in src/audio/vad.rs)
"volatility_increase_threshold": 0.35, // Bump profile when toggles exceed this ratio
"volatility_decrease_threshold": 0.12, // Relax profile when toggles stay below this ratio
},
"transcription": {
"provider": "whisper_cpp", // whisper_cpp | groq | gemini | parakeet
"request_timeout_secs": 45,
"max_retries": 2,
"whisper_cpp": {
"prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
"model": "large-v3-turbo-q8_0", // Whisper model to use (must exist in specified directories)
"threads": 4, // CPU threads dedicated to whisper.cpp
"gpu_layers": 999, // Number of layers to keep on GPU (999 = auto/GPU preferred)
"fallback_cli": false, // Fallback to whisper-cli (uses CPU)
"no_speech_threshold": 0.6, // Whisper's "no speech" confidence gate
"models_dirs": ["~/.config/hyprwhspr-rs/models"], // Directories to search for models
"vad": {
"enabled": false, // Toggle whisper-cli's native Silero VAD
"model": "ggml-silero-v5.1.2.bin", // Path or filename for the ggml Silero VAD model
// Probability threshold for deciding a frame is speech. Higher = fewer false positives, but may miss quiet speech.
"threshold": 0.5,
// Minimum contiguous speech duration (ms) to accept. Increase to ignore quick clicks/taps.
"min_speech_ms": 250,
// Minimum silence gap (ms) required to end a speech segment. Raise if mid-sentence pauses are being split.
"min_silence_ms": 120,
// Maximum speech duration (seconds) before forcing a cut. Use null (or omit) to leave unlimited.
"max_speech_s": 15.0,
// Extra padding (ms) added before/after detected speech so words aren't clipped.
"speech_pad_ms": 80,
// Overlap ratio between segments. Higher overlap helps smooth transitions at the cost of a little extra decode time.
"samples_overlap": 0.1,
},
},
"groq": {
"model": "whisper-large-v3-turbo",
"endpoint": "https://api.groq.com/openai/v1/audio/transcriptions",
"prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
},
"gemini": {
"model": "gemini-2.5-flash-preview-09-2025",
"endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
"temperature": 0.0,
"max_output_tokens": 1024,
"prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
},
"parakeet": {
"model_dir": "models/parakeet/parakeet-tdt-0.6b-v3-onnx", // Relative to $XDG_DATA_HOME/hyprwhspr-rs (or ~/.local/share/hyprwhspr-rs)
"prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
},
},
}
Earshot VAD trimming (optional)
The default build ships with the earshot VoiceActivityDetector baked in. Toggle fast_vad.enabled in your config to trim silence before any provider (whisper.cpp, Groq, Gemini) sees the audio. Extremely useful for lowering costs and increasing speed.
fast_vad.enabled in your config to trim silence before any provider (whisper.cpp, Groq, Gemini) sees the audio. Extremely useful for lowering costs and increasing speed.- Operates on the 16 kHz PCM emitted by the capture layer and shares the trimmed buffer across all providers.
- Drops silent stretches longer than the configured timeout while keeping configurable pre-roll and post-roll padding so word edges remain intact.
- Adapts Earshot’s aggressiveness based on recent speech/silence volatility—fewer uploads when the room is noisy.
- If an entire recording is silent, the app short-circuits the upload path instead of dispatching an empty request.
All other fields in the fast_vad block map directly to the trimmer’s behaviour, so you can tune aggressiveness without
recompiling.
Release process
Runtime builds rely on a local whisper.cpp installation, so validate that dependency before shipping a tagged version.
whisper.cpp installation, so validate that dependency before shipping a tagged version.- Use Conventional Commits –
fix:bumps patch,feat:bumps minor, andtype!:indicates a breaking change (major bump). - On every push to
main, therelease-plzworkflow runsrelease-prto open or refresh arelease-plz-*pull request. Review the proposed version and changelog there. - When the release PR looks good, merge it. The same workflow runs
release-plz release, tagging (vX.Y.Z) and publishing the crate to crates.io if it’s a stable tag. - The tag triggers the
releaseworkflow, which builds the Linux binary, uploads the tarball + checksum, and publishes the GitHub release.
Define
CRATES_IO_TOKENin the repository secrets with publish-only permissions so the workflow can push stable releases to crates.io.
To Do
- Slop review/clean up
- Ship waybar integration
- Release on Cargo
- Release on AUR
- Add support for other operating systems/setups
- Refine paste layer
- Investigate formatting model