Voxtype vs waystt
Two Rust-based speech-to-text tools for Wayland: one runs as a daemon, the other is signal-driven.
At a Glance
| Aspect | Voxtype | waystt |
|---|---|---|
| Engine | Whisper (whisper.cpp) | Whisper (cloud or local) |
| Language | Rust | Rust |
| Architecture | Systemd daemon | Signal-driven (starts/stops) |
| Default Mode | Offline (local) | Cloud (OpenAI API) |
| Hotkey Detection | Built-in (evdev) | External (WM keybinds) |
| Audio Feedback | Yes (customizable) | Yes (beeps) |
| Cursor Injection | Built-in (ydotool) | Via --pipe-to |
| GPU Acceleration | Vulkan, CUDA, Metal, ROCm | No |
| Text Processing | Word replacements, spoken punctuation | None |
Critical Differences
Daemon vs Signal-Driven
Voxtype runs as a persistent systemd service. It's always listening for your hotkey, always ready. Start it once at login, forget about it.
waystt is invoked on demand via signals. Your window manager keybind triggers it; it then records, transcribes, outputs the text, and exits. This is lighter but requires more WM configuration.
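To make the signal-driven model concrete, here is a minimal shell sketch of a process that idles until it receives SIGUSR1, does its work once, and exits. It is a stand-in for illustration only, not waystt's actual code; the function name `wait_for_trigger` and the messages are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative stand-in for a signal-driven tool. This is NOT waystt's code;
# the function name and messages are hypothetical.

wait_for_trigger() {
    # When SIGUSR1 arrives, do the one-shot job and exit.
    trap 'echo "triggered: record, transcribe, output"; exit 0' USR1
    echo "idle"
    # Sleep in short slices so the trap runs promptly after the signal.
    while :; do
        sleep 0.1
    done
}

# Demo: start the worker in the background, then signal it the way a
# window-manager keybind would (compare: pkill -SIGUSR1 waystt).
log=$(mktemp)
wait_for_trigger >"$log" &
worker_pid=$!
sleep 0.5                  # give the worker time to install its trap
kill -USR1 "$worker_pid"
wait "$worker_pid"
cat "$log"
rm -f "$log"
```

A daemon such as Voxtype differs mainly in who watches for the trigger: the service reads the hotkey itself, so nothing outside it has to send a signal.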
Cloud vs Local Default
Voxtype is offline-only by design. Your voice data never leaves your machine.
waystt defaults to OpenAI's cloud API, requiring an API key and internet connection. Local mode with whisper-rs is available but requires downloading models separately.
Hotkey Handling
Voxtype handles hotkeys internally via evdev. Set your key in config.toml; no window manager keybinds are needed.
waystt expects your window manager to send signals. You need to configure keybinds in Hyprland/Sway/etc. that run commands like pkill -SIGUSR1 waystt.
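A keybind command along those lines might be wrapped in a small helper that signals the process if it is already running and starts it otherwise. This is a sketch under assumptions: the helper name `signal_or_start` is hypothetical, and whether waystt should be launched from the keybind or kept running from session startup depends on its own documentation.

```shell
#!/usr/bin/env bash
# Hypothetical keybind helper: signal the tool if it is running, otherwise start it.
# Usage: signal_or_start <process-name> <command> [args...]
signal_or_start() {
    name=$1
    shift
    if pgrep -x "$name" >/dev/null 2>&1; then
        # Already running: deliver the trigger signal (exact name match).
        pkill -USR1 -x "$name"
        echo "signalled $name"
    else
        # Not running yet: launch the given command in the background.
        "$@" &
        echo "started $name"
    fi
}
```

A window manager bind could then run a script that calls, for example, `signal_or_start waystt waystt --local --pipe-to "ydotool type --file -"`. Voxtype needs none of this because the daemon reads the key itself.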
Setup Comparison
Voxtype
```shell
# Install and run - handles everything
paru -S voxtype
voxtype setup model
voxtype setup systemd
systemctl --user enable --now voxtype
# That's it. Hold ScrollLock to dictate.
```
waystt (Local Mode)
```shell
# Install
paru -S waystt-bin
# Download model
mkdir -p ~/.local/share/waystt
# Download GGML model manually...
# Configure WM keybinds (Hyprland example)
# bind = $mod, D, exec, waystt --local --pipe-to "ydotool type --file -"
# Start waystt in background, then trigger via keybind
```
Feature Comparison
What waystt Does Better
- Minimal footprint - Not running when you're not using it
- Cloud option - OpenAI API for those who prefer cloud-hosted transcription
- Google Speech support - Alternative cloud provider
- Pipe-friendly - Outputs to stdout, easy to integrate with other tools
What Voxtype Does Better
- Zero WM config - Hotkeys work out of the box, no keybind setup
- Always ready - Daemon is instant, no startup latency
- Privacy-first - No cloud option means no temptation to send voice data
- GPU acceleration - Vulkan, CUDA, Metal, ROCm support for faster transcription
- Text processing - Word replacements and spoken punctuation built-in
- Integrated typing - Text appears at cursor automatically
- Waybar module - Built-in status indicator
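To illustrate what the text-processing item in the list above means, here is a rough shell sketch of a stdin-to-stdout filter that applies spoken punctuation and one word replacement. It is not Voxtype's implementation; the function name `process_text` and the specific rules are assumptions chosen for the example. A filter like this is also the kind of stage a pipe-oriented tool would need added by hand.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing filter; not Voxtype's implementation.
# Reads transcribed text on stdin and writes the processed text to stdout.
process_text() {
    sed -E \
        -e 's/ comma( |$)/,\1/g' \
        -e 's/ period( |$)/.\1/g' \
        -e 's/ question mark( |$)/?\1/g' \
        -e 's/vox type/Voxtype/g'
}

# Example: spoken punctuation becomes real punctuation.
echo "hello comma world period" | process_text   # prints: hello, world.
```

Each rule only fires when the spoken word is followed by a space or the end of the line, so words such as "commander" are left alone.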
The Verdict
Choose Voxtype if you want a complete solution that handles hotkeys, transcription, and typing without additional WM configuration. Ideal for users who want "install and forget."
Choose waystt if you prefer a minimal, signal-driven tool that you invoke on-demand, or if you want cloud transcription options. Better for users who like composing tools via shell pipelines.