Voxtype vs VOXD
Two whisper.cpp-based tools with different philosophies: CLI daemon vs multi-UI application.
At a Glance
| Aspect | Voxtype | VOXD |
|---|---|---|
| Engine | Whisper (whisper.cpp) | Whisper (whisper.cpp) |
| Language | Rust | Python |
| Interface | CLI + systemd | CLI, GUI, Tray |
| AI Post-Processing | No | Yes (local/cloud LLM) |
| Voice Activity Detection | No (push-to-talk) | Yes (FLUX mode, beta) |
| Distro Packages | AUR, .deb, .rpm | AUR, .deb, .rpm, pipx |
| GPU Required | No (optional) | No |
| GPU Acceleration | Vulkan, CUDA, Metal, ROCm | No |
| Text Processing | Word replacements, spoken punctuation | AI Post-Processing (LLM) |
Critical Differences
Interface Philosophy
Voxtype is a headless CLI daemon. Configure it once, run it as a systemd service, interact via hotkey. No GUI, no tray icon needed.
VOXD offers multiple UI modes:
- voxd --gui - PyQt6 graphical interface
- voxd --tray - System tray icon for unobstructed dictation
- voxd --rh - Terminal/CLI hotkey mode
- voxd --flux - VAD-triggered continuous dictation
AI Post-Processing
Voxtype outputs what Whisper transcribes, apart from its built-in word replacements and spoken punctuation. The result is clean and predictable, with no surprises.
VOXD offers AI Post-Processing (AIPP) that can clean up transcriptions using a local or cloud LLM. This can fix grammar, add punctuation, or reformat text. It is powerful, but it adds complexity and the potential for unexpected changes.
Setup Experience
Voxtype setup is minimal:
paru -S voxtype
voxtype setup model
voxtype setup systemd
systemctl --user enable --now voxtype
VOXD has an interactive first-run setup:
# Install
paru -S voxd
# Or: sudo dpkg -i voxd_*.deb
# First run triggers setup wizard
voxd
# Configures: voice model, LLM model, ydotool
# For Wayland, also run:
./setup_ydotool.sh
# Then log out and back in
Feature Comparison
What VOXD Does Better
- GUI options - Tray icon and graphical settings for users who prefer visual interfaces
- AI post-processing - LLM can clean up transcriptions automatically
- FLUX mode - Voice-triggered continuous dictation (no hotkey needed)
- 99+ languages - Explicit multilingual focus
- Broader distro testing - Explicitly tested on Ubuntu, Fedora, Mint, Pop!_OS, openSUSE
What Voxtype Does Better
- Simplicity - One binary, one config file, one systemd service
- Predictable output - No AI surprises; what Whisper hears is what you get
- Resource efficiency - Rust binary vs Python + PyQt6
- GPU acceleration - Vulkan, CUDA, Metal, ROCm support for fast transcription
- Text processing - Word replacements and spoken punctuation built-in
- Waybar integration - Native status module for tiling WM users
- Customizable audio feedback - Multiple themes, custom sounds
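Rule-based text processing of the kind listed above is deterministic, which is the main contrast with LLM clean-up. The sketch below only illustrates that idea with `sed`; it is not Voxtype's implementation or its configuration format. The function name `postprocess`, the spoken words it recognizes, and the `vox type` replacement are all hypothetical.

```shell
# Hypothetical illustration: NOT Voxtype's code or config format.
# Applies one custom word replacement, then turns a few spoken
# punctuation words into symbols when they stand as whole words.
postprocess() {
  printf '%s\n' "$1" | sed -E \
    -e 's/vox type/Voxtype/g' \
    -e 's/ comma( |$)/,\1/g' \
    -e 's/ period( |$)/.\1/g' \
    -e 's/ question mark( |$)/?\1/g'
}

postprocess 'vox type works comma mostly period'
# prints: Voxtype works, mostly.
```

The same input always produces the same output, so the rules are easy to audit, but they only cover the cases someone has written down.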
Resource Usage
| Metric | Voxtype | VOXD |
|---|---|---|
| Idle Memory | ~50 MB | ~100-150 MB (GUI/tray) |
| Dependencies | Minimal (system libs) | Python, PyQt6, etc. |
| Binary Size | ~15 MB | N/A (Python scripts) |
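The memory figures above are approximate and will vary with the model and mode in use. A rough way to check them on your own machine is to read the resident set size from `ps`. This is a minimal sketch: the helper name `rss_mib` is hypothetical, and it assumes the daemon's process is named `voxtype` (as the commands above suggest) and that `pgrep` is available.

```shell
# Hypothetical helper: sums the resident set size of the given PIDs
# and prints the total in whole MiB (ps reports RSS in KiB).
rss_mib() {
  total_kib=0
  for pid in "$@"; do
    kib=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    total_kib=$((total_kib + ${kib:-0}))
  done
  echo $((total_kib / 1024))
}

# Usage (assumes the process name is "voxtype"):
#   rss_mib $(pgrep -x voxtype)
```

VOXD runs under the Python interpreter, so its process name may not be `voxd`; in that case pass the PID directly.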
The Verdict
Choose Voxtype if you prefer a minimal, headless daemon that runs via systemd. Ideal for tiling WM users who want Waybar integration and don't need a GUI.
Choose VOXD if you want a GUI or tray icon, need AI post-processing to clean up transcriptions, or prefer voice-triggered continuous dictation over push-to-talk.
Both are solid offline Whisper tools. VOXD offers more features and UI options; Voxtype offers simplicity and efficiency.