
Voxtype vs Nerd-dictation

Two approaches to offline speech-to-text on Linux. Both work on Wayland. Which fits your workflow?

At a Glance

Aspect | Voxtype | Nerd-dictation
Engine | Whisper (whisper.cpp) | VOSK
Language | Rust | Python (single file)
Architecture | Systemd daemon | Foreground process
Wayland | Native (evdev) | Via ydotool
Text Output | wtype (native Wayland) | xdotool/ydotool
CJK/Unicode Output | Yes | No (ydotool limitation)
Recording Feedback | Audio + notifications | None
GPU Acceleration | Vulkan, CUDA, Metal, ROCm | No
Text Processing | Word replacements, spoken punctuation | Python callbacks

Critical Differences

CJK/Multilingual Text Output

Voxtype outputs text with wtype, a native Wayland tool that correctly handles Korean, Chinese, Japanese, and other Unicode characters. Unlike ydotool, wtype needs no background daemon.

On Wayland, nerd-dictation relies on ydotool, which cannot output CJK characters. ydotool simulates physical key presses by sending keycodes, and CJK characters have no corresponding keys on the keyboard layout. The result is garbled output like "9 ." instead of Korean text.
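To see why keycode-based typing breaks down, here is a deliberately simplified model. It is not ydotool's actual code. The sketch assumes a US QWERTY layout where only printable ASCII characters, space, and newline can be reached with a key or Shift+key press. Any character outside that set has no key to press. The helper name `untypeable_chars` is hypothetical.

```python
import string

# Simplified assumption: on a US QWERTY layout, printable ASCII characters,
# space and newline (Enter) are reachable with a key or Shift+key press.
# Real layouts, dead keys and compose sequences are ignored here.
US_QWERTY_TYPEABLE = set(string.ascii_letters + string.digits + string.punctuation + " \n")


def untypeable_chars(text: str) -> list[str]:
    """Return the characters a keycode-only typing tool could not produce."""
    return [ch for ch in text if ch not in US_QWERTY_TYPEABLE]


if __name__ == "__main__":
    print(untypeable_chars("hello, world"))  # []
    print(untypeable_chars("안녕하세요"))      # every Hangul syllable is listed
```

wtype avoids this problem because it is not limited to keys on the physical layout.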

Daemon vs Foreground

Voxtype runs as a systemd user service. It starts automatically at login, runs invisibly, and is always ready.

Nerd-dictation runs as a foreground process, so you need to keep a terminal window open with it running. Close the terminal and dictation stops. You can work around this with tmux or a custom systemd unit, but you have to set that up yourself.

Recording Feedback

Voxtype plays audio cues when recording starts and stops, plus optional desktop notifications. You know it's working without looking at the screen.

Nerd-dictation provides no feedback whatsoever. No sound, no visual indicator, nothing. You press the hotkey and hope it's recording. You find out if it worked when text appears (or doesn't).

Recognition Quality

Voxtype (Whisper)

Whisper is highly accurate across accents and speaking styles. It copes well with technical terminology, mixed-language phrases, and unusual names, and it adds punctuation and capitalization automatically.

Typical accuracy: 95-99% depending on audio quality and model size.

Nerd-dictation (VOSK)

VOSK is remarkably lightweight but has lower raw accuracy. Output is all lowercase with no automatic punctuation. Works better with clear, deliberate speech.

Typical accuracy: 85-95% depending on clarity and vocabulary.
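One common way to measure figures like these on your own recordings is word error rate (WER). WER is the word-level edit distance between a transcript and a reference (substitutions, deletions, and insertions) divided by the number of reference words. Accuracy is then roughly 1 - WER. The sketch below is a minimal implementation, and the example sentences are invented for illustration. It lowercases both strings so that VOSK's all-lowercase output is not penalized for casing. Punctuation is not stripped, so Whisper transcripts should be normalized first for a fair comparison.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Lowercase so casing differences are not counted as errors.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "open the config file and set the hotkey"
    hypothesis = "open the conflict file and set hotkey"
    wer = word_error_rate(reference, hypothesis)
    print(f"WER {wer:.1%}, accuracy {1 - wer:.1%}")  # WER 25.0%, accuracy 75.0%
```

Here one substitution ("config" heard as "conflict") and one deletion (the second "the") over eight reference words give 2/8 = 25% WER.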

Setup Complexity

Voxtype

# Install
curl -LO https://github.com/peteonrails/voxtype/releases/download/v0.2.1/voxtype_0.2.1-1_amd64.deb
sudo dpkg -i voxtype_0.2.1-1_amd64.deb

# Interactive model selection and systemd setup
voxtype setup model
voxtype setup systemd

Time to first transcription: ~5 minutes

Nerd-dictation

# Install VOSK
pip install vosk

# Download model manually
mkdir -p ~/.config/vosk
cd ~/.config/vosk
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip

# Clone and run (from your home directory, not the model directory)
cd ~
git clone https://github.com/ideasman42/nerd-dictation
./nerd-dictation/nerd-dictation begin --vosk-model-dir ~/.config/vosk/vosk-model-en-us-0.22

Time to first transcription: ~15-30 minutes (including troubleshooting)

Resource Usage

Metric | Voxtype | Nerd-dictation
Idle | ~50MB RAM, 0% CPU | Not running
Active | High CPU for 1-3s | ~200MB RAM, moderate CPU
Model size | 300MB - 3GB | ~40MB (small) to ~1.8GB (large, e.g. vosk-model-en-us-0.22) per language

Customization

Voxtype

Configuration via ~/.config/voxtype/config.toml:

[hotkey]
key = "rightctrl"
mode = "toggle"  # or "push_to_talk"

[audio.feedback]
enabled = true
theme = "subtle"

[text]
# Say "period" to get ".", "open paren" for "(", etc.
spoken_punctuation = true

# Custom word replacements
[text.replacements]
hyperwhisper = "hyprwhspr"
javascript = "JavaScript"
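Both options in the [text] section can be pictured as a post-processing pass over Whisper's output. The sketch below is a Python approximation for illustration only. Voxtype itself is written in Rust, and its real matching rules (case sensitivity, the full list of spoken-punctuation phrases) may differ. `SPOKEN_PUNCTUATION` is a hypothetical subset, and `REPLACEMENTS` mirrors the [text.replacements] table above.

```python
import re

# Hypothetical subset of spoken-punctuation phrases, for illustration only.
SPOKEN_PUNCTUATION = {"period": ".", "comma": ",", "open paren": "(", "close paren": ")"}

# Mirrors the [text.replacements] table in config.toml.
REPLACEMENTS = {"hyperwhisper": "hyprwhspr", "javascript": "JavaScript"}


def apply_text_processing(text: str) -> str:
    # Whole-word, case-insensitive matching, so "Javascript" also becomes "JavaScript"
    # but "periodic" is left alone.
    for spoken, written in {**SPOKEN_PUNCTUATION, **REPLACEMENTS}.items():
        pattern = re.compile(r"\b" + re.escape(spoken) + r"\b", re.IGNORECASE)
        # A function replacement avoids backslash-escape processing in `written`.
        text = pattern.sub(lambda _match: written, text)
    # Remove the space dictation leaves before closing punctuation...
    text = re.sub(r"\s+([.,)])", r"\1", text)
    # ...and after an opening parenthesis.
    text = re.sub(r"\(\s+", "(", text)
    return text


if __name__ == "__main__":
    print(apply_text_processing("I use Javascript period"))  # I use JavaScript.
```

Order matters in a pass like this. Punctuation phrases are replaced first, and spacing is cleaned up last so that the inserted symbols attach to the preceding word.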

Nerd-dictation

Python callbacks let you transform text with full programming logic:

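# ~/.config/nerd-dictation/nerd-dictation.py
def nerd_dictation_process(text):
    # Arbitrary Python transformations on the recognized text
    text = text.replace(" period", ".")  # attach "." to the preceding word
    text = text.replace(" new line", "\n")
    # Uppercase only the first character (str.capitalize() would lowercase the rest)
    return text[:1].upper() + text[1:]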
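Because the callback is ordinary Python, it can apply context-dependent rules that a fixed replacement table cannot express. As one hypothetical example, the version below capitalizes the first word of every sentence, not just the first character of the text. It assumes nerd-dictation's convention of calling a `nerd_dictation_process(text)` function defined in `~/.config/nerd-dictation/nerd-dictation.py` and typing whatever it returns. The `SPOKEN_MARKS` table is illustrative.

```python
import re

# Assumption: nerd-dictation calls nerd_dictation_process() with the recognized
# (lowercase, unpunctuated) text and types the string it returns.
SPOKEN_MARKS = {"period": ".", "question mark": "?", "comma": ","}


def nerd_dictation_process(text):
    # Turn spoken punctuation into symbols, attaching each to the preceding word.
    for spoken, mark in SPOKEN_MARKS.items():
        text = re.sub(r"\s*\b" + re.escape(spoken) + r"\b", mark, text)
    # Capitalize the first letter of the text and of each word after . ? or !
    text = re.sub(r"(^|[.?!]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)
    return text


if __name__ == "__main__":
    print(nerd_dictation_process("hello world period how are you question mark"))
    # Hello world. How are you?
```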

The Verdict

Choose Voxtype if you want the best accuracy, GPU acceleration, built-in text processing (spoken punctuation, word replacements), and prefer tools that just work.

Choose Nerd-dictation if you need arbitrary Python transformations, prefer a minimal footprint, or enjoy tinkering.

Voxtype now includes built-in text processing that covers most common use cases without needing Python code.
