Linux Speech-to-Text Tools

An honest comparison to help you find the right tool for your setup. We want you to succeed with voice input, even if Voxtype isn't the right fit.

Quick Comparison

Tool                | Engine  | Offline     | GPU         | Wayland     | CJK Output      | Any Desktop   | Setup
Voxtype             | Whisper | Yes         | Vulkan/CUDA | Native      | Yes (wtype)     | Yes           | Easy
Vocalinux           | Whisper | Yes         | No          | Yes         | No (ydotool)    | Yes           | Medium
Nerd-dictation      | VOSK    | Yes         | No          | Via ydotool | No (ydotool)    | Yes           | Medium
Speech Note         | Multi   | Yes         | CUDA only   | Yes         | Yes (clipboard) | GUI app       | Easy
Blurt               | Whisper | Yes         | No          | Yes         | Yes (clipboard) | GNOME only    | Medium
WhisperWriter       | Whisper | Yes         | CUDA only   | Partial     | Varies          | Yes           | Medium
Numen Voice         | VOSK    | Yes         | No          | Yes         | Varies          | Yes           | Hard
ibus-speech-to-text | VOSK    | Yes         | No          | Varies      | Yes (IBus)      | IBus req.     | Hard
hyprwhspr           | Whisper | Yes         | CUDA        | Native      | Yes (wl-copy)   | Arch/Hyprland | Easy
waystt              | Whisper | Optional    | No          | Native      | Varies          | Yes           | Medium
VoxInput            | LocalAI | Via LocalAI | Via LocalAI | Yes         | Varies          | Yes           | Hard
VOXD                | Whisper | Yes         | No          | Yes         | No (ydotool)    | Yes           | Medium
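
If you prefer to narrow the list by requirement rather than by eye, the comparison can be treated as data. The sketch below is illustrative only: the `Tool` record, the `shortlist` helper, and its parameters are hypothetical names, and the rows carry just the Engine, CJK Output, and Setup columns copied from the comparison above.

```python
from typing import List, NamedTuple, Optional

# Ordering used to compare setup difficulty.
SETUP_RANK = {"Easy": 0, "Medium": 1, "Hard": 2}


class Tool(NamedTuple):
    name: str
    engine: str
    cjk_output: str
    setup: str


# Rows copied from the comparison (Engine, CJK Output, Setup columns only).
TOOLS = [
    Tool("Voxtype", "Whisper", "Yes (wtype)", "Easy"),
    Tool("Vocalinux", "Whisper", "No (ydotool)", "Medium"),
    Tool("Nerd-dictation", "VOSK", "No (ydotool)", "Medium"),
    Tool("Speech Note", "Multi", "Yes (clipboard)", "Easy"),
    Tool("Blurt", "Whisper", "Yes (clipboard)", "Medium"),
    Tool("WhisperWriter", "Whisper", "Varies", "Medium"),
    Tool("Numen Voice", "VOSK", "Varies", "Hard"),
    Tool("ibus-speech-to-text", "VOSK", "Yes (IBus)", "Hard"),
    Tool("hyprwhspr", "Whisper", "Yes (wl-copy)", "Easy"),
    Tool("waystt", "Whisper", "Varies", "Medium"),
    Tool("VoxInput", "LocalAI", "Varies", "Hard"),
    Tool("VOXD", "Whisper", "No (ydotool)", "Medium"),
]


def shortlist(
    tools: List[Tool],
    engine: Optional[str] = None,
    need_cjk: bool = False,
    max_setup: str = "Hard",
) -> List[str]:
    """Return the names of tools that satisfy every requested constraint."""
    names = []
    for tool in tools:
        if engine is not None and tool.engine != engine:
            continue
        # Only an explicit "Yes" counts; "Varies" is treated as unconfirmed.
        if need_cjk and not tool.cjk_output.startswith("Yes"):
            continue
        if SETUP_RANK[tool.setup] > SETUP_RANK[max_setup]:
            continue
        names.append(tool.name)
    return names


print(shortlist(TOOLS, need_cjk=True, max_setup="Easy"))
# ['Voxtype', 'Speech Note', 'hyprwhspr']
```

Treating "Varies" as a non-match is a deliberately conservative choice; loosen it if you are willing to test a tool yourself.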

Which Tool Should You Use?

Korean, Chinese, Japanese

Need to dictate in CJK languages?

Voxtype on Wayland - one of the few tools that outputs CJK characters correctly. It uses wtype for native text injection, whereas most alternatives rely on ydotool, which cannot type non-ASCII characters. (Note: CJK output requires Wayland; on X11, Voxtype falls back to ydotool.)
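
The rule of thumb above can be sketched in a few lines. This is not Voxtype's actual code: `backend_for_session` and `types_correctly` are hypothetical helpers that encode only the two claims made here (wtype on Wayland handles non-ASCII text, ydotool does not). `XDG_SESSION_TYPE` is the standard environment variable that reports the session type on most desktops.

```python
import os


def backend_for_session(session_type: str) -> str:
    """Map a session type to the typing backend described above."""
    # Wayland sessions get wtype; anything else falls back to ydotool.
    return "wtype" if session_type == "wayland" else "ydotool"


def types_correctly(text: str, backend: str) -> bool:
    """Report whether the backend is expected to type the text intact."""
    if backend == "wtype":
        return True
    if backend == "ydotool":
        # Assumption taken from the text: ydotool is limited to ASCII.
        return text.isascii()
    raise ValueError(f"unknown backend: {backend}")


if __name__ == "__main__":
    session = os.environ.get("XDG_SESSION_TYPE", "x11")
    backend = backend_for_session(session)
    for sample in ("hello world", "안녕하세요", "こんにちは"):
        print(sample, backend, types_correctly(sample, backend))
```

Running it inside your own session shows which backend would apply and whether a CJK sample would survive the trip.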

Hyprland / Sway Users

Want native compositor keybindings?

Voxtype - use Hyprland's bind/bindr or Sway's bindsym --release for push-to-talk. No input group membership required. Bind any key combination, such as Super+V.

Any Linux Desktop

Using GNOME, KDE, or other desktops?

Voxtype - works on Wayland and X11 with kernel-level hotkey detection. Optimized for Wayland.

GNOME Users

Running GNOME Shell?

Blurt - native GNOME extension. Note: clipboard-only (requires paste).

KDE Plasma Users

KDE has no built-in speech-to-text (STT).

Voxtype - works on KDE Plasma Wayland with its full feature set.

File Transcription

Need to transcribe recordings?

Speech Note - GUI app with multiple engines and file import.

Accessibility

Need hands-free computer control?

Numen Voice - designed for full voice control, not just dictation.

Hackers & Tinkerers

Want to customize everything?

Nerd-dictation - single Python file with powerful hooks.

Cross-Platform

Use Windows and Linux?

WhisperWriter - Python app that works on both platforms.

Tiling WM + Waybar

Want status bar integration?

Voxtype - built-in Waybar module shows recording state. Runs as systemd service.

Whisper vs VOSK

The two main offline speech recognition engines

Whisper (OpenAI)

Maximum accuracy

  • + Superior accuracy (95-99%)
  • + 99+ languages from one model
  • + Automatic punctuation & caps
  • + Handles accents well
  • - Higher CPU usage
  • - Larger models (300MB-6GB)

Used by: Voxtype, Blurt, WhisperWriter, Speech Note

VOSK (Alpha Cephei)

Lightweight & fast

  • + Very lightweight (~50MB/language)
  • + Low CPU usage
  • + Real-time streaming
  • + Works on Raspberry Pi
  • - Lower accuracy (85-95%)
  • - All lowercase output

Used by: Nerd-dictation, Numen Voice, ibus-speech-to-text, Speech Note

Detailed Comparisons

Deep dives into how Voxtype compares with each alternative

All These Tools Respect Your Privacy

Every tool compared here supports fully offline operation. Your voice never leaves your computer. No cloud accounts, no subscriptions, no data collection. This is how speech recognition should work.

Ready to Try Voxtype?

Hold a key. Speak. Release. Your words appear at the cursor.