Voxtype vs VoxInput

Different philosophies: embedded Whisper vs API-based transcription. Both target Linux power users.

At a Glance

Aspect                   | Voxtype                               | VoxInput
Engine                   | Whisper (embedded)                    | LocalAI/OpenAI API
Language                 | Rust                                  | Go
Architecture             | Self-contained daemon                 | API client + LocalAI server
Typing Backend           | ydotool                               | dotool
Hotkey Detection         | Built-in (evdev)                      | External (WM keybinds + signals)
Voice Activity Detection | No (push-to-talk)                     | Yes (realtime mode)
Setup Complexity         | Low                                   | High (Docker, LocalAI)
GPU Acceleration         | Vulkan, CUDA, Metal, ROCm             | Via LocalAI
Text Processing          | Word replacements, spoken punctuation | None

Critical Differences

Embedded vs API Architecture

Voxtype embeds whisper.cpp directly. One binary, one process, no external services needed.

VoxInput is an API client that connects to an OpenAI-compatible endpoint. The recommended setup runs LocalAI in Docker to provide the transcription service. This adds moving parts, but it lets you swap transcription backends without changing the client.
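To make the API-based model concrete, here is a minimal sketch of what a request to an OpenAI-compatible transcription endpoint looks like from the shell. It is not VoxInput's actual code. The `transcribe` function name, the `BASE_URL` and `MODEL` variables, and the `sample.wav` file are illustrative assumptions. The port and model name are taken from the LocalAI setup shown on this page, and the `/v1/audio/transcriptions` path is the one defined by the OpenAI API.

```shell
#!/bin/sh
# Assumption: LocalAI is listening on the port published by the
# docker run example (8080) and a model named whisper-1 is installed.
BASE_URL="${BASE_URL:-http://localhost:8080}"
MODEL="${MODEL:-whisper-1}"

# transcribe FILE
# Uploads an audio file as multipart form data to the OpenAI-style
# transcription endpoint and prints the server's response.
transcribe() {
    curl -sS "$BASE_URL/v1/audio/transcriptions" \
        -F "file=@$1" \
        -F "model=$MODEL"
}

# Usage (hypothetical file name):
#   transcribe sample.wav
```

With the OpenAI API, the default response is a JSON object with a `text` field. Because the client only depends on this HTTP contract, pointing `BASE_URL` at a different compatible server is enough to change the backend.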

Setup Complexity

Voxtype setup:

paru -S voxtype
voxtype setup model
voxtype setup systemd
systemctl --user enable --now voxtype

VoxInput setup:

# Install LocalAI via Docker
docker run -d --name localai -p 8080:8080 localai/localai

# Install whisper model via LocalAI web UI
# Open http://localhost:8080, install whisper-1 and silero-vad-ggml

# Install dotool and configure udev rules
# Add user to input group (log out and back in for it to take effect)
sudo usermod -aG input "$USER"

# Build VoxInput
git clone https://github.com/richiejp/VoxInput
cd VoxInput && go build -o voxinput

# Configure WM keybinds for record/write commands

Voice Activity Detection

Voxtype uses push-to-talk exclusively. You control when recording happens.

VoxInput offers a realtime mode that uses silero-vad for voice activity detection, so it can detect automatically when you're speaking. This enables hands-free continuous dictation, though the feature is described as partial/beta.

Feature Comparison

What VoxInput Does Better

What Voxtype Does Better

The Verdict

Choose Voxtype if you want a simple, self-contained tool that works out of the box. No Docker, no API servers, just install and dictate.

Choose VoxInput if you're already running LocalAI infrastructure, want VAD-based continuous dictation, or need the flexibility to swap transcription backends. Be prepared for a more complex setup.
