
Voicebox

ElevenLabs-quality voice tools that run entirely on-device: clone voices, generate speech in 23 languages, and dictate with Whisper, all without a subscription.

Type

Open Source (MIT)

Stack Layer

Applications

Language

TypeScript

Stars

24.9k+

What it is

Voicebox is a free, open-source, local-first AI voice studio. It bundles voice cloning, text-to-speech in 23 languages (with a choice of seven TTS engines), and Whisper-powered speech-to-text into a single native desktop app. It is built with Tauri rather than Electron, runs natively on Apple Silicon, and can use NVIDIA GPUs via CUDA or AMD GPUs via ROCm. Beyond basic synthesis, it includes a multi-track story editor for podcast production, post-processing audio effects, and an MCP server for AI agent integration.

The headline advantage is privacy: all processing happens locally, so no audio leaves your device and there is no cloud dependency. No subscription is required and no usage limits apply. For developers, the MCP server makes Voicebox callable from agent workflows that need voice output, without routing requests through a third-party API.

Use this when you need ElevenLabs-style voice capabilities but have privacy requirements, cost constraints, or offline needs, or when you want to add speech synthesis to an AI agent workflow via MCP without depending on a third-party API.
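A minimal sketch of what agent integration looks like at the protocol level, assuming Voicebox's server follows the standard Model Context Protocol message format. MCP is built on JSON-RPC 2.0: clients discover tools with `tools/list` and invoke them with `tools/call`. The tool name `generate_speech`, its argument names, and the voice name below are hypothetical placeholders; the real tool list comes from the server itself and from Voicebox's MCP documentation. A real client also completes the `initialize` handshake before sending these requests, which is omitted here.

```typescript
// Sketch: JSON-RPC 2.0 messages an MCP client could send to a local Voicebox MCP server.
// ASSUMPTION: the tool name "generate_speech" and its argument names (text, language, voice)
// are hypothetical placeholders, not Voicebox's documented tool schema.

// Shape of a JSON-RPC 2.0 request, which MCP uses for client-to-server calls.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// "tools/list" asks the server which tools it exposes (and their input schemas).
function buildListToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// "tools/call" invokes one tool; MCP places the tool name and its arguments under params.
function buildToolCallRequest(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical requests an agent might send after the initialize handshake.
const listRequest = buildListToolsRequest(1);
const speechRequest = buildToolCallRequest(2, "generate_speech", {
  text: "Hello from a local voice model.",
  language: "en",
  voice: "my-cloned-voice", // hypothetical name of a previously cloned voice
});

// If the server uses MCP's stdio transport, each message is framed as one line of JSON.
// JSON.stringify escapes newlines inside strings, so a serialized message stays on one line.
const wireMessage = JSON.stringify(speechRequest) + "\n";
console.log(JSON.stringify(listRequest));
console.log(JSON.stringify(speechRequest));
```

In practice an MCP-capable agent host builds and sends these messages for you. The point of the sketch is that the request is addressed to a process on your own machine, which is why no third-party API sits in the loop.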

Get started

GitHub

Source, releases for Mac/Windows/Linux, and MCP documentation.

Related tools

ElevenLabs

The cloud-based standard; a useful comparison point for voice quality.

Podcastfy

Programmatic podcast generation; it can use Voicebox-compatible TTS engines.

