voice-server
Welcome to the voice-server documentation.
Overview
voice-server is a local-first Text-to-Speech (TTS) service built with Bun. It supports multiple backends, including MLX-audio (running the Kokoro-82M model) for fast local TTS on Apple Silicon and Qwen TTS for custom voice cloning.
Features
- 🎙️ Local TTS - All audio generation happens on your machine
- 💰 Cost-Free - No per-character or per-minute charges
- 🔒 Private - No data sent to external services
- 🔊 41 Built-in Voices - Numeric voice IDs for easy configuration
- ⚡ Fast Streaming - Smooth real-time audio playback (real-time factor, RTF, of ~1.0x)
- 🌍 Multi-language - American English, British English, Japanese, and Chinese voices
- 📱 macOS Integration - Native notifications and audio playback
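RTF is a real-time factor. In TTS it is conventionally defined as synthesis time divided by the duration of the audio produced, so ~1.0 means audio is generated about as fast as it plays back and values below 1.0 are faster than real time. voice-server may measure it differently; the `rtf` function below is a hypothetical helper (not part of the project) for checking the figure against your own timings, given in seconds.

```bash
# Hypothetical helper (not part of voice-server).
# Assumed convention: RTF = synthesis time / duration of generated audio.
# Values below 1.0 mean audio is produced faster than it plays back.
rtf() {
  local synth_seconds="$1" audio_seconds="$2"
  awk -v s="$synth_seconds" -v a="$audio_seconds" 'BEGIN {
    # Guard against dividing by zero or a negative duration.
    if (a <= 0) { print "audio duration must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", s / a
  }'
}

# Example: 4.2 s spent synthesizing 4.0 s of audio
rtf 4.2 4.0   # prints 1.05
```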
Quick Installation
Prerequisites
```bash
# Install Bun (JavaScript/TypeScript runtime)
curl -fsSL https://bun.sh/install | bash
# Install uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install ffmpeg (for audio conversion; requires Homebrew)
brew install ffmpeg
```
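Before moving on, it can help to confirm that the three prerequisites ended up on your PATH (the installers may require opening a new shell first). The `check_prereqs` function below is a hypothetical convenience, not something voice-server ships; it relies only on the POSIX `command -v` builtin.

```bash
# Hypothetical helper (not part of voice-server): report which commands are
# missing from PATH. Returns 0 if all are present, otherwise the number missing.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Check the tools installed above; print a hint instead of failing the shell.
check_prereqs bun uv ffmpeg || echo "Install the missing tools above before continuing."
```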
Install and Run
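```bash
# Navigate to the project directory (after cloning the repository)
cd voice-server
# Install dependencies: JavaScript packages via Bun, the MLX-audio backend via uv
bun install
uv tool install mlx-audio
# Run the server with the MLX backend on port 8888
TTS_BACKEND=mlx PORT=8888 bun run dev
# In another terminal, check that the server is responding
curl http://localhost:8888/health
```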
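`bun run dev` typically runs in the foreground, so a script that starts the server in the background has to wait until it accepts requests. A minimal polling sketch, assuming only that `/health` returns a 2xx status once the server is ready (the response body is not inspected); `wait_for_health` is a hypothetical helper, not part of voice-server.

```bash
# Hypothetical helper (not part of voice-server): poll a health URL until it
# returns a 2xx status, giving up after a fixed number of attempts.
wait_for_health() {
  local url="${1:-http://localhost:8888/health}"
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP 4xx/5xx; -s silences progress output;
    # --max-time bounds each request so a hung server cannot stall the loop.
    if curl -fs --max-time 2 "$url" >/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "no healthy response from $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, start the server with `TTS_BACKEND=mlx PORT=8888 bun run dev &` and then call `wait_for_health http://localhost:8888/health 30` before sending requests.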
For complete installation instructions, API documentation, and configuration options, see the README.
Documentation
- README - Main documentation with installation, API reference, and configuration
- DEVELOPMENT.md - Development setup and configuration
- VOICE_GUIDE.md - User voice configuration guide
- VOICE_QUICK_REF.md - Quick reference for all 41 voices
- KOKORO_VOICES.md - Technical voice documentation
- MIGRATION.md - ElevenLabs migration guide