StealThis .dev

Speech & Voice AI (TTS / STT)

Open models for text-to-speech, voice cloning, and transcription — from tiny CPU models to expressive cloning.

Alternatives (7)

Kokoro-82M

Best for: Lightweight, high-quality TTS

A tiny 82M-parameter open TTS model whose output quality is well above what its size suggests. A good default for lightweight speech.

  • +Very small & fast
  • +Surprisingly good quality
  • +Permissive license
  • Fewer voices than big models

Pocket TTS

Best for: On-device / CPU TTS

Kyutai's ~100M-parameter TTS model, which runs several times faster than real time on a CPU and is small enough for on-device use.

  • +Runs on CPU
  • +Faster than real-time
  • +Voice cloning
  • Small model limits
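
"Faster than real-time" is usually expressed as a real-time factor (RTF): the time spent synthesizing divided by the duration of the audio produced. An RTF below 1.0 means the model generates speech faster than it plays back. The sketch below shows the arithmetic only. The function names and the timing figures are hypothetical illustrations, not measurements of Pocket TTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(synthesis_seconds: float, audio_seconds: float) -> float:
    """Inverse of RTF: how many seconds of audio are produced per second of compute."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Hypothetical numbers for illustration only, not a benchmark result:
# 10 seconds of speech generated in 2 seconds of wall-clock time.
rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
speedup = speedup_over_real_time(synthesis_seconds=2.0, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}, speedup = {speedup:.1f}x")  # RTF = 0.20, speedup = 5.0x
```

To measure this on your own hardware, time the synthesis call with `time.perf_counter()` and divide by the length of the returned audio (number of samples divided by sample rate).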

F5-TTS

Best for: Voice cloning

Fast, high-quality open TTS with strong zero-shot voice cloning.

  • +Great cloning
  • +Fast inference
  • Model license differs from code

VibeVoice

Best for: Long-form, multi-speaker

Microsoft's open frontier TTS for long-form, multi-speaker audio like podcasts and dialogue.

  • +Long-form audio
  • +Multiple speakers
  • +Microsoft-backed
  • Heavier model

Zonos

Best for: Expressive cloning

Zyphra's expressive open TTS trained on 200k+ hours, with high-fidelity 5-second voice cloning and emotion control.

  • +Very expressive
  • +Emotion control
  • +5s cloning
  • Larger to run

Higgs Audio

Best for: Multilingual TTS

Boson AI's text-audio foundation model with strong multilingual TTS (100+ languages).

  • +100+ languages
  • +Foundation model
  • Research / non-commercial license

whisper.cpp

Best for: On-device transcription

Fast, portable C/C++ port of OpenAI Whisper for on-device speech-to-text (transcription).

  • +Runs anywhere
  • +No GPU needed
  • +Many languages
  • STT only, not TTS
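
A common first stumbling block with whisper.cpp is the input format: its README notes that the CLI example expects 16-bit WAV input and shows an ffmpeg conversion to 16 kHz mono. The sketch below checks a file against that target with Python's standard `wave` module and assembles a command line. The helper names, the binary name `whisper-cli`, and the model path are assumptions for illustration; check them against your own build before running anything.

```python
import os
import tempfile
import wave


def matches_readme_format(path: str) -> bool:
    """True if the WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # 2 bytes per sample = 16-bit
        )


def build_command(binary: str, model_path: str, audio_path: str) -> list[str]:
    """Assemble the CLI call; -m selects the model file, -f the input audio."""
    return [binary, "-m", model_path, "-f", audio_path]


# Demo: write one second of silence in the expected format, then check it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

if matches_readme_format(demo_path):
    # Assumed binary and model locations; adjust to your checkout.
    print(build_command("whisper-cli", "models/ggml-base.en.bin", demo_path))
```

The command is returned as a list so it can be passed directly to `subprocess.run` without shell quoting issues.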

Compare

Alternative | Task             | Best for            | License
Kokoro-82M  | TTS              | Lightweight TTS     | Apache-2.0
Pocket TTS  | TTS              | CPU / on-device     | Open
F5-TTS      | TTS + cloning    | Cloning a voice     | Open
VibeVoice   | TTS (long-form)  | Podcasts / dialogue | MIT
Zonos       | TTS + cloning    | Expressive voices   | Apache-2.0
Higgs Audio | TTS (foundation) | Multilingual speech | Research / NC
whisper.cpp | STT (transcribe) | Local transcription | MIT

Building voice features? For text-to-speech, Kokoro and Pocket TTS are tiny and run locally, F5-TTS and Zonos excel at voice cloning, VibeVoice handles long multi-speaker audio, and Higgs Audio covers many languages. For speech-to-text, whisper.cpp transcribes on-device. Compare by task, strength, and license (mind the non-commercial ones).
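
Comparing by task and license can be sketched as a small filter over the comparison table. The rows below are copied from that table as listed; the names `Model`, `NON_COMMERCIAL_LABELS`, and `shortlist` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    task: str
    best_for: str
    license: str


# Rows copied from the comparison table.
MODELS = [
    Model("Kokoro-82M", "TTS", "Lightweight TTS", "Apache-2.0"),
    Model("Pocket TTS", "TTS", "CPU / on-device", "Open"),
    Model("F5-TTS", "TTS + cloning", "Cloning a voice", "Open"),
    Model("VibeVoice", "TTS (long-form)", "Podcasts / dialogue", "MIT"),
    Model("Zonos", "TTS + cloning", "Expressive voices", "Apache-2.0"),
    Model("Higgs Audio", "TTS (foundation)", "Multilingual speech", "Research / NC"),
    Model("whisper.cpp", "STT (transcribe)", "Local transcription", "MIT"),
]

# License labels from the table that rule out commercial use.
NON_COMMERCIAL_LABELS = {"Research / NC"}


def shortlist(models, task_keyword, allow_non_commercial=False):
    """Return names whose task mentions the keyword, optionally skipping NC licenses."""
    keyword = task_keyword.lower()
    names = []
    for model in models:
        if keyword not in model.task.lower():
            continue
        if model.license in NON_COMMERCIAL_LABELS and not allow_non_commercial:
            continue
        names.append(model.name)
    return names


print(shortlist(MODELS, "cloning"))  # ['F5-TTS', 'Zonos']
print(shortlist(MODELS, "STT"))      # ['whisper.cpp']
```

A filter like this only reflects the labels in the table. A generic label such as "Open" still needs checking against the actual license text, and F5-TTS is flagged above as having a model license that differs from its code license.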