StealThis.dev

Speech & Voice AI (TTS / STT)

Open models for text-to-speech, voice cloning, and transcription, ranging from tiny CPU-friendly models to expressive voice cloners.

Alternatives (7)

Kokoro-82M

Best for: Lightweight, high-quality TTS

A tiny 82M-parameter open TTS model whose output quality is well above what its size suggests; a good default for lightweight speech.

  • + Very small & fast
  • + Surprisingly good quality
  • + Permissive license
  • - Fewer voices than big models

Pocket TTS

Best for: On-device / CPU TTS

Kyutai's ~100M-parameter TTS model, which runs several times faster than real-time on a CPU and is small enough for on-device use.

  • + Runs on CPU
  • + Faster than real-time
  • + Voice cloning
  • - Small model limits
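
To make "faster than real-time" concrete: one common way to express it is the ratio of audio produced to wall-clock time spent synthesizing. The sketch below is illustrative only. The function name `speedup_over_real_time` and the sample numbers are assumptions made for this example, not measurements of Pocket TTS.

```python
def speedup_over_real_time(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute.

    Values above 1.0 mean the model synthesizes faster than real-time.
    """
    if audio_seconds < 0 or synthesis_seconds <= 0:
        raise ValueError("audio_seconds must be >= 0 and synthesis_seconds must be > 0")
    return audio_seconds / synthesis_seconds


# Hypothetical numbers for illustration: 10 s of speech generated in 2.5 s.
speedup = speedup_over_real_time(audio_seconds=10.0, synthesis_seconds=2.5)
print(f"{speedup:.1f}x real-time")  # prints "4.0x real-time"
```

When benchmarking any of the models listed here, timing the synthesis call on your own hardware and plugging the result into a ratio like this gives a figure you can compare across models.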

F5-TTS

Best for: Voice cloning

Fast, high-quality open TTS with strong zero-shot voice cloning.

  • + Great cloning
  • + Fast inference
  • - Model license differs from code

VibeVoice

Best for: Long-form, multi-speaker

Microsoft's open frontier TTS for long-form, multi-speaker audio like podcasts and dialogue.

  • + Long-form audio
  • + Multiple speakers
  • + Microsoft-backed
  • - Heavier model

Zonos

Best for: Expressive cloning

Zyphra's expressive open TTS trained on 200k+ hours, with high-fidelity 5-second voice cloning and emotion control.

  • + Very expressive
  • + Emotion control
  • + 5s cloning
  • - Larger to run

Higgs Audio

Best for: Multilingual TTS

Boson AI's text-audio foundation model with strong multilingual TTS (100+ languages).

  • + 100+ languages
  • + Foundation model
  • - Research / non-commercial license

whisper.cpp

Best for: On-device transcription

Fast, portable C/C++ port of OpenAI Whisper for on-device speech-to-text (transcription).

  • + Runs anywhere
  • + No GPU needed
  • + Many languages
  • - STT only, not TTS
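
For on-device transcription, whisper.cpp is usually driven through its command-line tool. The sketch below wraps that call from Python. The helper names `build_whisper_command` and `transcribe` are hypothetical, and the binary, model, and audio paths are assumptions that depend on how you built the project and which model you downloaded. The flags shown (`-m`, `-f`, `-l`, `-otxt`) are the ones I know from the project's CLI, but it is worth checking `--help` on your own build.

```python
import subprocess
from pathlib import Path


def build_whisper_command(binary: str, model: str, audio: str, language: str = "en") -> list[str]:
    """Build the argument list for a whisper.cpp CLI transcription run."""
    return [
        binary,
        "-m", model,     # path to a ggml model file
        "-f", audio,     # input audio file
        "-l", language,  # spoken language code
        "-otxt",         # also write the transcript to a .txt file
    ]


def transcribe(binary: str, model: str, audio: str, language: str = "en") -> str:
    """Run whisper.cpp and return whatever it prints to stdout."""
    # Fail early with a clear error if any of the assumed paths is wrong.
    for path in (binary, model, audio):
        if not Path(path).exists():
            raise FileNotFoundError(path)
    result = subprocess.run(
        build_whisper_command(binary, model, audio, language),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumed paths; adjust to your own build and model download.
    print(build_whisper_command(
        "./build/bin/whisper-cli", "models/ggml-base.en.bin", "samples/jfk.wav"
    ))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact invocation before running it.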

Compare

Alternative | Task             | Best for            | License
Kokoro-82M  | TTS              | Lightweight TTS     | Apache-2.0
Pocket TTS  | TTS              | CPU / on-device     | Open
F5-TTS      | TTS + cloning    | Cloning a voice     | Open
VibeVoice   | TTS (long-form)  | Podcasts / dialogue | MIT
Zonos       | TTS + cloning    | Expressive voices   | Apache-2.0
Higgs Audio | TTS (foundation) | Multilingual speech | Research / NC
whisper.cpp | STT (transcribe) | Local transcription | MIT

Building voice features? For text-to-speech, Kokoro and Pocket TTS are tiny and run locally, F5-TTS and Zonos excel at voice cloning, VibeVoice handles long multi-speaker audio, and Higgs Audio covers many languages. For speech-to-text, whisper.cpp transcribes on-device. Compare by task, strength, and license (mind the non-commercial ones).
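
As a rough illustration of comparing by task and license, the sketch below encodes the comparison table as data and filters it. The `VoiceModel` class, the `shortlist` helper, and the choice to treat only the "Research / NC" label as non-commercial are assumptions made for this example. It takes the table's license labels at face value, so caveats such as F5-TTS's model license differing from its code license still need a manual check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceModel:
    name: str
    task: str
    best_for: str
    license: str


# Rows copied from the comparison table.
MODELS = [
    VoiceModel("Kokoro-82M", "TTS", "Lightweight TTS", "Apache-2.0"),
    VoiceModel("Pocket TTS", "TTS", "CPU / on-device", "Open"),
    VoiceModel("F5-TTS", "TTS + cloning", "Cloning a voice", "Open"),
    VoiceModel("VibeVoice", "TTS (long-form)", "Podcasts / dialogue", "MIT"),
    VoiceModel("Zonos", "TTS + cloning", "Expressive voices", "Apache-2.0"),
    VoiceModel("Higgs Audio", "TTS (foundation)", "Multilingual speech", "Research / NC"),
    VoiceModel("whisper.cpp", "STT (transcribe)", "Local transcription", "MIT"),
]

# Assumption: only this label is treated as non-commercial in the sketch.
NON_COMMERCIAL = {"Research / NC"}


def shortlist(task_keyword: str, commercial_use: bool = True) -> list[str]:
    """Return names of models whose task mentions task_keyword.

    When commercial_use is True, models with a non-commercial label are dropped.
    """
    return [
        m.name
        for m in MODELS
        if task_keyword.lower() in m.task.lower()
        and not (commercial_use and m.license in NON_COMMERCIAL)
    ]


print(shortlist("cloning"))  # ['F5-TTS', 'Zonos']
print(shortlist("STT"))      # ['whisper.cpp']
```

Calling `shortlist("TTS", commercial_use=False)` keeps Higgs Audio in the results, which is the behavior you would want for research-only projects.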