Check out:
Vosk Speech Recognition Toolkit
Vosk is an offline, open-source speech recognition toolkit. It supports speech recognition for more than 20 languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, and Polish, with more to come.
Vosk models are small (50 MB) yet provide continuous large-vocabulary transcription, zero-latency response through the streaming API, a reconfigurable vocabulary, and speaker identification.
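To make the streaming API concrete, here is a minimal Python sketch. It assumes the vosk package is installed, that a model has been downloaded and unpacked locally, and that the input is a 16-bit mono PCM WAV file. The function names transcribe_wav and extract_text, the 4000-frame chunk size, and the model_dir argument are illustrative choices, not part of Vosk itself.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Return the "text" field of a Vosk JSON result string ("" if absent)."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Feed a WAV file to Vosk chunk by chunk and join the recognized text.

    model_dir is assumed to be the path of an unpacked Vosk model.
    """
    # Imported lazily so extract_text stays usable without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)  # small chunks imitate a live stream
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(extract_text(rec.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(extract_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

In a live setting the same loop would read from a microphone buffer instead of a file, and rec.PartialResult() could be polled between chunks to show interim text.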
Coqui TTS
Coqui TTS is a fork of the Mozilla TTS project that offers high-quality speech models.
- High-performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
- Speaker Encoder to compute speaker embeddings efficiently.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).
- Fast and efficient model training.
- Detailed training logs on the terminal and Tensorboard.
- Support for Multi-speaker TTS.
- Efficient, flexible, lightweight but feature complete Trainer API.
- Released and ready-to-use models.
- Tools to curate Text2Speech datasets under dataset_analysis.
- Utilities to use and test your models.
- Modular (but not too much) code base enabling easy implementation of new ideas.
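As a rough illustration of the "released and ready-to-use models" item above, the sketch below synthesizes a sentence to a WAV file through the Coqui TTS Python API. It assumes the TTS package is installed and that the released model tts_models/en/ljspeech/tacotron2-DDC can be downloaded on first use. The helpers parse_model_name and synthesize_to_file are hypothetical names introduced only for this example.

```python
DEFAULT_MODEL = "tts_models/en/ljspeech/tacotron2-DDC"  # assumed example model


def parse_model_name(model_name: str) -> dict:
    """Split a name of the form <type>/<language>/<dataset>/<model>."""
    parts = model_name.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"unexpected model name: {model_name!r}")
    keys = ("model_type", "language", "dataset", "model")
    return dict(zip(keys, parts))


def synthesize_to_file(text: str, out_path: str,
                       model_name: str = DEFAULT_MODEL) -> str:
    """Synthesize text with a released Coqui model and write a WAV file."""
    parse_model_name(model_name)  # fail early on a malformed name
    # Imported lazily so the helper above works without TTS installed.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)  # downloads the model if not cached
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

A call such as synthesize_to_file("Hello from Coqui.", "hello.wav") would be the typical entry point. The first run may take a while because the model has to be fetched.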
eSpeak NG
eSpeak NG is a modern version of the classic eSpeak. It is a lightweight, fast TTS library that works offline.
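One simple way to use eSpeak NG offline from a script is to call its command-line tool. The sketch below assumes the espeak-ng executable is installed and on the PATH. It relies on the -v (voice), -s (speed in words per minute), and -w (write WAV file) options. The function names build_espeak_command and speak_to_wav are illustrative.

```python
import subprocess


def build_espeak_command(text: str, wav_path: str,
                         voice: str = "en", speed: int = 175) -> list:
    """Build the espeak-ng argument list for writing speech to a WAV file."""
    return ["espeak-ng", "-v", voice, "-s", str(speed), "-w", wav_path, text]


def speak_to_wav(text: str, wav_path: str,
                 voice: str = "en", speed: int = 175) -> None:
    """Run espeak-ng; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_espeak_command(text, wav_path, voice, speed),
                   check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the text contains spaces or punctuation.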
See also:
Generative CS
Generative AI library for .NET 8.0 with built-in OpenAI ChatGPT and Google Gemini API clients and support for C# function calling via reflection.