Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
***27/1/2023*** Script and video updated: https://www.youtube.com/watch?v=58IqrrXMxQo
A follow-up to the VITS video from a few weeks ago. Here you can fine-tune the multispeaker YourTTS model on your own voice samples. The samples are split, converted, denoised with rnnoise, transcribed with OpenAI's Whisper STT, arranged into a VCTK-format dataset, and then used to fine-tune YourTTS with Coqui TTS.
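The last step of that pipeline, arranging transcribed clips into the VCTK layout Coqui's recipe expects, can be sketched roughly like this. This is a minimal stdlib-only sketch, not the script linked below; `build_vctk_dataset` and its arguments are hypothetical names, and the layout follows the VCTK corpus convention (`wav48/<speaker>/…` plus matching `txt/<speaker>/…` files).

```python
import shutil
from pathlib import Path

def build_vctk_dataset(pairs, out_dir, speaker="p001"):
    """Copy (wav_path, transcript) pairs into a VCTK-style layout:
    wav48/<speaker>/<speaker>_<idx>.wav and txt/<speaker>/<speaker>_<idx>.txt.
    Hypothetical helper for illustration only."""
    out = Path(out_dir)
    wav_dir = out / "wav48" / speaker
    txt_dir = out / "txt" / speaker
    wav_dir.mkdir(parents=True, exist_ok=True)
    txt_dir.mkdir(parents=True, exist_ok=True)
    for i, (wav_path, transcript) in enumerate(pairs, start=1):
        stem = f"{speaker}_{i:03d}"
        # one wav and one matching transcript file per utterance
        shutil.copy(wav_path, wav_dir / f"{stem}.wav")
        (txt_dir / f"{stem}.txt").write_text(transcript.strip() + "\n")
```

Each entry ends up as a numbered utterance under one speaker ID, which is what makes the fine-tuned voice addressable later via `--speaker_idx`.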
Notebook:
https://colab.research.google.com/drive/16Z2AeeGC4xAZlLWCCCQGfeWGlF_5Bj2E?usp=sharing
Python script:
https://pastebin.com/iRe3wjSL
Generate speech with the CLI (quote the model and config paths, since the run directory name contains a space):
tts --text "text" --out_path outfile.wav --model_path "multivoice/traineroutput/run path/best_model.pth" --config_path "multivoice/traineroutput/run path/config.json" --speakers_file_path multivoice/speakers.pth --speaker_idx VCTK_speaker
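If you drive the `tts` CLI from a script, passing the arguments as a list sidesteps shell quoting around the space in the run directory. A small sketch, assuming the same paths as the command above; `tts_cli_args` is a hypothetical helper name:

```python
from pathlib import Path

def tts_cli_args(text, run_dir, out_path, speaker="VCTK_speaker"):
    """Build the argv list for the Coqui `tts` CLI call shown above.
    As a list (e.g. for subprocess.run), paths with spaces need no quoting."""
    run = Path(run_dir)
    return [
        "tts",
        "--text", text,
        "--out_path", out_path,
        "--model_path", str(run / "best_model.pth"),
        "--config_path", str(run / "config.json"),
        "--speakers_file_path", "multivoice/speakers.pth",
        "--speaker_idx", speaker,
    ]
```

Pass the result straight to `subprocess.run(tts_cli_args(...))` once Coqui TTS is installed.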
OpenAI Whisper:
https://github.com/openai/whisper
Coqui TTS:
https://github.com/coqui-ai/TTS
Rnnoise:
https://github.com/xiph/rnnoise
YourTTS:
https://github.com/Edresson/YourTTS#reproducibility
YourTTS Recipe:
https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py