Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning

Channel: NanoNomad
Subscribers: 2,810
Published on: 2023-01-27
Video Link: https://www.youtube.com/watch?v=58IqrrXMxQo
Category: Vlog
Duration: 18:28
Views: 2,370


A follow-up to the YourTTS video. This script fine-tunes a multi-speaker YourTTS model on your own voice samples. The samples are split into clips, converted, denoised with rnnoise, and transcribed with OpenAI Whisper STT. The results are assembled into a VCTK-format dataset, which is then used to fine-tune the YourTTS model with Coqui TTS.
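To make the "VCTK-format dataset" step more concrete, here is a minimal Python sketch of how transcribed clips could be laid out on disk. It is not the Colab script's actual code. The function name `write_vctk_style_dataset`, the speaker name, and the numbering scheme are made up for illustration, and the folder names (`txt/<speaker>/` and `wav48_silence_trimmed/<speaker>/` with a `_mic1` suffix) are an assumption based on the public VCTK corpus layout, so check them against the formatter your Coqui TTS version uses. The transcripts are assumed to come from Whisper already.

```python
import shutil
from pathlib import Path


def write_vctk_style_dataset(root, speaker, transcripts):
    """Copy clips and write one transcript file per clip in a VCTK-like layout.

    transcripts maps a clip path to its transcribed text (e.g. from Whisper).
    Folder and file naming below is an assumption, not the Colab script's code.
    """
    root = Path(root)
    txt_dir = root / "txt" / speaker
    wav_dir = root / "wav48_silence_trimmed" / speaker
    txt_dir.mkdir(parents=True, exist_ok=True)
    wav_dir.mkdir(parents=True, exist_ok=True)

    utterance_ids = []
    # Sort by clip path so the numbering is stable between runs.
    for index, (clip, text) in enumerate(sorted(transcripts.items()), start=1):
        utt_id = f"{speaker}_{index:03d}"
        # One text file per utterance, holding only the transcript.
        (txt_dir / f"{utt_id}.txt").write_text(text.strip() + "\n", encoding="utf-8")
        # Keep the clip's original extension; VCTK itself ships .flac files.
        shutil.copyfile(clip, wav_dir / f"{utt_id}_mic1{Path(clip).suffix}")
        utterance_ids.append(utt_id)
    return utterance_ids
```

Calling it with something like `write_vctk_style_dataset("dataset", "myvoice", {"clips/a.flac": "Hello there."})` would produce `dataset/txt/myvoice/myvoice_001.txt` and `dataset/wav48_silence_trimmed/myvoice/myvoice_001_mic1.flac`.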

The script is currently configured for English voices only. Other languages require separate datasets, and the settings here are hardcoded for English for ease of use.

It probably mostly works if you have a good dataset. Probably. No promises.

Updated Whisper STT+Coqui YourTTS Colab Script:
https://colab.research.google.com/drive/1GsOL7pwCrECagRxmOgJoKOHtlaGhUCma?usp=sharing

WaveShop:
https://waveshop.sourceforge.net/download.html

Sonic Visualiser:
https://www.sonicvisualiser.org/

Coqui's Dataset Guide:
https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset

rnnoise:
https://github.com/xiph/rnnoise

Generate speech with the CLI (substitute your training run's folder for "run path" and your speaker's name for VCTK_speaker):
tts --text "text" --out_path outfile.wav --model_path multivoice/traineroutput/run path/best_model.pth --config_path multivoice/traineroutput/run path/config.json --speakers_file_path multivoice/speakers.pth --speaker_idx VCTK_speaker
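If the run folder name contains spaces, pasting the command into a shell can split the path into separate arguments. One way around that is to build the argument list in Python, as in the sketch below. The helper name `build_tts_command` is hypothetical; the flags are exactly the ones in the command above, and the paths in the example are the placeholders from that command, not real folder names.

```python
import shlex
from pathlib import Path


def build_tts_command(text, out_path, run_dir, speakers_file, speaker_idx):
    """Return the tts CLI call as an argument list (hypothetical helper).

    Passing a list to subprocess avoids shell quoting problems when
    run_dir contains spaces.
    """
    run_dir = Path(run_dir)
    return [
        "tts",
        "--text", text,
        "--out_path", str(out_path),
        "--model_path", str(run_dir / "best_model.pth"),
        "--config_path", str(run_dir / "config.json"),
        "--speakers_file_path", str(speakers_file),
        "--speaker_idx", speaker_idx,
    ]


if __name__ == "__main__":
    cmd = build_tts_command(
        text="text",
        out_path="outfile.wav",
        run_dir="multivoice/traineroutput/run path",  # placeholder from the description
        speakers_file="multivoice/speakers.pth",
        speaker_idx="VCTK_speaker",  # placeholder speaker name
    )
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` in an environment where Coqui TTS is installed.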




Other Videos By NanoNomad


2023-04-07 Locally Hosted Chatbots with RWKV through ChatRWKV and the Text-Generation-WebUI | 14B Model on 3GB!
2023-03-29 Create Datasets for Voice Model Training on Google Colab | Updated Tools for Coqui TTS Training
2023-03-22 Train a VITS Speech Model using Coqui TTS | Updated Script and Audio Processing Tools
2023-03-15 Training or Fine Tuning a Hindi Language VITS TTS Voice Model with Coqui TTS on Google Colab
2023-03-05 Install and Configure Retroarch for PS Vita with Thumbnails, Overlays and Shaders
2023-03-03 Fallout 1 on the PS Vita is the Best Way to Play
2023-02-24 Train or Fine Tune VITS on (theoretically) Any Language | Train Multi-Speaker Model | Train YourTTS
2023-02-12 Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
2023-02-04 Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
2023-01-30 YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
2023-01-27 Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning
2023-01-13 Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
2022-12-22 Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab or Linux
2022-12-09 Dreambooth and Fine Tuning for Stable Diffusion 1.5 and 2 with this Versatile Script
2022-11-30 If Bill Gates could rap? AI Synthesized Voice, AI Upsampled Video | Deltron 3030's Virus
2022-11-14 Training Stable Diffusion Dreambooth on Multiple Subjects for Combined Image Generation
2022-10-31 Locally Train Stable Diffusion with Dreambooth using WSL Ubuntu
2022-10-25 Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements
2022-10-24 Stable Diffusion Image to Video, Synthesized Lauretta Young 1930s voice, Wav2Lip Demo
2022-10-16 Animate Images using AI with Frame Interpolation for Large Motion
2022-10-14 Animated Stable Diffusion Images using Google's FILM Frame Interpolation for Large Motion demo



Tags:
Voice Cloning
YourTTS
Coqui
Whisper
AI Voice