Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning

Channel: NanoNomad
Published on 2023-01-13 ● Video Link: https://www.youtube.com/watch?v=s1YtJUc_6VI
Duration: 9:47


***Update 27/1/2023*** The script and video have been updated: https://www.youtube.com/watch?v=58IqrrXMxQo

A follow-up to the VITS video from a few weeks ago. This notebook fine-tunes a multi-speaker YourTTS model on your own voice samples. The samples are split, converted, denoised with rnnoise, and transcribed with OpenAI Whisper STT. The results are arranged into a VCTK-format dataset, which is then used to fine-tune the YourTTS model with Coqui TTS.
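
The transcription and dataset-layout steps can be sketched roughly as below. This is not the notebook's code. The helper names (`whisper_transcriber`, `build_vctk_dataset`), the `wav48_silence_trimmed` and `txt` folder names, and the `_mic1` file suffix are assumptions modelled on the public VCTK corpus layout, so check them against the dataset formatter in your Coqui TTS version. The clips are assumed to be already split, converted, and denoised.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with OpenAI Whisper."""
    import whisper  # imported lazily so the layout helper works without it

    model = whisper.load_model(model_name)

    def transcribe(audio_path: Path) -> str:
        return model.transcribe(str(audio_path))["text"]

    return transcribe


def build_vctk_dataset(
    clips: Iterable[Path],
    speaker: str,
    out_root: Path,
    transcribe: Callable[[Path], str],
) -> List[Path]:
    """Copy clips and write transcripts in an assumed VCTK-style layout.

    Assumed layout (hypothetical, verify against your formatter):
      <out_root>/wav48_silence_trimmed/<speaker>/<speaker>_<NNN>_mic1.<ext>
      <out_root>/txt/<speaker>/<speaker>_<NNN>.txt
    """
    out_root = Path(out_root)
    audio_dir = out_root / "wav48_silence_trimmed" / speaker
    text_dir = out_root / "txt" / speaker
    audio_dir.mkdir(parents=True, exist_ok=True)
    text_dir.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for index, clip in enumerate(sorted(Path(c) for c in clips), start=1):
        text = transcribe(clip).strip()
        if not text:
            continue  # skip clips where no speech was recognised
        stem = f"{speaker}_{index:03d}"
        shutil.copy(clip, audio_dir / f"{stem}_mic1{clip.suffix}")
        transcript_path = text_dir / f"{stem}.txt"
        transcript_path.write_text(text + "\n", encoding="utf-8")
        written.append(transcript_path)
    return written
```

With Whisper installed, a call might look like `build_vctk_dataset(Path("clips").glob("*.wav"), "myvoice", Path("dataset"), whisper_transcriber())`, where `clips`, `myvoice`, and `dataset` are placeholder names. Passing the transcriber in as a function keeps the layout logic usable with any other STT engine.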

Notebook:
https://colab.research.google.com/drive/16Z2AeeGC4xAZlLWCCCQGfeWGlF_5Bj2E?usp=sharing

Python script:
https://pastebin.com/iRe3wjSL

Generate speech with the CLI (replace "run path" with the name of your training run folder, and keep the quotes if it contains spaces):
tts --text "text" --out_path outfile.wav --model_path "multivoice/traineroutput/run path/best_model.pth" --config_path "multivoice/traineroutput/run path/config.json" --speakers_file_path multivoice/speakers.pth --speaker_idx VCTK_speaker
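
If you would rather call the CLI from Python, the same command can be assembled as an argument list, which sidesteps shell quoting problems when the run folder name contains spaces. This is a minimal sketch: `build_tts_command` is a hypothetical helper, and the folder name `my run` and speaker name `VCTK_speaker` are placeholders. Only the flags shown in the command above are used.

```python
import shlex
from pathlib import Path
from typing import List


def build_tts_command(
    text: str,
    out_path: str,
    run_dir: str,
    speakers_file: str,
    speaker: str,
) -> List[str]:
    """Build the argument list for the `tts` CLI from a training run folder."""
    run = Path(run_dir)
    return [
        "tts",
        "--text", text,
        "--out_path", str(out_path),
        "--model_path", str(run / "best_model.pth"),
        "--config_path", str(run / "config.json"),
        "--speakers_file_path", str(speakers_file),
        "--speaker_idx", speaker,
    ]


if __name__ == "__main__":
    cmd = build_tts_command(
        text="Hello from my cloned voice.",
        out_path="outfile.wav",
        run_dir="multivoice/traineroutput/my run",  # placeholder run folder
        speakers_file="multivoice/speakers.pth",
        speaker="VCTK_speaker",  # placeholder speaker name
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To run it (requires Coqui TTS to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list, each path stays a single argument even with spaces in it, so no manual quoting is needed.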

OpenAI Whisper:
https://github.com/openai/whisper

Coqui TTS:
https://github.com/coqui-ai/TTS

Rnnoise:
https://github.com/xiph/rnnoise

YourTTS:
https://github.com/Edresson/YourTTS#reproducibility

YourTTS Recipe:
https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py




Other Videos By NanoNomad


2023-03-29 Create Datasets for Voice Model Training on Google Colab | Updated Tools for Coqui TTS Training
2023-03-22 Train a VITS Speech Model using Coqui TTS | Updated Script and Audio Processing Tools
2023-03-15 Training or Fine Tuning a Hindi Language VITS TTS Voice Model with Coqui TTS on Google Colab
2023-03-05 Install and Configure Retroarch for PS Vita with Thumbnails, Overlays and Shaders
2023-03-03 Fallout 1 on the PS Vita is the Best Way to Play
2023-02-24 Train or Fine Tune VITS on (theoretically) Any Language | Train Multi-Speaker Model | Train YourTTS
2023-02-12 Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
2023-02-04 Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
2023-01-30 YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
2023-01-27 Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning
2023-01-13 Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
2022-12-22 Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab or Linux
2022-12-09 Dreambooth and Fine Tuning for Stable Diffusion 1.5 and 2 with this Versatile Script
2022-11-30 If Bill Gates could rap? AI Synthesized Voice, AI Upsampled Video | Deltron 3030's Virus
2022-11-14 Training Stable Diffusion Dreambooth on Multiple Subjects for Combined Image Generation
2022-10-31 Locally Train Stable Diffusion with Dreambooth using WSL Ubuntu
2022-10-25 Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements
2022-10-24 Stable Diffusion Image to Video, Synthesized Lauretta Young 1930s voice, Wav2Lip Demo
2022-10-16 Animate Images using AI with Frame Interpolation for Large Motion
2022-10-14 Animated Stable Diffusion Images using Google's FILM Frame Interpolation for Large Motion demo
2022-10-07 Training Textual Inversion for Stable Diffusion | Customizable AI Image Generation



Tags:
AI
TTS
STT
voice cloning
coqui tts
whisper stt
YourTTS