Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset

Channel: NanoNomad
Subscribers: 2,340
Published on: 2023-02-12
Video Link: https://www.youtube.com/watch?v=45DiA-aJwXI



Duration: 9:05
4,158 views


I've been looking at multi-speaker VITS TTS models lately, so I thought I'd share the Google Colab notebook. It's similar to the others I've posted, but this one uses precomputed speaker embedding vectors. The configuration is similar to the YourTTS model, but this seems a little easier to fine-tune.
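
As a rough illustration of what "precomputed vectors" usually means in a Coqui TTS VITS config, here is a minimal sketch that builds the relevant config fragment. The field names (use_d_vector_file, d_vector_file, d_vector_dim, use_speaker_embedding) are the ones I know from Coqui's VitsArgs. The file path, the 512 dimension, and the helper name are assumptions for illustration and are not taken from the notebook, so check the notebook's own config before relying on any of this.

```python
import json


def multispeaker_overrides(d_vector_file: str, d_vector_dim: int = 512) -> dict:
    """Build a config fragment for a VITS model that reads precomputed speaker vectors.

    Hypothetical helper: the values are placeholders, not the notebook's settings.
    """
    return {
        "model_args": {
            # Read speaker embeddings from a file instead of learning an embedding table.
            "use_d_vector_file": True,
            # Some Coqui TTS versions expect a single path string here rather than a list.
            "d_vector_file": [d_vector_file],
            # Must match the output size of the speaker encoder that produced the vectors.
            "d_vector_dim": d_vector_dim,
            # Learned per-speaker embeddings are not needed when d-vectors are used.
            "use_speaker_embedding": False,
        }
    }


if __name__ == "__main__":
    # "speakers.pth" is a placeholder path for the precomputed embeddings file.
    print(json.dumps(multispeaker_overrides("speakers.pth"), indent=2))
```

The fragment only shows the speaker-related fields. Everything else (dataset paths, audio settings, character set) still comes from the notebook.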

As always, this stuff is experimental, but it should help you get started if you want to poke around at training a multi-speaker, English-language VITS model using the Coqui TTS framework.

Multi-Speaker English language VITS training Colab Notebook:
https://colab.research.google.com/drive/1wAuG-TcZeAUYhff0f6ZiG-so9KT-sBIE?usp=sharing

YourTTS video discussing the same training options that can be used here as well:
https://www.youtube.com/watch?v=1yt2W-uK8mk

Real time noise suppression plugin:
https://github.com/werman/noise-suppression-for-voice

Audacity:
https://www.audacityteam.org/

Coqui's Dataset Guide:
https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset

rnnoise:
https://github.com/xiph/rnnoise

Download my multilingual, multispeaker YourTTS model on Huggingface: https://huggingface.co/AOLCDROM/YourTTS-Fr-En-De-Es
See allvoices.txt for information about each speaker:language training pair. The model was trained on character sets and uses 'artificial' language codes.

Generate speech from text with the CLI:
tts --text "text" --out_path outfile.wav --model_path path/to/model_file.pth --config_path path/to/config.json --speakers_file_path speakers/index/path/speakers.pth --speaker_idx VCTK_speaker
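
If you'd rather drive the same CLI call from a script (for example, to batch a few sentences), something like the sketch below should work. It only assembles the same flags shown in the command above. The function names and the example paths are placeholders of mine, and it assumes the tts command is already installed and on your PATH.

```python
import shlex
import subprocess
from typing import List


def build_tts_command(
    text: str,
    out_path: str,
    model_path: str,
    config_path: str,
    speakers_file_path: str,
    speaker_idx: str,
) -> List[str]:
    """Return the argument list for the Coqui `tts` CLI call shown above."""
    return [
        "tts",
        "--text", text,
        "--out_path", out_path,
        "--model_path", model_path,
        "--config_path", config_path,
        "--speakers_file_path", speakers_file_path,
        "--speaker_idx", speaker_idx,
    ]


def synthesize(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if `tts` exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # All paths below are placeholders; point them at your own model files.
    cmd = build_tts_command(
        text="Hello from a multi-speaker VITS model.",
        out_path="outfile.wav",
        model_path="path/to/model_file.pth",
        config_path="path/to/config.json",
        speakers_file_path="speakers/index/path/speakers.pth",
        speaker_idx="VCTK_speaker",
    )
    # Print a copy-pasteable shell command; call synthesize(cmd) to actually run it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems when the text contains quotes or other special characters.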




Other Videos By NanoNomad


2023-05-04 Make Using Tortoise TTS Faster with Fine-Tuned Models
2023-05-01 AI Voice Swap and Lip Sync using Wav2Lip-HQ-Updated
2023-04-22 Voice Cloning with Tortoise TTS and Model Training Using the AI Voice Cloning WebUI
2023-04-07 Locally Hosted Chatbots with RWKV through ChatRWKV and the Text-Generation-WebUI | 14B Model on 3GB!
2023-03-29 Create Datasets for Voice Model Training on Google Colab | Updated Tools for Coqui TTS Training
2023-03-22 Train a VITS Speech Model using Coqui TTS | Updated Script and Audio Processing Tools
2023-03-15 Training or Fine Tuning a Hindi Language VITS TTS Voice Model with Coqui TTS on Google Colab
2023-03-05 Install and Configure Retroarch for PS Vita with Thumbnails, Overlays and Shaders
2023-03-03 Fallout 1 on the PS Vita is the Best Way to Play
2023-02-24 Train or Fine Tune VITS on (theoretically) Any Language | Train Multi-Speaker Model | Train YourTTS
2023-02-12 Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
2023-02-04 Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
2023-01-30 YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
2023-01-27 Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning
2023-01-13 Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
2022-12-22 Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab or Linux
2022-12-09 Dreambooth and Fine Tuning for Stable Diffusion 1.5 and 2 with this Versatile Script
2022-11-30 If Bill Gates could rap? AI Synthesized Voice, AI Upsampled Video | Deltron 3030's Virus
2022-11-14 Training Stable Diffusion Dreambooth on Multiple Subjects for Combined Image Generation
2022-10-31 Locally Train Stable Diffusion with Dreambooth using WSL Ubuntu
2022-10-25 Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements



Tags:
voice cloning
ai voice
tts
speech synthesis
vits
machine learning