YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation

Channel subscribers: 2,340
Video Link: https://www.youtube.com/watch?v=1yt2W-uK8mk



Category: Discussion
Duration: 11:02
1,205 views


Revised video with updated Whisper STT + Coqui YourTTS Google Colab script:
https://www.youtube.com/watch?v=58IqrrXMxQo

Notes for this video:
Adding original vectors for speaker retention:
https://youtu.be/1yt2W-uK8mk?t=111

Reinitializing the text encoder and duration predictor:
https://youtu.be/1yt2W-uK8mk?t=159

Freezing the text encoder and duration predictor:
https://youtu.be/1yt2W-uK8mk?t=190

Demo - Retraining TE on raw text:
https://youtu.be/1yt2W-uK8mk?t=223

Detach DP from TE:
https://youtu.be/1yt2W-uK8mk?t=328

Train TE using phonemes:
https://youtu.be/1yt2W-uK8mk?t=362

Demo - Retraining TE on phonemized text:
https://youtu.be/1yt2W-uK8mk?t=399

Finish baking:
https://youtu.be/1yt2W-uK8mk?t=455

Demo - Staged training results with espeak:
https://youtu.be/1yt2W-uK8mk?t=490

Talking about the sample sets:
https://youtu.be/1yt2W-uK8mk?t=574

Spelling reference:

D_VECTOR_FILES = []  # List of speaker embeddings/d-vectors to be used during the training

D_VECTOR_FILES.append("/root/.local/share/tts/tts_models--multilingual--multi-dataset--your_tts/model_file.pth")
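The list above can be built a little more defensively, so a mistyped path is caught before training starts. This is only a sketch: the helper name `collect_d_vector_files` is hypothetical, and the only path used is the one from the notes. In the Coqui YourTTS recipe the resulting list is normally handed to the model arguments as `d_vector_file`, but check that against your own script.

```python
import os


def collect_d_vector_files(candidates):
    """Return the candidate paths that exist on disk, in order, without duplicates."""
    found = []
    for path in candidates:
        if os.path.isfile(path) and path not in found:
            found.append(path)
    return found


# Path copied from the notes above (the downloaded YourTTS model directory).
ORIGINAL_VECTORS = (
    "/root/.local/share/tts/"
    "tts_models--multilingual--multi-dataset--your_tts/model_file.pth"
)

# Hypothetical helper in use: missing files are silently skipped,
# so an empty list here means the path should be double-checked.
D_VECTOR_FILES = collect_d_vector_files([ORIGINAL_VECTORS])
```

Additional embedding files for a new dataset could be added to the candidate list in the same way.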

reinit_text_encoder=True,
reinit_DP=True,
freeze_PE=True,
freeze_flow_decoder=True,
freeze_waveform_decoder=False,
freeze_encoder=False,
freeze_DP=False,
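To keep track of what a given combination of flags leaves trainable, the values can be collected in a dictionary and inspected before a run. This is a minimal sketch using the flag values from the spelling reference above; the helper `trainable_components` and the readable component names are assumptions added for illustration, not part of Coqui TTS.

```python
# Flag values copied from the spelling reference above.
STAGE_FLAGS = {
    "reinit_text_encoder": True,
    "reinit_DP": True,
    "freeze_PE": True,
    "freeze_flow_decoder": True,
    "freeze_waveform_decoder": False,
    "freeze_encoder": False,
    "freeze_DP": False,
}

# Readable names for each freeze flag (labels chosen for this sketch).
FREEZE_FLAG_TO_COMPONENT = {
    "freeze_encoder": "text encoder",
    "freeze_DP": "duration predictor",
    "freeze_PE": "posterior encoder",
    "freeze_flow_decoder": "flow decoder",
    "freeze_waveform_decoder": "waveform decoder",
}


def trainable_components(flags):
    """List the components whose freeze flag is False or absent."""
    return [
        name
        for flag, name in FREEZE_FLAG_TO_COMPONENT.items()
        if not flags.get(flag, False)
    ]


if __name__ == "__main__":
    print("Still training:", ", ".join(trainable_components(STAGE_FLAGS)))
```

In a training script the same dictionary could be unpacked into the model arguments, for example `VitsArgs(**STAGE_FLAGS, ...)`, assuming the installed Coqui TTS version accepts these attribute names.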

VITS/YourTTS model documentation for attributes:
https://tts.readthedocs.io/en/latest/models/vits.html

Enable phonemes:
use_phonemes=True,
phonemizer="espeak",
phoneme_language="en",
add_blank=True,
text_cleaner="multilingual_cleaners",
phoneme_cache_path=os.path.join(dataset_conf.path, "phoneme_cache"),
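The phoneme settings above rely on `os` and a dataset path being defined elsewhere in the script. The sketch below gathers them into one function so the fragment is self-contained; the function name `phoneme_settings` and the example dataset path are hypothetical, and the values are the ones listed above.

```python
import os


def phoneme_settings(dataset_path, language="en"):
    """Build the phoneme-related config values from the notes above."""
    return {
        "use_phonemes": True,
        "phonemizer": "espeak",
        "phoneme_language": language,
        "add_blank": True,
        "text_cleaner": "multilingual_cleaners",
        # The cache lives next to the dataset so it is reused between runs.
        "phoneme_cache_path": os.path.join(dataset_path, "phoneme_cache"),
    }


if __name__ == "__main__":
    # "/content/dataset" is a placeholder path, not taken from the video.
    for key, value in phoneme_settings("/content/dataset").items():
        print(f"{key}={value!r},")
```

The returned dictionary could then be unpacked into the training config, assuming the config class accepts these keyword names as the fragment above suggests.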




Other Videos By NanoNomad


2023-04-22 Voice Cloning with Tortoise TTS and Model Training Using the AI Voice Cloning WebUI
2023-04-07 Locally Hosted Chatbots with RWKV through ChatRWKV and the Text-Generation-WebUI | 14B Model on 3GB!
2023-03-29 Create Datasets for Voice Model Training on Google Colab | Updated Tools for Coqui TTS Training
2023-03-22 Train a VITS Speech Model using Coqui TTS | Updated Script and Audio Processing Tools
2023-03-15 Training or Fine Tuning a Hindi Language VITS TTS Voice Model with Coqui TTS on Google Colab
2023-03-05 Install and Configure Retroarch for PS Vita with Thumbnails, Overlays and Shaders
2023-03-03 Fallout 1 on the PS Vita is the Best Way to Play
2023-02-24 Train or Fine Tune VITS on (theoretically) Any Language | Train Multi-Speaker Model | Train YourTTS
2023-02-12 Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
2023-02-04 Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
2023-01-30 YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
2023-01-27 Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning
2023-01-13 Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
2022-12-22 Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab or Linux
2022-12-09 Dreambooth and Fine Tuning for Stable Diffusion 1.5 and 2 with this Versatile Script
2022-11-30 If Bill Gates could rap? AI Synthesized Voice, AI Upsampled Video | Deltron 3030's Virus
2022-11-14 Training Stable Diffusion Dreambooth on Multiple Subjects for Combined Image Generation
2022-10-31 Locally Train Stable Diffusion with Dreambooth using WSL Ubuntu
2022-10-25 Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements
2022-10-24 Stable Diffusion Image to Video, Synthesized Lauretta Young 1930s voice, Wav2Lip Demo
2022-10-16 Animate Images using AI with Frame Interpolation for Large Motion



Tags:
YourTTS
Coqui
AI Voice
Artificial intelligence
TTS
machine learning
voice cloning