YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
Revised video with updated Whisper STT + Coqui YourTTS Google Colab script:
https://www.youtube.com/watch?v=58IqrrXMxQo
Notes for this video:
Adding original vectors for speaker retention:
https://youtu.be/1yt2W-uK8mk?t=111
Reinitializing the text encoder and duration predictor:
https://youtu.be/1yt2W-uK8mk?t=159
Freezing the text encoder and duration predictor:
https://youtu.be/1yt2W-uK8mk?t=190
Demo - Retraining TE on raw text:
https://youtu.be/1yt2W-uK8mk?t=223
Detach DP from TE:
https://youtu.be/1yt2W-uK8mk?t=328
Train TE using phonemes:
https://youtu.be/1yt2W-uK8mk?t=362
Demo - Retraining TE on phonemized text:
https://youtu.be/1yt2W-uK8mk?t=399
Finish baking:
https://youtu.be/1yt2W-uK8mk?t=455
Demo - Staged training results with espeak:
https://youtu.be/1yt2W-uK8mk?t=490
Talking about the sample sets:
https://youtu.be/1yt2W-uK8mk?t=574
Spelling reference:
D_VECTOR_FILES = []  # List of speaker embeddings/d-vectors to be used during the training
D_VECTOR_FILES.append("/root/.local/share/tts/tts_models--multilingual--multi-dataset--your_tts/model_file.pth")
reinit_text_encoder=True,
reinit_DP=True,
freeze_PE=True,
freeze_flow_decoder=True,
freeze_waveform_decoder=False,
freeze_encoder=False,
freeze_DP=False,
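A quick way to sanity-check the flags above is to list which parts of the model stay trainable. This is only a sketch: the helper and the component labels are illustrative (my reading of the VITS docs linked below), not part of the Coqui TTS API. The flag names and values are copied from the spelling reference.

```python
# Illustrative helper, NOT part of Coqui TTS.
# Flag names/values are copied from the spelling reference above.
FLAGS = {
    "reinit_text_encoder": True,
    "reinit_DP": True,
    "freeze_PE": True,
    "freeze_flow_decoder": True,
    "freeze_waveform_decoder": False,
    "freeze_encoder": False,
    "freeze_DP": False,
}

# Assumed mapping from each freeze_* flag to the model part it controls.
FREEZE_FLAG_TO_COMPONENT = {
    "freeze_encoder": "text encoder",
    "freeze_DP": "duration predictor",
    "freeze_PE": "posterior encoder",
    "freeze_flow_decoder": "flow decoder",
    "freeze_waveform_decoder": "waveform decoder",
}


def summarize(flags):
    """Split components into (trainable, frozen) based on the freeze_* flags."""
    trainable, frozen = [], []
    for flag, component in FREEZE_FLAG_TO_COMPONENT.items():
        # A missing flag is treated as "not frozen" in this sketch.
        (frozen if flags.get(flag, False) else trainable).append(component)
    return trainable, frozen


trainable, frozen = summarize(FLAGS)
print("trainable:", ", ".join(trainable))
print("frozen:   ", ", ".join(frozen))
```

With these values, the text encoder and duration predictor are reinitialized and left trainable, while the posterior encoder and flow decoder are frozen.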
VITS/YourTTS model documentation for attributes:
https://tts.readthedocs.io/en/latest/models/vits.html
Enable phonemes:
use_phonemes=True,
phonemizer="espeak",
phoneme_language="en",
add_blank=True,
text_cleaner="multilingual_cleaners",
phoneme_cache_path=os.path.join(dataset_conf.path, "phoneme_cache"),
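The phoneme settings above need `import os` and a `dataset_conf` object with a `path` attribute. A minimal sketch of how the cache path resolves, assuming a made-up dataset folder (`/content/my_dataset` is hypothetical, and `SimpleNamespace` only stands in for the real dataset config from the training script):

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for the dataset config; only .path is used here.
dataset_conf = SimpleNamespace(path="/content/my_dataset")

# Same keys and values as the notes above, gathered in a plain dict.
phoneme_settings = {
    "use_phonemes": True,
    "phonemizer": "espeak",
    "phoneme_language": "en",
    "add_blank": True,
    "text_cleaner": "multilingual_cleaners",
    "phoneme_cache_path": os.path.join(dataset_conf.path, "phoneme_cache"),
}

# The cache ends up in a "phoneme_cache" subfolder of the dataset folder.
print(phoneme_settings["phoneme_cache_path"])
```

In the actual script these would presumably go in as keyword arguments to the training config rather than a separate dict.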