Tortoise TTS DEMO: G-Man performs Gilbert and Sullivan's 'The Major-General's Song'

Channel:
Subscribers:
2,550
Video Link: https://www.youtube.com/watch?v=9tBvNVVyeg4



Duration: 2:54
304 views


Half-Life's G-Man performs Gilbert and Sullivan's 'The Major-General's Song', through the magic of Tortoise TTS.

Generation settings: Sampler: 2, Iterations: 128, conditioning-free enabled, length penalty: 0.2
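As a rough sketch, the settings above could be written down as keyword arguments for a scripted run. The key names follow the keyword arguments of `TextToSpeech.tts()` in the reference tortoise-tts Python package; mapping the WebUI labels onto them is an assumption, and reading "Sampler: 2" as two autoregressive samples is a guess. The helper name `build_generation_kwargs` is hypothetical.

```python
def build_generation_kwargs():
    """Collect the generation settings from the description as a plain dict.

    Assumption: the WebUI labels map onto these tortoise-tts keyword names.
    """
    return {
        # Guess: "Sampler: 2" is read here as 2 autoregressive samples.
        "num_autoregressive_samples": 2,
        # "Iterations: 128" is read as diffusion iterations.
        "diffusion_iterations": 128,
        # "Cond free" is read as conditioning-free diffusion switched on.
        "cond_free": True,
        "length_penalty": 0.2,
    }


generation_kwargs = build_generation_kwargs()
```

With the package installed and a voice loaded, a dict like this could be unpacked into a call such as `tts.tts(text, voice_samples=..., **generation_kwargs)`; that call is not run here.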

Download one of my fine-tuned Tortoise TTS models here:
Base model here: https://huggingface.co/AOLCDROM/Tortoise-TTS-MSFT-VCTK-4V-En
Requires the custom tokenizer file pt-t.json (put it in ./models/tokenizers, then switch the tokenizer in the settings menu)
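The file placement can be scripted. This is a minimal sketch, assuming the tokenizer has already been downloaded and that the WebUI is installed under `ui_root`; the function name `install_tokenizer` and the `ui_root` parameter are hypothetical. Switching the tokenizer still has to be done in the settings menu.

```python
import shutil
from pathlib import Path


def install_tokenizer(src, ui_root="."):
    """Copy a tokenizer JSON into <ui_root>/models/tokenizers and return the new path."""
    dest_dir = Path(ui_root) / "models" / "tokenizers"
    # Create the folder if this install does not have it yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(src).name
    shutil.copy2(src, dest)
    return dest
```

For example, `install_tokenizer("Downloads/pt-t.json", ui_root=".")` would place the file at `./models/tokenizers/pt-t.json`; the download location in that call is only an illustration.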

Training a multi-voice English model. Testing one of the voices.
Training so far:
2 epochs to establish the base language, learning rate 1e-5, text ratio 1
1 epoch of a single voice, 6-hour dataset, learning rate 1e-5, text ratio 0.1
several (I lost track) 1-2 epoch sessions of a 6-voice dataset, ~2 hours, learning rate 1e-5, text ratio 0.1
a total of 23 epochs so far of a 30-voice dataset (idk how long, approx. 13,000 samples), over multiple sessions, learning rate 1e-5, text ratio 0.1

Once an epoch ends, I let another begin. If training reaches a minimum and stalls a third to half of the way through an epoch, I terminate the session, test the model, and restart using that last checkpoint as the starting model.
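The "stalled at a minimum" call is made by eye here, but it could be approximated in code. This is a sketch under assumptions, not the procedure actually used: the function name `has_stalled`, the window size, and the `min_delta` threshold are all hypothetical and would need tuning against real loss logs.

```python
def has_stalled(losses, window=50, min_delta=1e-3):
    """Return True if the last `window` losses failed to beat the earlier best by `min_delta`."""
    if len(losses) <= window:
        return False  # not enough history to judge
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    # Stalled when the recent best is not meaningfully lower than the earlier best.
    return best_recent > best_before - min_delta
```

A check like this would only flag the point where stopping is worth considering; testing the checkpoint by listening to it remains the real decision step.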

Loss targeting across datasets/checkpoints doesn't seem to matter much right now, because the minima are wildly different between sessions/models/datasets.

I typically see a relatively normal-looking loss curve, then a sharp drop after each epoch during successful sessions, which feels a little counterintuitive.

Batch size for all sessions: 32, gradient accumulation for all: 16
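For a sense of scale, these numbers can be turned into step counts for the 30-voice dataset. This is back-of-the-envelope arithmetic under stated assumptions: the sample count is approximate, the final partial batch is assumed to be kept, and the 23 epochs are assumed to refer to the 30-voice dataset. How gradient accumulation interacts with batch size depends on the trainer (some split the batch into chunks, others multiply it), so both readings are shown. The function name `training_arithmetic` is hypothetical.

```python
import math


def training_arithmetic(samples=13_000, batch_size=32, grad_accum=16, epochs=23):
    """Estimate step counts from the approximate figures in the description."""
    steps_per_epoch = math.ceil(samples / batch_size)  # keeps the last partial batch
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        # Reading 1: the batch of 32 is split into 16 chunks per step.
        "micro_batch_if_split": batch_size // grad_accum,
        # Reading 2: 16 batches of 32 are accumulated before each update.
        "effective_batch_if_multiplied": batch_size * grad_accum,
    }


estimate = training_arithmetic()
```

With the defaults this gives roughly 407 steps per epoch and about 9,361 steps over 23 epochs.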

All training samples are recording-booth quality, with very low noise unless the audio has effects applied.

This voice worked well because it is dissimilar from the others being trained. There is some bleed-over; I think the same voice actor performs one of the scientists, and the model slips into that intonation and speech pattern at one point.







Tags:
AI Voice
Tortoise TTS
voice cloning