Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab

Channel:

NanoNomad

Subscribers:

2,970

Published on February 4, 2023 9:16:58 AM ● Video Link: https://www.youtube.com/watch?v=dfmlyXHQOwE

Duration: 11:40

6,771 views

170

This is about as close to automated as I can make things. I've put together a Colab notebook that uses a bunch of spaghetti code, rnnoise, OpenAI's Whisper Speech to Text, and Coqui Text to Speech to train a VITS model.

Upload audio files, split and process clips, denoise clips, transcribe clips with Whisper, then use that dataset to fine tune a VITS model. The Colab script has been revised to add toggles for freezing layers, plus some (possibly broken) audio processing options.
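
The dataset-building step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The function name `build_metadata`, the `wavs/` folder layout, and the pipe-delimited LJSpeech-style `metadata.csv` are all assumptions. `transcribe` stands in for any callable that maps a WAV path to text, such as a thin wrapper around Whisper.

```python
from pathlib import Path
from typing import Callable


def build_metadata(dataset_dir: str, transcribe: Callable[[str], str]) -> int:
    """Write an LJSpeech-style metadata.csv for every WAV in dataset_dir/wavs.

    Assumed layout (hypothetical, not taken from the notebook):
        dataset_dir/wavs/*.wav
        dataset_dir/metadata.csv  ->  clip_id|text|text
    Returns the number of clips written.
    """
    root = Path(dataset_dir)
    rows = []
    for wav in sorted((root / "wavs").glob("*.wav")):
        # Collapse whitespace and drop pipes so the delimiter stays unambiguous.
        text = " ".join(transcribe(str(wav)).replace("|", " ").split())
        if text:  # skip clips the transcriber returned nothing for
            rows.append(f"{wav.stem}|{text}|{text}")
    body = "\n".join(rows) + ("\n" if rows else "")
    (root / "metadata.csv").write_text(body, encoding="utf-8")
    return len(rows)
```

With the `openai-whisper` package installed, `transcribe` could be `lambda path: model.transcribe(path)["text"]` after `model = whisper.load_model("base")`. The model size here is an arbitrary choice.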

This is for fine tuning English voices; things are hardcoded for English. Adjusting this will take some work on your part, and fine tuning across languages is hit and miss.

The first part of the video covers using Audacity and the VST3 port of rnnoise to clip samples more accurately on your PC. The second half is the Colab run-through.
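
When clipping by hand, it can help to sanity-check clip lengths before uploading. Below is a minimal sketch using only Python's standard `wave` module. The helper names and the 1 to 12 second bounds are illustrative assumptions, not values from the video or the notebook.

```python
import wave


def clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def out_of_range(paths, min_s=1.0, max_s=12.0):
    """List (path, seconds) for clips whose length falls outside [min_s, max_s].

    The default bounds are placeholders; adjust them to suit your dataset.
    """
    flagged = []
    for p in paths:
        secs = clip_seconds(p)
        if not (min_s <= secs <= max_s):
            flagged.append((p, secs))
    return flagged
```

Running `out_of_range` over the exported clips gives a short list of files to re-cut in Audacity before the Colab run.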

Real time noise suppression plugin:
https://github.com/werman/noise-suppression-for-voice

Colab script (r4):

https://colab.research.google.com/drive/1Swo0GH_PjjAMqYYV6He9uFaq5TQsJ7ZH?usp=sharing

Audacity:
https://www.audacityteam.org/

Coqui's Dataset Guide:
https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset

rnnoise:
https://github.com/xiph/rnnoise

Other Videos By NanoNomad

2023-05-01	AI Voice Swap and Lip Sync using Wav2Lip-HQ-Updated
2023-04-22	Voice Cloning with Tortoise TTS and Model Training Using the AI Voice Cloning WebUI
2023-04-07	Locally Hosted Chatbots with RWKV through ChatRWKV and the Text-Generation-WebUI | 14B Model on 3GB!
2023-03-29	Create Datasets for Voice Model Training on Google Colab | Updated Tools for Coqui TTS Training
2023-03-22	Train a VITS Speech Model using Coqui TTS | Updated Script and Audio Processing Tools
2023-03-15	Training or Fine Tuning a Hindi Language VITS TTS Voice Model with Coqui TTS on Google Colab
2023-03-05	Install and Configure Retroarch for PS Vita with Thumbnails, Overlays and Shaders
2023-03-03	Fallout 1 on the PS Vita is the Best Way to Play
2023-02-24	Train or Fine Tune VITS on (theoretically) Any Language | Train Multi-Speaker Model | Train YourTTS
2023-02-12	Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
2023-02-04	Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
2023-01-30	YourTTS Training Discussion | Experiences, Multistage Training, Demos, Prior Training Preservation
2023-01-27	Updated | Fine-Tuning YourTTS with Automated STT Datasets on Google Colab for AI Voice Cloning
2023-01-13	Fine-Tune YourTTS with Near-Automated Datasets on Google Colab for AI Voice Cloning
2022-12-22	Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab or Linux
2022-12-09	Dreambooth and Fine Tuning for Stable Diffusion 1.5 and 2 with this Versatile Script
2022-11-30	If Bill Gates could rap? AI Synthesized Voice, AI Upsampled Video | Deltron 3030's Virus
2022-11-14	Training Stable Diffusion Dreambooth on Multiple Subjects for Combined Image Generation
2022-10-31	Locally Train Stable Diffusion with Dreambooth using WSL Ubuntu
2022-10-25	Animated Stable Diffusion and Synthesized Voice Demo with Facial Movements
2022-10-24	Stable Diffusion Image to Video, Synthesized Lauretta Young 1930s voice, Wav2Lip Demo

Tags:

voice cloning

vits

coqui

tts

ai voice

voice synthesis
