Fine Tuning XTTS v2 for Hindi Speech with forked Coqui TTS

Channel: NanoNomad
Video Link: https://www.youtube.com/watch?v=zts2P5X211o



Duration: 8:05


I heard a few people say that the base XTTS v2 model (version 2.0.3) doesn't produce very good Hindi output. Here I go through preparing Hindi speech datasets from Common Voice version 18 and the Indic TTS Database. I put together a handful of metadata conversion scripts, cleanup scripts, and batch files for this.
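As a rough illustration of what a metadata conversion step can look like, here is a minimal sketch (not the scripts from the video). It assumes a Common Voice style TSV with `client_id`, `path`, and `sentence` columns, and writes pipe-delimited `audio_file|text|speaker_name` lines of the kind Coqui's built-in `coqui` formatter reads. The `wavs/` folder, the `cv_` speaker prefix, and the function name are hypothetical, and the MP3 clips are assumed to be converted to WAV separately.

```python
import csv
import io


def common_voice_to_coqui(tsv_text, speaker_prefix="cv_"):
    """Turn Common Voice style TSV text into Coqui-style metadata lines.

    Assumed input columns: client_id, path, sentence.
    Output columns: audio_file|text|speaker_name (pipe-delimited).
    """
    # QUOTE_NONE keeps quote marks inside sentences from being parsed as CSV quoting.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = ["audio_file|text|speaker_name"]
    for row in reader:
        sentence = (row.get("sentence") or "").strip()
        path = (row.get("path") or "").strip()
        if not sentence or not path:
            continue  # basic cleanup: skip rows with no text or no clip
        # A pipe inside the text would break the delimiter, so replace it,
        # then collapse any repeated whitespace.
        sentence = " ".join(sentence.replace("|", " ").split())
        # Assumption: clips were converted to WAV and placed in a wavs/ folder.
        wav_name = path.rsplit(".", 1)[0] + ".wav"
        # Shorten the long client_id hash into a readable speaker label.
        speaker = speaker_prefix + (row.get("client_id") or "unknown")[:8]
        lines.append(f"wavs/{wav_name}|{sentence}|{speaker}")
    return lines


sample = (
    "client_id\tpath\tsentence\n"
    "abcdef0123456789\tcommon_voice_hi_1.mp3\tनमस्ते दुनिया\n"
    "abcdef0123456789\tcommon_voice_hi_2.mp3\t\n"
)
for line in common_voice_to_coqui(sample):
    print(line)
```

The second sample row has an empty sentence and is skipped. A real cleanup pass would likely also filter by clip length and strip stray non-Devanagari text, which this sketch leaves out.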

You can copy and paste them from here:
http://nanonomad.com/2024/07/03/xttsv2-hindi-finetuning/

The fine-tuned XTTSv2 checkpoints and speaker reference clips are here:
https://huggingface.co/AOLCDROM/XTTSv2-Hi_ft/tree/main

Note: The checkpoint with the highest step count is not necessarily the one that gives the best output for the speaker you are trying to use. Some checkpoints repeat the ends of sentences more than others.

Previous XTTS v2 video: Fine Tuning XTTS v2 with forked Coqui...







Tags:
text to speech
tts
xtts
coqui
ai