Exploring XTTS v1 and Tools to make Better Audio Datasets (the lazy way)

Channel: NanoNomad
Subscribers: 2,260
Video Link: https://www.youtube.com/watch?v=AUln9N9dh9M



Duration: 14:44
3,143 views


I look at Coqui's new XTTS v1 text-to-speech model and complain about its licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. Comparing embeddings can identify mismatched audio clips in your dataset, so you can remove poor-quality clips and shrink the dataset for faster, more reliable training.
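A minimal sketch of the clip-checking idea, assuming SpeechBrain's pretrained ECAPA speaker-verification model (`speechbrain/spkrec-ecapa-voxceleb`, loaded with `SpeakerRecognition.from_hparams` and compared with `verify_files`). Each clip is checked against one reference clip you trust, and anything the model calls a different speaker gets moved to a review folder instead of being deleted. The reference-clip approach, the function names, and the folder layout are my own assumptions, not necessarily what the video does.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# A verifier takes two audio paths and returns (similarity_score, same_speaker).
Verifier = Callable[[str, str], Tuple[float, bool]]


def load_speechbrain_verifier(savedir: str = "pretrained_models/spkrec-ecapa-voxceleb") -> Verifier:
    """Wrap SpeechBrain's pretrained ECAPA speaker-verification model."""
    # Import path used by SpeechBrain 0.5.x; 1.x moved the class under speechbrain.inference.
    from speechbrain.pretrained import SpeakerRecognition

    model = SpeakerRecognition.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb", savedir=savedir
    )

    def verify(path_a: str, path_b: str) -> Tuple[float, bool]:
        score, prediction = model.verify_files(path_a, path_b)
        # Both come back as one-element tensors; convert to plain Python values.
        return float(score), bool(prediction)

    return verify


def find_mismatches(reference: str, clips: Iterable[str], verify: Verifier) -> List[Tuple[str, float]]:
    """Return (clip, score) for clips judged to be a different speaker, lowest score first."""
    flagged = []
    for clip in clips:
        score, same_speaker = verify(reference, clip)
        if not same_speaker:
            flagged.append((clip, score))
    return sorted(flagged, key=lambda item: item[1])


def move_for_review(clips: Iterable[str], review_dir: str) -> List[Path]:
    """Move flagged clips aside instead of deleting them, so they can be auditioned first."""
    target_dir = Path(review_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for clip in map(Path, clips):
        target = target_dir / clip.name
        if target.exists():
            raise FileExistsError(f"{target} already exists; not overwriting it")
        moved.append(Path(shutil.move(str(clip), str(target))))
    return moved
```

Typical use: build the verifier once with `load_speechbrain_verifier()`, pass a trusted reference clip and your dataset's WAV paths to `find_mismatches`, listen to the lowest scores, then hand the ones you agree with to `move_for_review`. Keeping `verify` as a plain callable also makes it easy to swap in a different embedding model later.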

XTTS Hugging Face demo:
https://huggingface.co/coqui/XTTS-v1

Coqui TTS Github:
https://github.com/coqui-ai/TTS
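For reference, generating a test sentence with XTTS from Python looks roughly like this with the Coqui TTS package. It's a sketch assuming TTS 0.17.x, where XTTS v1 was listed as `tts_models/multilingual/multi-dataset/xtts_v1` and `TTS.tts_to_file` accepts `speaker_wav` and `language` for voice cloning; run `tts --list_models` if the name differs in your version. `build_xtts_request` and `synthesize` are hypothetical convenience helpers, not part of the library.

```python
# Model name for XTTS v1 as listed by Coqui TTS 0.17.x (assumption; run
# `tts --list_models` to confirm the exact string for your installed version).
XTTS_V1_MODEL = "tts_models/multilingual/multi-dataset/xtts_v1"


def build_xtts_request(text: str, speaker_wav: str, language: str = "en",
                       out_path: str = "test_sentence.wav") -> dict:
    """Hypothetical helper: collect keyword arguments for TTS.tts_to_file()."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # short reference clip of the voice to clone
        "language": language,
        "file_path": out_path,
    }


def synthesize(text: str, speaker_wav: str, language: str = "en",
               out_path: str = "test_sentence.wav", gpu: bool = True) -> str:
    """Generate one clip with XTTS and return the output path."""
    # Imported here so the helper above can be used without TTS installed.
    from TTS.api import TTS

    tts = TTS(XTTS_V1_MODEL, gpu=gpu)
    tts.tts_to_file(**build_xtts_request(text, speaker_wav, language, out_path))
    return out_path
```

Loading the model is the slow part, so if you're generating lots of test lines, keep one `TTS` object around and call `tts_to_file` on it repeatedly rather than calling `synthesize` in a loop.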

Code snippets:
http://nanonomad.com/2023/09/21/a-look-at-xtts-v1-and-tools-for-comparing-audio-embeddings/

https://www.youtube.com/watch?v=AUln9N9dh9M&t=1m45s
XTTS release announcement


https://www.youtube.com/watch?v=AUln9N9dh9M&t=2m20s
XTTS specs

https://www.youtube.com/watch?v=AUln9N9dh9M&t=2m40s
XTTS use

https://www.youtube.com/watch?v=AUln9N9dh9M&t=3m
Generating and listening to a test sentence

https://www.youtube.com/watch?v=AUln9N9dh9M&t=3m40s
Generation speed

https://www.youtube.com/watch?v=AUln9N9dh9M&t=4m30s
Showing inconsistencies in the generated examples

https://www.youtube.com/watch?v=AUln9N9dh9M&t=6m20s
Thoughts on the results
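One simple way to put a number on generation speed is the real-time factor (RTF): synthesis time divided by the length of the audio produced, so anything under 1.0 is faster than real time. A minimal stdlib sketch, assuming the output is a 16-bit PCM WAV file that Python's `wave` module can read; `generate` is any zero-argument callable you supply, for example a lambda wrapping your XTTS call.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means audio is generated faster than it plays back."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def timed_rtf(generate: Callable[[], object], out_path: str) -> float:
    """Run `generate` (which should write `out_path`), time it, and return the RTF."""
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(out_path))
```

The first generation usually includes model warm-up, so timing a second or third sentence gives a fairer figure.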


https://www.youtube.com/watch?v=AUln9N9dh9M&t=7m15s
Update these packages if you have problems loading the XTTS model

https://www.youtube.com/watch?v=AUln9N9dh9M&t=7m30s
New model license

https://www.youtube.com/watch?v=AUln9N9dh9M&t=8m10s
Coqui Model License announcement

https://www.youtube.com/watch?v=AUln9N9dh9M&t=9m
XTTS and its outputs are strictly for non-commercial use

https://www.youtube.com/watch?v=AUln9N9dh9M&t=10m15s
My general workflow for audio clips

https://www.youtube.com/watch?v=AUln9N9dh9M&t=10m50s
Making segments with Audacity

https://www.youtube.com/watch?v=AUln9N9dh9M&t=11m10s
Using SpeechBrain's supported speaker verification model to compare audio clips

https://www.youtube.com/watch?v=AUln9N9dh9M&t=12m30s
Looking at the identified mismatching files

https://www.youtube.com/watch?v=AUln9N9dh9M&t=13m10s
Generating embeddings with Pyannote and comparing cosine similarity
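The cosine-similarity comparison can be sketched like this. It's an assumption-laden sketch, not the exact snippet from the video: embeddings come from pyannote's `pyannote/embedding` model through `Model.from_pretrained` and `Inference(model, window="whole")` (a gated model, so you need a Hugging Face access token), and each clip is scored against the average embedding of the whole set, which is my own choice of reference. The lowest-scoring clips are the ones to listen to first.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an all-zero embedding")
    return dot / (norm_a * norm_b)


def rank_by_centroid(embeddings: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every clip against the mean embedding; least similar (most suspicious) first."""
    if not embeddings:
        return []
    vectors = list(embeddings.values())
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    scores = [(name, cosine_similarity(v, centroid)) for name, v in embeddings.items()]
    return sorted(scores, key=lambda item: item[1])


def make_pyannote_embedder(hf_token: str) -> Callable[[str], List[float]]:
    """Load pyannote/embedding once and return a path -> embedding function."""
    from pyannote.audio import Inference, Model  # lazy import; heavy dependency

    model = Model.from_pretrained("pyannote/embedding", use_auth_token=hf_token)
    inference = Inference(model, window="whole")  # one embedding per file

    def embed(path: str) -> List[float]:
        # Flatten, since the returned array may be shaped (1, D) or (D,) depending on version.
        return [float(x) for x in inference(path).reshape(-1)]

    return embed
```

Usage would be along the lines of `embed = make_pyannote_embedder(token)` followed by `rank_by_centroid({p: embed(p) for p in wav_paths})`. The centroid works as a reference only while most clips are from the right speaker; if a large share of the set is bad, comparing against a hand-picked reference clip is safer.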




Other Videos By NanoNomad


2024-05-06 Training SDXL to Generate Text Using IA3 LoRA | It's like Kai's Power Tools, I Guess?
2024-04-17 Replacing Faulty Asus Phoenix RTX 3060 GPU Cooler - It's Easy
2024-03-21 Bark TTS, Seamless Translation, RVC, Music Generation and More with the TTS Generation WebUI
2024-02-14 Train Better Stable Diffusion Models | Prep Datasets Using this Free "Magic" Image Tool
2024-02-12 Emulate a Sound Blaster in real MS-DOS on Modern Hardware | Retro Gaming on "Current" PCs
2024-01-28 How to Play Hundreds of Point-and-Click Adventures on iOS for FREE with ScummVM with NO SIDELOADING
2024-01-18 Training LoRAs and GLoRAs for Stable Diffusion 1.5 and XL Using the New Prodigy Optimizer
2024-01-03 Nick Rekieta - Role Model (Voice Parody. It's silly. It's a joke.)
2023-11-19 Automated Image Captioning with LLMs - Recognize Anything, BLIP-2, and Kosmos-2
2023-10-27 Fine-Tuning Mistral 7B using QLoRA and PEFT on Unstructured Scraped Text Data | Making it Evil?
2023-09-20 Exploring XTTS v1 and Tools to make Better Audio Datasets (the lazy way)
2023-09-01 Es spricht Deutsch | Tortoise TTS Speaking German Demo Clip | Model Download Link in Description
2023-08-18 AI Null reads Alice's Adventures in Wonderland by Lewis Carroll | Full Audiobook
2023-08-11 Remove Background Music and Enhance Speech with Free AI Tools | Avoid ContentID
2023-08-06 AI Null Reads Alice's Adventures in Wonderland by Lewis Carroll, Chapters 1 and 2 | joshcore
2023-07-30 Are Text Cleaners Making Your TTS Models Sound Bad? | TTS Model Training Tips
2023-07-08 .:Demo:. Tortoise TTS Expressive Speech narrating Norman Arkawy's 1955 Sci-Fi short "Selling Point"
2023-07-03 .::Demo::. 4 Voice Multispeaker Tortoise TTS English Fine-Tuned Model Test :: Great Dictator Speech
2023-07-01 Creepy Message about a 2003 Pandemic in China on found IBM PS/1 Pentium 66mhz PC
2023-06-28 Now for Download: YourTTS (English, French, German, Spanish) Multilingual Model with 60+ Voices
2023-06-27 Demo: YourTTS speaking in native French; A sampling of trained-in Voices



Tags:
XTTS
text to speech
ai speech
voice cloning
audio embedding
pyannote