Exploring XTTS v1 and Tools to make Better Audio Datasets (the lazy way)

Channel: NanoNomad

Subscribers: 2,970

Published on September 21, 2023 3:37:16 AM ● Video Link: https://www.youtube.com/watch?v=AUln9N9dh9M

Duration: 14:44

3,143 views

I look at Coqui's new XTTS v1 text-to-speech model and complain about its licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. Comparing embeddings can identify mismatched audio clips in your datasets, so you can remove poor-quality clips and shrink your dataset for faster and more reliable training.
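To make the embedding comparison idea concrete, here is a minimal sketch in plain Python. It is not the code from the video or the linked snippets. It assumes you already have one embedding vector per clip (in practice these would come from a speaker model such as the ones in SpeechBrain or pyannote), and it flags clips whose cosine similarity to the dataset's average embedding falls below a cutoff. The function names, the file names, the tiny 3-dimensional vectors, and the 0.75 threshold are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def flag_mismatched_clips(embeddings, threshold=0.75):
    """Return (flagged_names, scores) for a dict of name -> embedding.

    Each clip is scored against the mean embedding of the whole set.
    The threshold is an arbitrary placeholder, not a recommended value.
    """
    names = list(embeddings)
    dim = len(embeddings[names[0]])
    centroid = [
        sum(embeddings[name][i] for name in names) / len(names)
        for i in range(dim)
    ]
    scores = {
        name: cosine_similarity(embeddings[name], centroid) for name in names
    }
    flagged = sorted(name for name, score in scores.items() if score < threshold)
    return flagged, scores


# Hypothetical embeddings: three similar clips and one that points elsewhere.
clip_embeddings = {
    "clip_001.wav": [0.9, 0.1, 0.0],
    "clip_002.wav": [0.8, 0.2, 0.1],
    "clip_003.wav": [0.85, 0.15, 0.05],
    "clip_004.wav": [0.0, 0.2, 0.9],
}

flagged, scores = flag_mismatched_clips(clip_embeddings)
for name in sorted(scores):
    marker = "MISMATCH" if name in flagged else "ok"
    print(f"{name}: {scores[name]:.3f} {marker}")
```

Note that the average includes the outliers themselves, so a dataset with many bad clips would pull the centroid toward them. Comparing against a few hand-picked reference clips is one alternative, and a sensible threshold depends on the embedding model you use.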

XTTS Hugging Face demo:
https://huggingface.co/coqui/XTTS-v1

Coqui TTS GitHub:
https://github.com/coqui-ai/TTS

Code snippets:
http://nanonomad.com/2023/09/21/a-look-at-xtts-v1-and-tools-for-comparing-audio-embeddings/

https://www.youtube.com/watch?v=AUln9N9dh9M&t=1m45s
XTTS release announcement

https://www.youtube.com/watch?v=AUln9N9dh9M&t=2m20s
XTTS specs

https://www.youtube.com/watch?v=AUln9N9dh9M&t=2m40s
XTTS use

https://www.youtube.com/watch?v=AUln9N9dh9M&t=3m
Generating and listening to a test sentence

https://www.youtube.com/watch?v=AUln9N9dh9M&t=3m40s
Generation speed

https://www.youtube.com/watch?v=AUln9N9dh9M&t=4m30s
Showing inconsistencies with generated examples

https://www.youtube.com/watch?v=AUln9N9dh9M&t=6m20s
Thoughts on the results

https://www.youtube.com/watch?v=AUln9N9dh9M&t=7m15s
Update these packages if you have problems loading the XTTS model

https://www.youtube.com/watch?v=AUln9N9dh9M&t=7m30s
New model license

https://www.youtube.com/watch?v=AUln9N9dh9M&t=8m10s
Coqui Model License announcement

https://www.youtube.com/watch?v=AUln9N9dh9M&t=9m
XTTS and its outputs are for strict non-commercial use only

https://www.youtube.com/watch?v=AUln9N9dh9M&t=10m15s
My general workflow for audio clips

https://www.youtube.com/watch?v=AUln9N9dh9M&t=10m50s
Making segments with Audacity

https://www.youtube.com/watch?v=AUln9N9dh9M&t=11m10s
Using SpeechBrain's supported speaker verification model to compare audio clips

https://www.youtube.com/watch?v=AUln9N9dh9M&t=12m30s
Looking at the identified mismatching files

https://www.youtube.com/watch?v=AUln9N9dh9M&t=13m10s
Generating embeddings with pyannote and comparing cosine similarity


Tags: XTTS, text to speech, ai speech, voice cloning, audio embedding, pyannote
