Exploring XTTS v1 and Tools to Make Better Audio Datasets (the lazy way)
I look at Coqui's new XTTS v1 text-to-speech model and complain about its licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. Comparing embeddings can identify mismatched audio clips in your datasets, so you can remove poor-quality clips and shrink the dataset for faster, more reliable training.
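As a rough illustration of the embedding comparison idea, here is a minimal sketch in plain Python. It assumes the embeddings have already been extracted by some speaker model and are available as lists of floats. The function names, the toy three-dimensional vectors, and the 0.5 threshold are all made up for this example and are not taken from the video or from either library; real speaker embeddings are much longer, and a useful threshold has to be found by listening to what gets flagged.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as a non-match.
        return 0.0
    return dot / (norm_a * norm_b)


def find_mismatched_clips(embeddings, reference, threshold=0.5):
    """Return (name, score) pairs for clips scoring below `threshold`.

    `embeddings` maps a clip name to its embedding vector and `reference`
    is the embedding of a known-good clip of the target speaker.
    The result is sorted with the worst match first.
    """
    flagged = []
    for name, vector in embeddings.items():
        score = cosine_similarity(vector, reference)
        if score < threshold:
            flagged.append((name, score))
    return sorted(flagged, key=lambda item: item[1])


# Toy data: hypothetical file names and made-up 3-dimensional "embeddings".
clips = {
    "clip_001.wav": [0.9, 0.1, 0.0],
    "clip_002.wav": [0.8, 0.2, 0.1],
    "clip_003.wav": [0.0, 0.1, 0.9],
}
reference = [1.0, 0.0, 0.0]

mismatched = find_mismatched_clips(clips, reference, threshold=0.5)
for name, score in mismatched:
    print(f"{name}: {score:.3f}")
```

With this toy data only clip_003.wav falls below the threshold. Comparing against a single reference clip is the simplest choice; comparing each clip against the average embedding of the whole dataset is another option when no single clip is trusted.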
XTTS Hugging Face demo:
https://huggingface.co/coqui/XTTS-v1
Coqui TTS GitHub:
https://github.com/coqui-ai/TTS
Code snippets:
http://nanonomad.com/2023/09/21/a-look-at-xtts-v1-and-tools-for-comparing-audio-embeddings/
https://www.youtube.com/watch?v=AUln9N9dh9M&t=1m45
XTTS release announcement
https://www.youtube.com/watch?v=AUln9N9dh9M&t=2m20s
XTTS specs
https://www.youtube.com/watch?v=AUln9N9dh9M&t=2m40s
XTTS use
https://www.youtube.com/watch?v=AUln9N9dh9M&t=3m
Generating and listening to a test sentence
https://www.youtube.com/watch?v=AUln9N9dh9M&t=3m40s
Generation speed
https://www.youtube.com/watch?v=AUln9N9dh9M&t=4m30s
Showing inconsistencies with generated examples
https://www.youtube.com/watch?v=AUln9N9dh9M&t=6m20s
Thoughts on the results
https://www.youtube.com/watch?v=AUln9N9dh9M&t=7m15s
Update these packages if you have problems loading the XTTS model
https://www.youtube.com/watch?v=AUln9N9dh9M&t=7m30s
New model license
https://www.youtube.com/watch?v=AUln9N9dh9M&t=8m10s
Coqui Model License announcement
https://www.youtube.com/watch?v=AUln9N9dh9M&t=9m
XTTS and its outputs are for strict non-commercial use only
https://www.youtube.com/watch?v=AUln9N9dh9M&t=10m15s
My general workflow for audio clips
https://www.youtube.com/watch?v=AUln9N9dh9M&t=10m50s
Making segments with Audacity
https://www.youtube.com/watch?v=AUln9N9dh9M&t=11m10s
Using SpeechBrain's supported speaker verification model to compare audio clips
https://www.youtube.com/watch?v=AUln9N9dh9M&t=12m30s
Looking at the identified mismatching files
https://www.youtube.com/watch?v=AUln9N9dh9M&t=13m10s
Generating embeddings with pyannote and comparing cosine similarity